This is just a quick introduction to batch loading content into content server with the Batch Loader application. The application can be run from the command line as well as in graphical mode. I normally run it in graphical mode.
I also like to check the box to produce an errors file. This file is kind of neat: any record that fails to load is written to it, so you can tweak the metadata and then point Batch Loader at that file to retry everything that failed. Why might things fail? Perhaps you forgot one of the required fields, like dSecurityGroup, or perhaps you checked in with a metadata value that does not exist in a validated list.
Usually you will need to add a setting called BatchLoaderUserName to the intradoc.cfg file. In your configuration it might look like this:
BatchLoaderUserName=sysadmin
I am sure you could come up with better values to put in this field besides “sysadmin”, but for the sake of this demonstration/test this should get you running pretty easily.
NOTE: intradoc.cfg, not config.cfg.
Now you must construct the batch load file. Nothing scary here, just a text file. It holds metadata as key=value pairs, one to a line, and any line that begins with a hash/pound character (#) is a comment. Each record is closed by an <<EOD>> line, like this:
# This is a comment
Action=insert
dDocType=ADCCT
dDocTitle=Product Details
dDocAuthor=sysadmin
dSecurityGroup=Public
primaryFile=<path to file>
dInDate=7/23/2008
dDocName=TestContentID
<<EOD>>
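If you have more than a handful of items, you will probably generate this file rather than type it. Here is a minimal sketch in Python of how that might look, assuming the simple format shown above (key=value lines, a # comment, and <<EOD>> after each record). The function name render_batch_file and the primaryFile path are made up for illustration; nothing here comes from the Batch Loader itself.

```python
from typing import Dict, Iterable


def render_batch_file(records: Iterable[Dict[str, str]], comment: str = "") -> str:
    """Render records as batch load text: key=value lines, <<EOD>> after each record."""
    lines = []
    if comment:
        lines.append(f"# {comment}")
    for record in records:
        for key, value in record.items():
            # The format is one pair per line, so line breaks cannot be represented,
            # and an "=" inside a key would make the line ambiguous.
            if "\n" in key or "\n" in value or "=" in key:
                raise ValueError(f"cannot write {key!r} as a single key=value line")
            lines.append(f"{key}={value}")
        lines.append("<<EOD>>")
    return "\n".join(lines) + "\n"


example = render_batch_file(
    [
        {
            "Action": "insert",
            "dDocType": "ADCCT",
            "dDocTitle": "Product Details",
            "dDocAuthor": "sysadmin",
            "dSecurityGroup": "Public",
            "primaryFile": "/tmp/product-details.txt",  # hypothetical path
            "dInDate": "7/23/2008",
            "dDocName": "TestContentID",
        }
    ],
    comment="This is a comment",
)
print(example)
```

Write the returned string to a text file and point Batch Loader at it as usual.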
What if you wanted to perform a metadata-only check-in? Leave out primaryFile and add createPrimaryMetaFile=true, something like this:
# This is a comment
Action=insert
dDocType=ADCCT
dDocTitle=Product Details
dDocAuthor=sysadmin
dSecurityGroup=Public
#primaryFile=<path to file>
dInDate=7/23/2008
dDocName=TestContentID
createPrimaryMetaFile=true
<<EOD>>
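Since a forgotten required field such as dSecurityGroup is one of the easy ways to end up in the errors file, a quick pre-flight check on the batch file can save a round trip. The sketch below is only that, a sketch: the parser follows the simple format shown in the examples, the REQUIRED_FIELDS tuple is my assumption (your instance decides what is really required), and parse_batch_text and find_missing are names invented for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: adjust to whatever your content server instance actually requires.
REQUIRED_FIELDS = ("dDocName", "dDocType", "dDocTitle", "dDocAuthor", "dSecurityGroup")


def parse_batch_text(text: str) -> List[Dict[str, str]]:
    """Split batch load text into records, skipping blank lines and # comments."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "<<EOD>>":
            records.append(current)
            current = {}
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        current[key.strip()] = value.strip()
    if current:
        # Trailing record with no <<EOD>>; keep it so it still gets checked.
        records.append(current)
    return records


def find_missing(
    records: Sequence[Dict[str, str]],
    required: Sequence[str] = REQUIRED_FIELDS,
) -> List[Tuple[int, List[str]]]:
    """Return (record number, missing field names) for every incomplete record."""
    problems = []
    for number, record in enumerate(records, start=1):
        missing = [name for name in required if not record.get(name)]
        if missing:
            problems.append((number, missing))
    return problems
```

Run parse_batch_text over the file contents, pass the result to find_missing, and fix anything it reports before you start the load. It will not catch values missing from a validated list; that still needs the errors file.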
Sometimes a search in content server needs a lot of work with the <OR> query operator. Between that and a few other variables, it is possible to craft a query that is too long for content server, especially when the query is generated. You might also run into this if you are using universal query syntax or Verity as your search engine. If the data you are searching on is metadata, you might be better off temporarily switching to a database search on the fly to find your results. Adding this setting to the query string changes how content server interprets your search:
SearchQueryFormat=DATABASE
In most installations the SearchQueryFormat setting defaults to UNIVERSAL. Now that you have switched to a database search, you can write your query with the IN operator for a much shorter query string, converting something like this:
QueryText=xType <MATCHES> `A` <OR> xType <MATCHES> `B` <OR> xType <MATCHES> `C` <OR> xType <MATCHES> `D`
Into:
QueryText=xType IN ('A', 'B', 'C', 'D')
If you’re on a Database Full Text setup to start with, you can still do this type of thing and use full text search as well, like this:
QueryText=( xType IN ('A', 'B', 'C', 'D')) and (<ftx>keyword</ftx>)
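Because long queries like this are usually generated, here is a small Python sketch of building the IN form from a list of values and URL-encoding it together with SearchQueryFormat. Only those two parameter names come from the discussion above. The helper names are invented, and doubling embedded single quotes is ordinary SQL-style escaping that I am assuming content server accepts, so test it against your own instance.

```python
from typing import Iterable
from urllib.parse import urlencode


def build_in_clause(field: str, values: Iterable[str]) -> str:
    """Build "field IN ('a', 'b', ...)" with single quotes doubled inside values."""
    quoted = ["'" + value.replace("'", "''") + "'" for value in values]
    if not quoted:
        raise ValueError("IN needs at least one value")
    return f"{field} IN ({', '.join(quoted)})"


def build_search_params(field: str, values: Iterable[str]) -> str:
    """Return a URL-encoded fragment carrying SearchQueryFormat and QueryText."""
    return urlencode(
        {
            "SearchQueryFormat": "DATABASE",
            "QueryText": build_in_clause(field, values),
        }
    )


print(build_in_clause("xType", ["A", "B", "C", "D"]))
# xType IN ('A', 'B', 'C', 'D')
print(build_search_params("xType", ["A", "B", "C", "D"]))
```

Append the encoded fragment to whatever search URL you already use. The sketch does not validate the field name, so only pass field names you control.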
Perhaps the best thing you can do is take a moment to evaluate why you need so much “or” based logic in your query in the first place. OR is almost always one of the most costly operations.
Alex Suhre just posted to the Oracle ECM Forum a link to a JavaScript-based HDA data parser that lets you paste HDA formatted data into a box and get a more human readable view of that data. It looks nice, and the few pieces of data I tried in there worked well. I have not put it through its paces to see how it handles odd data or characters. It looks like you can download it as well, although you have to supply some information to do so.
Real world applications? Well...not sure just yet, but it is neat, and it is always nice when a tool is turned over for community use.
Over on David Roe’s blog, "Content on Content Management", he recently posted a pretty good article on the newly added Content Server JCR Repository Adapter. I really liked his post because he approaches the topic with a curious but wary attitude and then goes on to cover some of the history of the process and the political landscape this "standard" traversed to get to where it is today.
Why was his general attitude important? Well, he did not just jump right up and say "Hey, look what got published, go use it now because it is simply the best." He presents pros and cons, along with his personal feelings on the JSR/JCR process, how vendors can use it as part of their sales pitch, and why that is almost comical. Finally, because he presents it with a bit of caution, I am a lot more apt to take it seriously.
Why is the history important? Sometimes history is not all that important, but sometimes that political/historical take can give you insight into standards that were driven by an agenda that may not be in the best interests of the community as a whole. As an example, look at Microsoft and their OOXML standard. If someone told you OOXML was a standard and you blindly followed it, you may wish you had known the back story all along, especially if the standard's ratification ends up getting overturned later.
To wrap up: great presentation, great insight. Oh, and the topic was interesting too!
If you are trying to squeeze additional performance out of your production servers, you might look into adding the setting DisableSharedCacheChecking. It goes in your config.cfg file or under general configuration through the admin server. As always, it helps to gather some metrics before adding the setting and then measure again afterward. What this boils down to is that content server watches dynamic HTML resource includes in components to see if they need to be reloaded, and this setting turns that checking off.
DisableSharedCacheChecking=true
In other words: if in place on a production server (where you would not be changing includes like you would on a development instance), this setting can alleviate a measurable amount of file system activity. This should translate into a performance gain in some fashion. Or to say this as confusingly as possible: a reduction in performance degradation…I just thought that would be fun to say.
Potential flamebait warning: And as we all know, disks are slooowww. Maybe I just hook the disks up wrong? I hope I hooked the disks up...