This is just a quick introduction to batch loading content into content server with the Batch Loader application. The application can be run from the command line as well as in graphical mode. I normally run it in graphical mode.
I also like to check the box to produce an errors file. This file is kind of neat: anything that fails to load is written to it, so you can tweak the metadata and then point Batch Loader at that file to retry just the content that failed. Why might things fail? Perhaps you forgot one of the required fields like dSecurityGroup, or perhaps you performed a check-in using a metadata value that did not exist in a validated list.
Usually you will need to add a setting called BatchLoaderUserName to the intradoc.cfg file. In your configuration it might look like this:
BatchLoaderUserName=sysadmin
I am sure you could come up with better values to put in this field besides “sysadmin”, but for the sake of this demonstration/test this should get you running pretty easily.
NOTE: intradoc.cfg, not config.cfg.
Now you must construct the batch load file. Nothing scary here, just a text file. It contains metadata as key=value pairs, one per line, and ends each record with <<EOD>>. Lines that begin with a hash/pound character (#) are comments, like this:
# This is a comment
Action=insert
dDocType=ADCCT
dDocTitle=Product Details
dDocAuthor=sysadmin
dSecurityGroup=Public
primaryFile=<path to file>
dInDate=7/23/2008
dDocName=TestContentID
<<EOD>>
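If you have more than a handful of items, you probably will not want to type these records by hand. Here is a minimal Python sketch that writes records in the same layout as the example above. The function name format_batch_records and the generated comment line are my own inventions for illustration, not part of Batch Loader, and the primaryFile value is left as the same placeholder used above.

```python
from typing import Dict, Iterable


def format_batch_records(records: Iterable[Dict[str, str]], action: str = "insert") -> str:
    """Render records as batch load text: key=value lines, each record closed by <<EOD>>."""
    lines = ["# Generated batch load file"]
    for record in records:
        # Every record starts with its Action, followed by its metadata fields.
        lines.append(f"Action={action}")
        for key, value in record.items():
            lines.append(f"{key}={value}")
        lines.append("<<EOD>>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {
            "dDocType": "ADCCT",
            "dDocTitle": "Product Details",
            "dDocAuthor": "sysadmin",
            "dSecurityGroup": "Public",
            "primaryFile": "<path to file>",  # placeholder, fill in a real path
            "dInDate": "7/23/2008",
            "dDocName": "TestContentID",
        }
    ]
    print(format_batch_records(sample), end="")
```

Save the output to a text file and point Batch Loader at it the same way you would a hand-written file.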
What if you wanted to perform a metadata-only check-in? Comment out primaryFile and add createPrimaryMetaFile=true, something like this:
# This is a comment
Action=insert
dDocType=ADCCT
dDocTitle=Product Details
dDocAuthor=sysadmin
dSecurityGroup=Public
#primaryFile=<path to file>
dInDate=7/23/2008
dDocName=TestContentID
createPrimaryMetaFile=true
<<EOD>>
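Since a missing field is one of the ways a load can fail, it can be worth checking the file before you hand it to Batch Loader. The sketch below is only that, a sketch: it parses the key=value layout shown above and reports records that look incomplete. The REQUIRED_FIELDS list is my assumption for this demonstration (your installation decides which fields are really required), and the rule that a record needs either primaryFile or createPrimaryMetaFile=true is inferred from the two examples above rather than taken from any documentation.

```python
from typing import Dict, List, Sequence

# Assumption: adjust this list to match the required fields in your installation.
REQUIRED_FIELDS = ("dDocType", "dDocTitle", "dDocAuthor", "dSecurityGroup")


def parse_batch_file(text: str) -> List[Dict[str, str]]:
    """Split batch load text into records, skipping blank lines and # comments."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "<<EOD>>":
            records.append(current)
            current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        # Keep a trailing record even if its <<EOD>> marker is missing.
        records.append(current)
    return records


def find_problems(records: Sequence[Dict[str, str]],
                  required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return one message per problem found; an empty list means nothing was flagged."""
    problems: List[str] = []
    for index, record in enumerate(records, start=1):
        missing = [name for name in required if not record.get(name)]
        if missing:
            problems.append(f"record {index}: missing {', '.join(missing)}")
        has_file = bool(record.get("primaryFile"))
        meta_only = record.get("createPrimaryMetaFile", "").lower() == "true"
        if not has_file and not meta_only:
            problems.append(
                f"record {index}: no primaryFile and createPrimaryMetaFile is not true"
            )
    return problems
```

An empty result from find_problems does not guarantee a clean load (it knows nothing about validated lists, for example), but it catches the simple mistakes before they end up in the errors file.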
Sometimes we need to perform a search in content server that requires a lot of work with the <OR> query operator. Between that and a few other variables, it is possible to craft a query that is too long for content server. This happens especially with generated queries. You might also encounter this if you are using universal query syntax or a Verity search engine. If the data you are searching on is metadata-based, you might be better off temporarily switching over to a database search on the fly to find your results. By adding this setting to the query string you can change how content server interprets your search:
SearchQueryFormat=DATABASE
In most installations the SearchQueryFormat setting will default to UNIVERSAL. Now that you have switched over to a database search, you can write your query string using the IN operator for a much shorter query, converting from something like this:
QueryText=xType <MATCHES> `A` <OR> xType <MATCHES> `B` <OR> xType <MATCHES> `C` <OR> xType <MATCHES> `D`
Into:
QueryText=xType IN ('A', 'B', 'C', 'D')
If you’re on a Database Full Text setup to start with you can still do this type of thing AND use full text like this:
QueryText=( xType IN ('A', 'B', 'C', 'D')) and (<ftx>keyword</ftx>)
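Since this problem shows up mostly with generated queries, here is a rough Python sketch of generating the shorter IN form instead of the <OR> chain. The helper names are hypothetical, and doubling single quotes inside values is the usual SQL convention that I am assuming is appropriate here, so check how your own setup treats quoted values before relying on it.

```python
from typing import Iterable
from urllib.parse import urlencode


def build_or_query(field: str, values: Iterable[str]) -> str:
    """Build the long universal-syntax form with one <MATCHES> clause per value."""
    return " <OR> ".join(f"{field} <MATCHES> `{value}`" for value in values)


def build_in_query(field: str, values: Iterable[str]) -> str:
    """Build the database-syntax form: field IN ('A', 'B', ...)."""
    items = list(values)
    if not items:
        raise ValueError("at least one value is required for an IN clause")
    # Assumption: single quotes inside a value are escaped by doubling them.
    quoted = ", ".join("'" + str(item).replace("'", "''") + "'" for item in items)
    return f"{field} IN ({quoted})"


if __name__ == "__main__":
    types = ["A", "B", "C", "D"]
    print(build_or_query("xType", types))
    print(build_in_query("xType", types))
    # URL-encode the two parameters discussed above for use in a query string.
    print(urlencode({"QueryText": build_in_query("xType", types),
                     "SearchQueryFormat": "DATABASE"}))
```

The saving grows with the number of values: each extra value costs only a few characters in the IN list, while the <OR> form repeats the field name and operator every time.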
Perhaps the best thing you can do is at least take a moment to evaluate why you must have so much “or” based logic in your query in the first place. OR is almost always one of the most costly operations in a query.
Alex Suhre just posted to the Oracle ECM Forum a link to a JavaScript-based HDA data parser that allows you to paste HDA-formatted data into a box and get a more human-readable view of that data. It looks nice, and the few pieces of data I have tried in there worked well. I have not put it through its paces to see how it handles odd data or characters. It looks like you can download it as well, although you have to supply some information to be able to do so.
Real world applications? Well...not sure just yet, but it is neat, and it is always nice when a tool is turned over for community use.