http://qs321.pair.com?node_id=140643

ajt has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a mega XSLT batch processing job to perform. Somewhere in the bowels of the building sits an AIX box running Informix/SAP, which stores ~1600 product descriptions. The SAP Business Connector web interface can export the product descriptions as a single nested XML file.

On a Linux or NT box, a simple Perl script parses this XML file with XML::Parser in stream mode and, using the nesting and some simple rules, generates a nested set of directories, then writes various descriptive text and XML files throughout the newly created directory tree.

Next I traverse the directory tree, find all the text files, and create appropriate HTML pages from them. That bit is easy; the problem I face is running XSLT on ~1600 XML files to get HTML.
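The traversal step can be sketched with the core File::Find module. This is a minimal sketch, not the script described above: it assumes the generated files end in `.xml` and that each HTML page should sit next to its source with the extension swapped; `find_xml_jobs` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every *.xml file under $root and pair it with
# the .html path it should produce (assumed naming convention: same place,
# extension swapped). Returns a list of [ $xml_in, $html_out ] pairs.
sub find_xml_jobs {
    my ($root) = @_;
    my @jobs;
    find(
        {
            no_chdir => 1,    # $_ holds the full path, cwd is left alone
            wanted   => sub {
                return unless -f $_ && /\.xml\z/i;
                (my $html = $_) =~ s/\.xml\z/.html/i;
                push @jobs, [ $_, $html ];
            },
        },
        $root
    );
    # Sort by input path so repeated runs process files in the same order
    @jobs = sort { $a->[0] cmp $b->[0] } @jobs;
    return @jobs;
}
```

A stable order makes it easier to compare logs between the weekly runs.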

I did a simple benchmark on NT of Instant Saxon 6.5, Xalan 1.3 (C++) and XML::LibXSLT, and found that Java start-up makes Saxon massively slower than either Xalan or LibXSLT, assuming the XSLT job is small and simple. Xalan is twice as fast as LibXSLT when both are invoked via a system call, but when LibXSLT is called in-line from Perl it is much faster.

Given ~1600 XML files to transform to HTML, I can do this one of two ways: build a list and pass the files one at a time to Xalan (which was faster than either Saxon or LibXSLT when invoked via a system call), or use LibXSLT from within the script that finds them (which should be the fastest method, given simple transformations).
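For the in-process option, most of the saving comes from parsing the stylesheet once and reusing it for every document, rather than paying a start-up cost per file. A minimal sketch with XML::LibXML and XML::LibXSLT, assuming one stylesheet applies to all files and that jobs arrive as hypothetical `[input, output]` pairs; `transform_all` is an illustrative name, not part of either module.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical helper: apply one stylesheet to many documents in-process.
# The stylesheet is parsed once and reused; each input is parsed, transformed
# and written out. Returns [ $input, $error ] pairs for files that failed.
sub transform_all {
    my ($xsl_path, @jobs) = @_;    # @jobs: [ $xml_in, $html_out ] pairs
    my $parser     = XML::LibXML->new;
    my $xslt       = XML::LibXSLT->new;
    my $stylesheet = $xslt->parse_stylesheet($parser->parse_file($xsl_path));

    my @failed;
    for my $job (@jobs) {
        my ($in, $out) = @$job;
        # eval so one malformed file is recorded instead of aborting the batch
        my $ok = eval {
            my $result = $stylesheet->transform($parser->parse_file($in));
            $stylesheet->output_file($result, $out);    # honours xsl:output
            1;
        };
        push @failed, [ $in, $@ ] unless $ok;
    }
    return @failed;
}
```

Collecting failures rather than dying means a single bad product description does not cost the rest of the overnight run.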

I'm not worried about raw speed, since it will run in batch mode, but I would like it to finish in hours rather than days.

I also would not like a pure Perl solution to fail from a leak of some sort: ~1600 XSLT calls in one process is a lot, and I'd rather not have to do it several times.
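One common way to bound the leak risk is to keep in-process XSLT but spread the files over several short-lived child processes, so any leaked memory is reclaimed when each child exits. A minimal sketch using only core Perl; `run_in_batches` is a hypothetical helper and the chunk size is an assumption to tune.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: run $work->(@chunk) in a separate child process for
# each chunk of up to $size jobs, one chunk at a time. Memory leaked while
# processing a chunk is released when that child exits. Returns the number
# of chunks whose child exited with a non-zero status.
# Note: fork() is native on Linux/AIX; on NT Perl emulates it with threads
# inside one process, so it does not give the same isolation there.
sub run_in_batches {
    my ($size, $work, @jobs) = @_;
    die "chunk size must be positive" unless $size > 0;
    my $failed_chunks = 0;
    while (my @chunk = splice @jobs, 0, $size) {
        STDOUT->flush;    # avoid duplicating buffered output in the child
        STDERR->flush;
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # child: a die inside $work becomes a non-zero exit status
            my $ok = eval { $work->(@chunk); 1 };
            warn $@ unless $ok;
            exit($ok ? 0 : 1);
        }
        waitpid($pid, 0);    # parent: wait so only one batch runs at a time
        $failed_chunks++ if $? != 0;
    }
    return $failed_chunks;
}
```

With ~1600 files and a chunk size of 200, this means 8 forks in total, so start-up cost stays negligible while no single process handles more than 200 transformations. `$work` would typically be a closure that runs the XSLT loop over its chunk.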

In summary I plan to:

I'll do this several times per language and, assuming all works, probably once per week as the product database underneath changes.

I know this is very brute-force; are there better approaches to the problem, given that I'm not allowed to use DBI to get data directly from the underlying database?

Hints, tips and suggestions warmly accepted,

As ever, my humble thanks in advance...