http://qs321.pair.com?node_id=734941

Rodster001 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings!

I have a project I am about to begin but I thought I would post here to get some feedback on my approach first.

I am working for a company that works with a lot (thousands) of PDFs. Managing all of those is obviously a nightmare. So, while I am unable to get them away from using PDFs, I have convinced them to insert the "pieces" of the document into a DB, then use a script to merge the data with a template and generate the PDF on the fly.

I have decided to have them upload the content via an HTML form and insert it into the database. Then, upon HTTP request, I would merge the data with an HTML template and use an HTML to PDF converter (I noticed there are quite a few out there) to produce the PDF for download/viewing.

What are your opinions on this approach? Can you recommend an HTML to PDF converter that you have had success with?

Thank you!

Re: Creating PDFs
by mr_mischief (Monsignor) on Jan 08, 2009 at 18:11 UTC
    I did a site several months ago for a client who needed PDFs created. The easiest path I found was to create the PDFs directly using PDF::API2::Simple, with a few calls to the underlying PDF::API2 object it uses and exposes.

    Generating a PDF can take quite some time in a web-serving time scale. Rather than doing so on every request, I set up a simple caching system. It serves the PDF as a file from disk.

    On every PDF request, it checks the age of the PDF file requested against the age of the latest DB entry that would be relevant to the particular PDF's content. If the file needs to be updated, it updates the file then serves it. That web user sees a delay for the file being generated. Additional users who download the file before any changes are made to it get the file already generated.
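    The freshness check described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the code from the actual site, and the same logic ports directly to Perl. The names serve_pdf, latest_db_change, and generate are hypothetical: latest_db_change stands for the timestamp of the newest relevant DB entry (seconds since the epoch), and generate is whatever routine writes the PDF to disk.

```python
import os
from typing import Callable


def serve_pdf(pdf_path: str,
              latest_db_change: float,
              generate: Callable[[str], None]) -> str:
    """Return the path of an up-to-date PDF, regenerating it only when stale.

    pdf_path         -- where the cached PDF lives on disk (hypothetical layout)
    latest_db_change -- epoch timestamp of the newest relevant DB entry
    generate         -- callback that writes a fresh PDF to pdf_path
    """
    try:
        file_mtime = os.path.getmtime(pdf_path)
    except FileNotFoundError:
        file_mtime = None  # never generated yet

    # Regenerate if the file is missing or older than the latest DB change.
    if file_mtime is None or file_mtime < latest_db_change:
        generate(pdf_path)

    return pdf_path
```

    Only the first request after a change pays the generation cost; later requests get the file already on disk.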

    One could instead generate the PDF at the time of the change, so there's never a delay for the download. In this client's case, changes to the DB are much more common than downloads of the finished document. I decided to put the delay where it would occur least often (on the first download after a change) rather than burden the maintainer on every change, but your needs may be different.

    You might even have the people updating the database choose when to generate a new version of the final PDF document. This could be really nice if the changes they make are not always self-contained. A "Generate New PDF" button could be a nice feature. It could also be a slight and unneeded burden, one more step people have to remember, if that's not the usage model that best suits them.

    Another thing to consider when you're writing a document management system is how many versions of each document you need to keep. Will it be just the newest of each? Do you need the newest and the most recent older version (whether for comparison or in case a change needs to be backed out)? Do you need to have a full revision history so you can see which version of the document was in effect at any particular date? You may want a system that pulls from the DB, generates a PDF, and checks the PDF into a full-blown revision control system like CVS or Subversion. You may find it's better to keep the older data in the DB instead and have an administrative function to pull PDFs according to the old data on the fly.

    Is it necessary in this case to serve the PDF files from the web at all? It was in my case, because it was commissioned specifically as a web application. If your users have a common file share, though, you might want to consider having only the DB and file generation actually on the web. The files themselves could be generated and saved to the share for everyone to pull in using native OS functions.

Re: Creating PDFs
by LanX (Saint) on Jan 08, 2009 at 17:03 UTC
    I can't recommend this FORM->DATA->HTML->PDF chain! Better to use something like PDF::Reuse with a templating system to generate the PDF directly from the DATA. Templating systems don't force you to generate only HTML!

    UPDATE: In the case of PDF::Reuse you can even skip the templating system, but if you insist on using one, you'll generate Perl code out of templates, which then has to be executed afterwards.

    Cheers Rolf

Re: Creating PDFs
by zentara (Archbishop) on Jan 08, 2009 at 16:51 UTC
    I've never tried it, but I've seen a script at html2ps, after which you could run ps2pdf on the output through system.
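    A rough sketch of that two-step conversion, assuming both html2ps and ps2pdf are installed and on the PATH. It is written in Python for illustration; in Perl the same two commands would go through system. The function names and file paths are hypothetical, and html2ps is assumed to write PostScript to standard output, which is its default.

```python
import subprocess
from typing import List


def build_commands(html_path: str, ps_path: str, pdf_path: str) -> List[List[str]]:
    """Return the argument lists for the two conversion steps."""
    return [
        ["html2ps", html_path],         # HTML -> PostScript (written to stdout)
        ["ps2pdf", ps_path, pdf_path],  # PostScript -> PDF
    ]


def html_to_pdf(html_path: str, ps_path: str, pdf_path: str) -> None:
    """Run html2ps, then ps2pdf; raises CalledProcessError if either step fails."""
    to_ps, to_pdf = build_commands(html_path, ps_path, pdf_path)
    # Capture html2ps's standard output into the intermediate .ps file.
    with open(ps_path, "wb") as ps_file:
        subprocess.run(to_ps, stdout=ps_file, check=True)
    subprocess.run(to_pdf, check=True)
```

    Passing argument lists instead of a single shell string avoids quoting problems when file names contain spaces.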

    After messing around with ps and pdf conversions a bit, I think you may find inconsistent output from the different methods. So you might want to try several methods and see which one gives the best fidelity in conversion.

    I'm not really a human, but I play one on earth. Remember How Lucky You Are

Re: Creating PDFs
by dragonchild (Archbishop) on Jan 09, 2009 at 18:39 UTC
    Look at Apache's FOP. Also, Adobe has some tools to work with this. This isn't a good fit for Perl other than as the glue language.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?