PDF generation

by michellem (Friar)
on Sep 03, 2003 at 16:21 UTC ( [id://288666]=perlquestion )

michellem has asked for the wisdom of the Perl Monks concerning the following question:

Hi Folks,

I'm working on a project which requires nicely formatted reports generated from database output. One possible way to go is to generate PDF files, and I've done a fair bit of research and some initial code testing to figure out the best way to go. I need a bit of advice on one particular thing.

The module that I like best (or at least whose idea I like best) is called PDF::Template. It uses an XML template to separate the formatting from the data. I like this approach a lot. The one fatal flaw, however (for me, not necessarily for others), is that it requires the totally not free software PDFlib. (Apparently, you used to be able to download it for free, but that is no longer the case; the free version puts a demo watermark on all PDFs generated. In any event, I'm a totally open source gal, so any software I use in my applications has to be open source.) However, PDF::Template looks fairly straightforward, and I was thinking of writing a version that used PDF::API2, which looks like a really nice PDF creation module.

So my question is: is it worth doing this? Has anyone done something similar that I couldn't find? Based on my research, I'm pretty clear that PDF generation via PDF::API2 is the way I want to go (as opposed to using the older PDF::Create, or a combination of html2ps, etc.), but I'm wondering whether it's worth going whole hog and recreating this module (or a similar idea) or just writing my own little thing.

Thanks, and any suggestions/ideas/flogging is welcome.

Replies are listed 'Best First'.
Re: PDF generation
by ant9000 (Monk) on Sep 03, 2003 at 16:42 UTC
    I have had the same problem some time ago, and decided to take a slightly different approach: I generate reports in HTML, which is a trivial task, and then I use htmldoc to convert them to PDF on the fly.
    It supports full HTML 3.2 input (no CSS, though), it's extremely fast, and the PDF output is rendered better than under my old Netscape browsers :-)
    The source code is distributed under GPL2.
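
A minimal sketch of the HTML-then-htmldoc route described above, written in Python with only the standard library. The function names `rows_to_html` and `htmldoc_command` are hypothetical; the only htmldoc options assumed are `--webpage` and `-f`, and the markup sticks to plain tables because the post notes there is no CSS support.

```python
import html


def rows_to_html(title, headers, rows):
    """Render report rows as a plain HTML table (no CSS), escaping every cell."""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>\n"
        for row in rows
    )
    return (
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n"
        f'<table border="1">\n<tr>{head}</tr>\n{body}</table>\n'
        "</body></html>\n"
    )


def htmldoc_command(html_path, pdf_path):
    """Build the argument list for htmldoc; --webpage treats the input as one flat page."""
    return ["htmldoc", "--webpage", "-f", pdf_path, html_path]
```

To produce the PDF, write the returned HTML to a file and pass the argument list to `subprocess.run(..., check=True)`; that step needs the htmldoc binary on the PATH.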
Re: PDF generation
by dragonchild (Archbishop) on Sep 03, 2003 at 17:26 UTC
    As the (hidden) maintainer of PDF::Template, I feel it necessary to say that we are working towards using free solutions. We're also looking at addressing things that PDFlib doesn't do (such as Acrobat forms). Unfortunately, I have something like -4 hours/week to work on it. *winces* If you want to help, please email me and I can definitely put you to work.

    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: PDF generation
by freddo411 (Chaplain) on Sep 03, 2003 at 16:55 UTC

    ++ to your question. I had almost exactly the same issue to solve last year, and after researching the options I came to the same conclusions. I went back to my customers and asked them to purchase PDFlib in order to proceed with the project. They declined, so the project died an early death.

    First off, why PDF? Because only PDF can be reliably displayed AND PRINTED across multiple operating systems, browsers and hardware. It would be nice if there was another way...

    The ideas and architecture behind PDFlib and PDF::Template are key: you create a template PDF that is filled in at run time by the Perl program. The PDFlib software provided a number of important services.

    * A template editor plug-in for Adobe Acrobat. This allowed templates to be created that overlaid existing PDFs. This was important because it let you lay out the form in any program, "print" the form to PDF, and then construct the template from that PDF. The template editor was an easy enough GUI to be used by anyone: anyone could make and maintain the templates.

    * A C-based Perl module that allowed templates to be populated by the Perl program. The API was very nicely designed, very simple and robust.

    Is it worth it to roll your own? I would answer yes! It would be great to have similar or identical functionality so that one could fill in a template and get an output file that prints identically on any platform. I would contribute time to such an open source project.


    Fred Kleindenst

    Nothing is too wonderful to be true
    -- Michael Faraday

Re: PDF generation
by tcf22 (Priest) on Sep 03, 2003 at 16:46 UTC
Re: PDF generation
by adrianh (Chancellor) on Sep 03, 2003 at 16:48 UTC

    Sounds like a great idea to me - if a fairly large amount of work :-)

    As another option I've used the (GPL licenced) htmldoc myself. With a little work you can get very nice results.

Re: PDF generation
by Anonymous Monk on Sep 03, 2003 at 17:57 UTC
    Here is a different approach: instead of creating a PDF report directly, you can create your report in an XML format first, and then use an XSL stylesheet and Apache FOP to translate the XML file to a PDF file. The advantage is that you can easily provide your report in a different format (e.g. HTML) when it's needed, just by supplying a different stylesheet.
      XML + XSL <=> fully visually formated, paginated file.

      Very roughly:
      * XML = data
      * XSL = fonts/colors

      What about:
      * pagination
      * graphical elements
      * background images

      What is proposed with PDF results in a fully paginated, laid-out file with embedded graphics, all nicely wrapped up into a single template file.

      As mentioned below, FOP uses XML tags to more or less completely specify layout, pagination, etc. Thus you could have XML-encoded data and, using an XSLT transformation, end up with an XSL-FO file that would be nicely laid out. These files are specifically designed to be rendered into PDF for presentation.

      I'm not aware how mature the tools are for generating an XSL-fo file (especially GUI driven visual tools). I'm still in favor of PDF Templates because at this point PDF print drivers allow virtually any program to output PDF templates.

      Nothing is too wonderful to be true
      -- Michael Faraday
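
As a rough illustration of the "XML = data" half of that pipeline, here is a small Python sketch (standard library only) that serializes report rows to XML. The element names `report` and `row`, and the function name `rows_to_xml`, are made up for the example; the XSL stylesheet that would turn this into XSL-FO is not shown.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(report_title, columns, rows):
    """Serialize rows as <report><row><col>value</col>...</row></report>.

    Column names are used as element names, so they must be valid XML names.
    """
    root = ET.Element("report", {"title": report_title})
    for row in rows:
        record = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            # ElementTree escapes &, < and > in text when serializing
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

The resulting file, together with a stylesheet, could then be handed to the FOP command line, along the lines of `fop -xml data.xml -xsl report.xsl -pdf report.pdf` (file names here are placeholders).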

Re: PDF generation
by Willard B. Trophy (Hermit) on Sep 04, 2003 at 12:15 UTC
    You should be able to substitute PDFlib Lite for PDFlib. The summary of its license agreement is here; you can redistribute it and you can modify it, but with certain caveats. You might be able to live with it; I have been, so far.

    PDF::API2 has a wonderful feature set, and is being developed at breakneck pace, but I find its documentation a bit bewildering. It has a decent user community, though, through which you will find many answers.

    bowling trophy thieves, die!

      I'm one of those PDF::API2 users also. I looked high and low for a good way to get my feet wet in PDF generation, and ended up there. It's served my changing purposes quite well, and while I've met with the occasional issue here or there, and the same confuddling documentation, the author is updating it probably weekly, and the message boards are active enough.

      Message Board
Re: PDF generation
by nite_man (Deacon) on Sep 04, 2003 at 06:25 UTC

    I use HTMLDOC. It's a good solution for simple conversion of HTML to PDF. I'd also suggest the module HTML::HTMLDoc, which implements an object-oriented Perl interface to this program.

    But for building a commercial system that has to generate documents in different formats, I'd suggest looking at FOP. I've just looked over the information about this tool, and I hope to use it soon in my project to develop a document circulation system.

    _ _ _ _ _ _
      M i c h a e l

Re: PDF generation
by JamesNC (Chaplain) on Sep 05, 2003 at 02:29 UTC
    I have written a module that produces PDF reports using Perl only (no libs). About 4 weeks ago, I was tasked with automating a research report that pulled information from Sybase and MSSQL databases. A requirement was that the report had to look the same as the existing report, which was being produced by cutting and pasting data in Excel and MS Word. The data had to be scrubbed by hand, and the customer wanted those rules automated as well.

    None of the Perl PDF modules did everything I needed, and I didn't want to hack the existing modules, so I sat down, read the first 500 pages of the Adobe PDF Reference Manual, and wrote my own implementation (I knew PostScript already, so it wasn't so scary). I took a different approach from the existing modules and decided to develop a PDF object with methods for each of the things I needed to do. So I have built a graph method (it draws scalable stacked bar charts with a legend: font, size, color, linewidth, autoscale, etc.), a table method (header, footer, rows, cols, font, fontsize, color, padx, pady, linewidth, linepattern, alignment), an image method (file, height, width, x, y; JPEGs only for now), a text method (x, y, align, font, size, color), and methods to draw ...

    I have about 75% of the module finished; I only need to add the methods to produce an index automatically from the content, and a rule set to split the content stream across pages. (This is one area that led me to write my own module in the first place: PDF doesn't have any mechanisms (or restrictions!) governing how to wrap or when to do a page break. As a result, you have to keep track of the content height yourself or it just runs off the page.)
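
The height bookkeeping mentioned above can be sketched roughly as follows, in Python. This is not the poster's module: the class name `Paginator`, the US Letter page height of 792 points, and the 72-point margins are all assumptions for illustration. It relies only on the fact that default PDF coordinates start at the bottom-left corner with y increasing upward.

```python
class Paginator:
    """Track a vertical cursor and start a new page when a block will not fit."""

    def __init__(self, page_height=792.0, margin=72.0):
        self.page_height = page_height   # points; 792 = US Letter (assumed)
        self.margin = margin             # top and bottom margin in points (assumed)
        self.pages = [[]]                # each page is a list of (label, bottom_y)
        self.cursor_y = page_height - margin

    def place(self, label, height):
        """Reserve `height` points; return (page_number, y of the block's bottom edge)."""
        # Break only if the current page already holds something; an oversized
        # block on an empty page is placed anyway and will overflow.
        if self.cursor_y - height < self.margin and self.pages[-1]:
            self.pages.append([])
            self.cursor_y = self.page_height - self.margin
        self.cursor_y -= height
        self.pages[-1].append((label, self.cursor_y))
        return len(self.pages), self.cursor_y
```

Each table row, paragraph, or chart would call `place` with its measured height before drawing, and draw at the returned coordinate on the returned page. Splitting a single block that is taller than a page is left out of the sketch.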

    Three weeks ago, my customer decided that he would rather have the project completed in Java because they were worried about the long term maintenance of the report ... (sigh) and so, I had to stop working on the Perl implementation and start developing the report using iText.

    I am going to complete this module in my spare time on the weekends just to show the customer just how powerful Perl is.

    I plan on implementing a simple write method that mimics Perl's format so that users don't have to learn anything new to send their output directly to PDFs.

    I have run into a few technical difficulties concerning font encoding, hyphenation, auto-pagination, multi-column control and layout, and a few other items (i.e., I need some help!). Anyway, if you would like to know more or see a sample report, or if you want to help me, send me an e-mail:

    James NC
(wil) Re: PDF generation
by wil (Priest) on Sep 04, 2003 at 12:38 UTC
    open source != free

    I can't stress that enough. If you're an "open source gal" then go read the website. The source is developed on an 'open source' model. They just choose to charge for the end product.

    Sounds to me like you're just after a free (as in beer) solution, not necessarily an open source one at that. Please make the distinction.

    - wil

      Tosh. --.

      The PDFlib Lite licence and the PDFlib licence are not open source licences.

      From the latter's licence:

      It is expressly agreed that this license does not include ownership of the program's source code, but only the right-to-use as defined by this agreement.


      Licensee may not resell, transfer, rent or lease the program. Licensee is not allowed to transfer the rights obtained under this license to any third party.

      Neither licence is free (as in freedom). Please get your facts straight before you criticise.

      I agree, open source does not necessarily mean free, and I was oversimplifying in talking about what I meant by free. I am interested *both* in free as in beer and in freely modifiable, redistributable, etc.: something along the lines of a GPL/BSD license.

      I don't know what you are talking about when you say "read the website". There is not a whole lot on the website suggesting that the software is developed on an open source model. The license can't possibly pass muster as an open source license. Agreed, PDFlib Lite seems to be released as open source, and does fit open source guidelines, but that doesn't mean that PDFlib in general is developed with an open source model.

Re: PDF generation
by mandog (Curate) on Sep 10, 2003 at 03:13 UTC
Re: PDF generation
by Anonymous Monk on Sep 05, 2003 at 21:34 UTC
    I definitely vote for the PDF generation through another means option. I've done autogeneration of reports based on data, some pulled from flat files, some pulled from database backends, and it always seems easy to go through some language that affords a certain level of redundant markup. As such, the XML, HTML options are quite helpful, for markup, but formatting is a different issue.

    Thus, I have to say I quite like using TeX for these types of things, because it gives you markup when you want it, it's language-based so it's easy to generate from within scripts, and it will give consistent formatting if that is necessary (generating forms from DBs, etc.). It might take a little bit more work, since you need to be familiar with TeX, and it might take a few more CPU cycles, but it pays off by offering quite a bit more control over the layout.
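
A small sketch of what generating such a document from a script might look like, in Python. It assumes LaTeX rather than plain TeX, and the names `tex_escape` and `rows_to_latex` are invented for the example. The one non-obvious step is escaping: database values must have TeX's special characters escaped before they are dropped into the source.

```python
# Characters with special meaning in LaTeX and their literal replacements
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(value):
    """Escape one value character by character, so replacements are never re-escaped."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in str(value))


def rows_to_latex(headers, rows):
    """Build a complete LaTeX document containing one left-aligned table."""
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        r"\begin{tabular}{" + "l" * len(headers) + "}",
        " & ".join(tex_escape(h) for h in headers) + r" \\ \hline",
    ]
    for row in rows:
        lines.append(" & ".join(tex_escape(c) for c in row) + r" \\")
    lines += [r"\end{tabular}", r"\end{document}"]
    return "\n".join(lines) + "\n"
```

The returned string would be written to a `.tex` file and run through a TeX engine that emits PDF, such as pdflatex, assuming one is installed.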

Node Type: perlquestion [id://288666]
Approved by Popcorn Dave
Front-paged by TStanley