PerlMonks  

PDF::Template redesign - I want your ideas!

by dragonchild (Archbishop)
on Dec 01, 2005 at 19:46 UTC ( #513404=perlmeditation )

PDF::Template (and, eventually, Excel::Template) will be undergoing a complete redesign over the next 6-12 months and will be rebuilt from scratch. There are a number of reasons for this, which I'll outline in a little bit. The reason I'm posting here is that I want to solicit ideas/suggestions/feedback from the community.

Reasons for the redesign:

  • The codebase is completely bloated and impossible to follow. Changes are very hard to make, especially as a part-time effort. There are several bugs that have been there for years.
  • The code doesn't scale - after about 20 pages, the PDF starts spinning its wheels horribly.
  • The template language sucks - XML was chosen by the original author because he wanted to play with XML, not because it was the best idea.
  • There is no testsuite and retrofitting one will take at least as long as a redesign, if not longer. And, there's no guarantee of success.
  • autrijus's initial development of PDF::Writer makes this possible.
  • I can't add tables to the current codebase.

I want to find out, from the community, how they want to use a templating system for PDF generation. Here's the thing - templating PDFs is not the same as templating HTML or xSV. Instead of just merging variables with a layout, you are also creating the layout. With HTML::Template or Template Toolkit, you're using HTML as the layout language and you're just plugging stuff in. With PDFs (and Excel), you also have to specify the fonts, colors, and layout.

Some ideas I've had:

  • Keep the XML file structure, but do it smarter. I have no idea what smarter would be, but this is the only backwards-compatible solution.
  • Somehow, do a TT plugin that provides a bunch of helper functions. Doing this, your template would consist solely of TT directives, some of which would be provided by TT and the rest provided by this plugin. I like this idea, but don't have much of a plan to accomplish it.
  • Do something else. I have no idea what "else" would be.

Whatever route is taken, I will probably rewrite Excel::Template to the same codebase, to keep them in sync. (E::T's codebase is slightly better, because I could write tests more easily, but not by much.)

Please don't vote on this node without responding. The last meditation I wrote about PDF::Template is at a reputation of 41 (and climbing) with no responses. While I appreciate the upvotes, I need the responses more. I'd prefer this node to be at a reputation of 0 with 20 responses than at 20 with no responses.

Update: A number of responses have said "Why can't you just PDFify the HTML?" I should have included this in the initial post, but here goes:

The problem with this is that while the datasource remains the same, the Excel, PDF, and HTML portions have very different look-and-feel requirements. For example, a set of images might be required in the PDF that's different from the HTML and that cannot be displayed in the Excel. The difference might be something as simple as a reversed image (instead of white on black, it's black on white). Not to mention header pages, links, bookmarks, table of contents . . . the list goes on and on.

The second issue is headers and footers. HTML and Excel don't have them, but PDF does.

And, that brings up the general issue of pagination. HTML doesn't paginate the same way PDF does, and Excel is different yet again. (I won't even start with RTF.) You'll end up with a really icky-looking PDF if you attempt to merely PDF-ize an HTML document. And that doesn't even start with the problems Excel has with that.

Bottom line - there need to be separate PDF and XLS templates. What I'm trying to do is figure out how to make it easier to transfer the knowledge a programmer has when creating an HTML layout into a PDF (and possibly an XLS).


My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Replies are listed 'Best First'.
Re: PDF::Template redesign - I want your ideas!
by eric256 (Parson) on Dec 01, 2005 at 20:22 UTC

    Not strictly related to the actual code redesign: I just installed PDF::Template last night, and I have NO clue how to even start because there don't seem to be any examples at all. So for me: examples of whatever you do choose as the layout. I like the XML layout, but it should be XML, not pseudo-XML. Pseudo-XML does no one any good (I just mention this because the docs for PDF::Template say it isn't really XML; not sure how true that is anymore).

    I wonder if HTML could be used as the layout, is there some free HTML->Layout Tree converter that could be used? Just rambling in case it makes sense. Being able to design once in HTML and be done would be very pretty ;)

    Err looking at the source shows an example now....Did you just add that?


    ___________
    Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;
      Lack of documentation is definitely a problem with PDF::Template. The tests that are there show some ideas, but I have to go back to the source to remember how to do stuff.

      As for the layout, it's real XML - I use XML::Parser to parse it. The "pseudo" part of it comes from the fact that a childnode has access to the attributes of all its ancestor nodes. This scoping was so that you could specify something in the pdftemplate node (the root), such as H, and have all the children use that as their H. (This feature is actually one part of the performance problem I was speaking of.)
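To make the scoping concrete, here is a minimal sketch of that kind of ancestor lookup. The node structure and the helper names (make_node, resolve_attr) are hypothetical and are not taken from PDF::Template's source; only the pdftemplate root and the H attribute come from the description above. It also hints at the cost: every lookup may walk the whole ancestor chain.

```perl
use strict;
use warnings;

# Hypothetical node structure: a name, the node's own attributes,
# and a reference to its parent (undef for the root).
sub make_node {
    my ($name, $attrs, $parent) = @_;
    return { name => $name, attrs => $attrs, parent => $parent };
}

# Resolve an attribute by walking up the ancestor chain until some
# node defines it. The cost grows with nesting depth on every lookup.
sub resolve_attr {
    my ($node, $key) = @_;
    for (my $n = $node; defined $n; $n = $n->{parent}) {
        return $n->{attrs}{$key} if exists $n->{attrs}{$key};
    }
    return;
}

my $root = make_node('pdftemplate', { H => 12 },          undef);
my $row  = make_node('row',         {},                   $root);
my $cell = make_node('textbox',     { COLOR => 'black' }, $row);

print resolve_attr($cell, 'H'),     "\n";   # found on the root
print resolve_attr($cell, 'COLOR'), "\n";   # found on the node itself
```

Caching the resolved values per node would be one way to avoid repeating the walk, at the price of more memory per node.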


      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: PDF::Template redesign - I want your ideas!
by samtregar (Abbot) on Dec 01, 2005 at 20:28 UTC
    I recommend you start from a series of use-cases and design your solution based on that. I have no idea what the use-cases are for PDF templating because I've never wanted to do it. Maybe there aren't any, in which case you should delete the module rather than revise it!

    When I designed HTML::Template I had a single use-case firmly in mind: an HTML designer and a Perl coder need to work together to produce an application and neither knows (or even wants to know) the other's language. This use-case may be entirely inappropriate for templating PDFs. Do "PDF designers" even exist?

    -sam

      Every experience I've had with PDFs goes something like this:
      • The web designer lays out the webpage. It has a bunch of header/sidebar stuff and then there's the tabular report in the content div.
      • The client likes it, but wants to have something they can print out.
      • Enter PDF::Template.
      • Through a series of iterations, a header page, headers and footers, and various layout options appear, as if by magic.
      • Eventually, the client gets bored of tweaking and asks you to promote to production.

      I've talked with people who've used PDF::Template to generate actual PDFs, but I don't know what process(es) they used, if any.

      You are right - the lack of use cases is contributing to the lack of direction. Maybe what I'm asking for here is "How do you want to use something that generates PDFs from a layout + parameters?" ...


      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        You are right - the lack of use cases is contributing to the lack of direction. Maybe what I'm asking for here is "How do you want to use something that generates PDFs from a layout + parameters?" ...

        Maybe I do have an idea after all. What would make it the most easy for me (and many others) to use is to allow CSS + HTML to be a feed interface.

        I would be completely overjoyed to be able to just use the media="print" within an HTML page to drive PDFs. I realize this is a daunting suggestion that may not even be a good idea but it would be sheer genius if it could be done. I don't know the first thing about PDF internals but perhaps it might map fairly well...? Or maybe there is already an engine out there that could handle the interchange?

        This is the classic case, and I have and have seen many such needs. For text-based stuff as you describe, it's fairly straightforward. One of the groty issues we've run into with all available solutions, however, is that images (logos, etc.) from the web page get re-interpolated and this just makes them look like the dog barfed. Since there are very few pages any more without some graphics, this becomes a more pressing issue. I think converting is more important than manual formatting, but maybe I'm biased because we run into this all the time.

        As to XML versus CSS versus a little language, I think that if it "just works" for the primary use case, any extra that I/we have to learn in any format is not too much to ask.

        Don Wilde
        "There's more than one level to any answer."
Re: PDF::Template redesign - I want your ideas!
by holli (Abbot) on Dec 01, 2005 at 21:58 UTC
    Is there really a need for it? Imvho, for template-driven PDF generation, XSL-FO (and a converter to PDF for it) is the way to go. Apache's FOP does a very good job there. And it's easy to create XSL-FO using a templating engine or via XSLT.
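As a rough illustration of how little is needed to emit XSL-FO from Perl, here is a sketch that builds a single-flow document from a list of paragraphs. The helper names (xml_escape, build_fo) and the page size are assumptions made for the example; the resulting text would still have to be handed to FOP or another formatter to become a PDF.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build a minimal single-flow XSL-FO document, one fo:block per paragraph.
sub build_fo {
    my (@paragraphs) = @_;
    my $blocks = join '',
        map { '      <fo:block>' . xml_escape($_) . "</fo:block>\n" } @paragraphs;
    return <<"END_FO";
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page" page-height="29.7cm" page-width="21cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
$blocks    </fo:flow>
  </fo:page-sequence>
</fo:root>
END_FO
}

print build_fo('Quarterly report', 'Sales & marketing < 10% over budget');
```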

    What I would really like to see is a Perl/FOP glue module.


    holli, /regexed monk/
      That may be a good web solution, but PDF::Template is a stand-alone solution. Adding XSL-FO as something PDF::Writer outputs would definitely be a good idea. Wanna write it?

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: PDF::Template redesign - I want your ideas!
by Madams (Pilgrim) on Dec 02, 2005 at 01:48 UTC
    Have you thought about taking a look at what the Sql-Ledger people are using? It seems to me that they are using Perl to fill in and "compile" LaTeX templates to PostScript and then passing the result off to Ghostscript to make the PDF.

    Of course some will say that LaTeX is nasty to use / learn, but it's really quite straightforward, stable, excessively documented, has many friendly/helpful users, and is portable (I think the only things you can't find a binary for are EPOC/WinCE/PalmOS etc., and even that may be untrue!).

    Ghostscript is much the same.

    The only real downside is needing to have access to both in addition to Perl, especially since LaTeX comes with "the kitchen sink" and is a large install (you can get a set of CTAN (Comprehensive TeX Archive Network) disks with everything for cheap).

    I'm kind of thinking that maybe PDF::Template could be a front for such a system to, I don't know, maybe help abstract out the *icky* details? Who knows maybe the Sql-Ledger people (sic) would drool all over for something like that. I know I would, I looked at PDF::Template quite a while ago and decided that, as it stood at that time, reading perlop.c was better for my mental health ;).
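A front end of that kind could start out as little more than placeholder substitution with proper escaping. The sketch below assumes a hypothetical <%name%> placeholder syntax and hypothetical helper names (latex_escape, fill_template); it only produces the filled-in LaTeX source and leaves running LaTeX and Ghostscript to the caller.

```perl
use strict;
use warnings;

# Characters LaTeX treats specially, mapped to safe replacements.
my %LATEX_ESCAPE = (
    '\\' => '\\textbackslash{}',
    '~'  => '\\textasciitilde{}',
    '^'  => '\\textasciicircum{}',
    '&'  => '\\&',
    '%'  => '\\%',
    '$'  => '\\$',
    '#'  => '\\#',
    '_'  => '\\_',
    '{'  => '\\{',
    '}'  => '\\}',
);

# Escape in a single pass so inserted braces are not escaped again.
sub latex_escape {
    my ($text) = @_;
    $text =~ s/([\\~^&%\$#_{}])/$LATEX_ESCAPE{$1}/g;
    return $text;
}

# Replace every <%name%> placeholder with the escaped value from $vars.
# Dies on a placeholder that has no value, rather than emitting a gap.
sub fill_template {
    my ($template, $vars) = @_;
    $template =~ s{<%(\w+)%>}{
        exists $vars->{$1}
            ? latex_escape($vars->{$1})
            : die "No value for placeholder '$1'\n"
    }ge;
    return $template;
}

my $template = <<'END_TEX';
\documentclass{article}
\begin{document}
Dear <%name%>,

your invoice total is <%total%>.
\end{document}
END_TEX

print fill_template($template, { name => 'Smith & Sons', total => '$1,200' });
```

Escaping at substitution time keeps the *icky* details out of the calling code: the data can contain ampersands or dollar signs without breaking the LaTeX run.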


    "All too often people confuse their being able to think with their actually having done so. A more pernicious mistake does not exist."

    --Moraven Tollo in Michael A. Stackpole's A Secret Atlas

      Oohh, Perl + LaTeX = Yum!

      Have you thought about taking a look at what the Sql-Ledger people are using? It seems to me that they are using Perl to fill in and "compile" LaTeX templates to PostScript and then passing the result off to Ghostscript to make the PDF.

      I was actually thinking about doing the very same (to write cover letters) and had almost started working on it, and then had forgotten it. Thanks for reminding me. Could you possibly provide more details and/or pointers?

Re: PDF::Template redesign - I want your ideas!
by stvn (Monsignor) on Dec 01, 2005 at 20:33 UTC
    Keep the XML file structure, but do it smarter. I have no idea what smarter would be, but this is the only backwards-compatible solution.

    Well, I assume by "smarter" you mean "more concise", which unfortunately is not that simple with XML (see HTML for an example of this problem ;). However, maybe if you were to follow the Ruby-on-Rails idea of "intelligent defaults" this might work. The problem of course is, what are those "intelligent defaults"? Take this idea to an extreme and you will end up with a PDFML of some kind, which might not be a bad thing.

    Somehow, do a TT plugin that provides a bunch of helper functions. Doing this, your template would consist solely of TT directives, some of which would be provided by TT and the rest provided by this plugin. I like this idea, but don't have much of a plan to accomplish it.

    I know nothing of TT plugins, but it sounds like what you really want to do is to re-use the TT parser and get some kind of AST (abstract syntax tree) which you can use to build your PDF from.

    This idea has its merit, especially since you get a high-quality parser that has been thoroughly battle-tested already. However, I question if a PDF-mini-language embedded in TT would not end up being almost the same as writing it using Pure Perl as your template language. And if given a choice between TT or Perl as my mini-language, I might lean towards Perl since it is much more robust. But then again, restricting functionality is sometimes a good thing too.

    Overall, I am skeptical of this approach; I am not convinced it will buy you anything more than Pure Perl.

    Do something else. I have no idea what "else" would be.

    Well, you can write your own DSL (domain specific language) for PDF templating. Using something like Parse::YAPP or Parse::RecDescent is probably not all that much more difficult than making the XML "smarter" anyway. This would force others to learn your new language, but if you keep it "familiar", then it probably won't be any harder to learn than some esoteric XML dialect.

    Whatever you choose, I would recommend creating a kind of Object Model for PDFs, similar to the HTML DOM (but not as complex and ugly). If done properly, this would serve as the "runtime" for your PDF templating language, and would allow for multiple "front-ends" to be written (XML, DSL, TT, etc). It would also be easy to exchange various "back-ends" as well (pdflib, PDF::API2, etc).
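A toy version of such an object model might look like the sketch below. Everything in it is hypothetical (the node types, the attribute names, the trace back-end); it only illustrates the shape of the idea: front-ends build one tree, and a back-end is a table of handlers keyed by node type. A real back-end would call pdflib or PDF::API2 where this one merely returns a description of what it would do.

```perl
use strict;
use warnings;

# Hypothetical object model: a document is a tree of plain hashes,
# each with a type, its attributes, and a list of child nodes.
sub node {
    my ($type, $attrs, @children) = @_;
    return { type => $type, attrs => $attrs, children => \@children };
}

# Any front-end (XML, a DSL, TT ...) would produce this same tree.
my $doc = node('document', {},
    node('page', { size => 'A4' },
        node('text', { font => 'Helvetica', content => 'Hello, world' }),
        node('text', { font => 'Helvetica', content => 'Second line' }),
    ),
);

# A back-end is a table of handlers keyed by node type.
# This one only describes what it would draw.
my %trace_backend = (
    document => sub { 'begin document' },
    page     => sub { "new page ($_[0]{attrs}{size})" },
    text     => sub { "draw '$_[0]{attrs}{content}' in $_[0]{attrs}{font}" },
);

# Walk the tree depth-first, dispatching each node to the back-end.
sub render {
    my ($node, $backend, $depth) = @_;
    $depth //= 0;
    my $handler = $backend->{ $node->{type} }
        or die "Back-end cannot handle '$node->{type}'\n";
    my @lines = ('  ' x $depth) . $handler->($node);
    push @lines, render($_, $backend, $depth + 1) for @{ $node->{children} };
    return @lines;
}

print join("\n", render($doc, \%trace_backend)), "\n";
```

Swapping the back-end means passing a different handler table to render; the tree and the front-ends stay untouched.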

    Anyway, that's my 2 cents (and an upvote).

    -stvn

      If you stick with XML as the base template language, then couldn't you really build a TT plugin that helps output that XML in an easier way? Then if you want real power you can go to the XML, but if you want ease you can use TT.

      $more_options > $less_options unless time_to_develop($more_options) > life_of(%universe)

      ___________
      Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;

        One last thought from me. What about using some subset of DIVs from HTML so that HTML Editors could be used to generate PDF templates? I'm not sure I like that idea much, but using an existing layout scheme has advantages in the forms of existing editors, and disadvantages (like the fact that HTML sucks and we all know it ;) )


        ___________
        Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;

        I considered this, however, I am not sure you gain anything more than the complexity you are introducing. It is basically using TT as a macro language for XML (which it already is). But how useful this might be to the user, I am not sure. However, in the depths of the "compiler" I could see using TT to do things like "unrolling loops" or something like that.

        -stvn
Re: PDF::Template redesign - I want your ideas!
by philcrow (Priest) on Dec 01, 2005 at 20:28 UTC
    You said you wanted comments, so even though I don't generate much pdf, here goes.

    I don't like writing in XML. It's ok as a data interchange language in which case I use TT to fill in the variable bits. Coding in XML is painful (I've used ANT and XSLT). Please don't make me use it for coding.

    I like the idea of a TT plugin or several of them. There is always an advantage to a system people understand and like. But your third option might work too. Perhaps you can conceive of a little language which is easy to write and parse that is still flexible. I've recently been at work on a little language which describes web apps. It uses simple named blocks. Your situation is more complex, but maybe something like that could work.

    Phil

Re: PDF::Template redesign - I want your ideas!
by snoopy (Curate) on Dec 02, 2005 at 06:41 UTC
    PDF::APIx::Layout is a newly released module for marking up text paragraphs with PDF::API2. It can produce quite advanced text markup, including mixing fonts and colors, with justification, reflow, etc.

    It would be nice to be able to utilize this stuff from PDF::Template, either through direct embedding of these tags in whatever markup language that you come up with, or through your proposed plugin scheme!

    A simple 'hello world' example follows:

Re: PDF::Template redesign - I want your ideas!
by Your Mother (Archbishop) on Dec 01, 2005 at 22:50 UTC

    I'm sorry I'm not writing with any ideas but I just wanted to say: go, man, go! Excellent. I am planning on setting up an archival literature website in the next few months which allows documents to be downloaded as plain text, PDF, and maybe RTF. So you are probably saving me many hours of pain and helping to improve the website for the users. If it turns out I have anything to contribute (docs, tests, ?) when you have your 0.01 up, I'll gladly do so.

      FOP supports both PDF and RTF as output formats from the same template.


      holli, /regexed monk/

        Thanks++. Saw that some time ago and completely forgot about it.

Re: PDF::Template redesign - I want your ideas!
by eric256 (Parson) on Dec 02, 2005 at 16:28 UTC

    The more I look at this and think about it, the more it seems the template portion should be completely separate from the PDF generation portion. Is there any real reason to combine those two together? Then HTML, PDF, and Excel would all use the same template code and then post-parse the output into the needed form. Code might look like:

    <code>
    use Simple::Template;  # dunno what you would call it, but it pulls the *template* portions out
    use PDF::Template;

    my $pdf = Simple::Template->new(
        filename  => $filename,
        processor => PDF::Template->new(),
    );
    $pdf->output_file($filename . '.pdf');
    </code>

    PDF::Template could even automate the calls to Simple::Template. A generic backend template would have the advantage that the user could use HTML::Template (style) or TT or something else to generate the final code to send to the Processor.


    ___________
    Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;
      The problem with this, and I should have posted this initially, is that while the datasource remains the same, the Excel, PDF, and HTML portions have very different look-and-feel requirements. For example, a set of images might be required in the PDF that's different from the HTML and that cannot be displayed in the Excel. The difference might be something as simple as a reversed image (instead of white on black, it's black on white).

      The second issue is headers and footers. HTML and Excel don't have them, but PDF does.

      And, that brings up the general issue of pagination. HTML doesn't paginate the same way PDF does, and Excel is different yet again. (I won't even start with RTF.) You'll end up with a really bad PDF if you attempt to PDF-ize an HTML document.


      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        The second issue is headers and footers. HTML and Excel don't have them, but PDF does.

        As per my earlier post -- you can utilize the proposed W3 extensions for this. For example, see this fragment taken from CSS Print to put a running page count at the bottom of pages:

        <style>
        @page {
          counter-increment: pages;
          @bottom-center {
            font-family: Times, Palatino, serif;
            font-size: 12pt;
            font-weight: normal;
            content: "Page " counter(pages);
          }
        }
        </style>

        Sure, a lot of these standards are proposed/preliminary, but I think they offer a good place to start because a group of people have already been thinking about these sorts of challenges.

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        PDF'ing an HTML page almost always ends up badly anyway. Each time I generate a PDF with the same data as used in an HTML page, I end up using TT (well close to every time). Using TT would definitely be my choice here.

        You missed my point.

        Instead of having the template portion (control constructs etc.) as part of the module, all the modules should use the same template code. Then each module expects different data. So control flow and variable insertion remain the same, yet formatting would be independent. So a PDF template might have header and footer tags, while XLS has row and column stuff; regardless, they all still use the same loop, if, var code from the basic template. Code reuse 101 ;)
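A minimal sketch of that split, under loudly stated assumptions: the [% var %] and [% loop name %] ... [% end %] syntax, the expand function, and the output tags are all invented for the example, and loops cannot be nested in this toy version. The shared core knows only variables and loops; anything else in the template passes through untouched for a format-specific processor to interpret.

```perl
use strict;
use warnings;

# Hypothetical shared template core. It understands only
#   [% name %]                       variable substitution
#   [% loop name %] ... [% end %]    repetition over a list of hashes
# Loops cannot be nested in this sketch.
sub expand {
    my ($template, $data) = @_;

    # Expand loops first: repeat the body once per item, with the item's
    # keys visible in addition to the outer data.
    $template =~ s{\[%\s*loop\s+(\w+)\s*%\](.*?)\[%\s*end\s*%\]}{
        my ($name, $body) = ($1, $2);
        join '', map { expand($body, +{ %$data, %$_ }) } @{ $data->{$name} || [] };
    }gse;

    # Then substitute plain variables; unknown names become empty strings.
    $template =~ s{\[%\s*(\w+)\s*%\]}{ $data->{$1} // '' }ge;
    return $template;
}

my %data = (
    title => 'Sales',
    rows  => [ { item => 'Widget', qty => 3 }, { item => 'Gadget', qty => 5 } ],
);

# Same control constructs, different format-specific markup around them.
my $pdf_tmpl = "<header>[% title %]</header>\n"
             . "[% loop rows %]<row>[% item %]: [% qty %]</row>\n[% end %]";
my $csv_tmpl = "[% loop rows %][% item %],[% qty %]\n[% end %]";

print expand($pdf_tmpl, \%data);
print expand($csv_tmpl, \%data);
```

The expanded PDF-flavoured text would then go to a PDF processor that understands header and row tags, while the second template needs no further processing at all.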


        ___________
        Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;
        HTML and Excel don't have them, but PDF does.
        Regarding Excel, there is no header/footer per se, but you can define rows and columns that are repeated on each printed page. That's pretty much the same.


        holli, /regexed monk/
Re: PDF::Template redesign - I want your ideas!
by herveus (Parson) on Dec 02, 2005 at 18:54 UTC
    Howdy!

    Hokay...I have a potential use case for myself, but first, a comment:

    I don't want to have to write XML.

    I have a structured document that I produce. Presently, I use a C program I inherited with the job that produces custom PostScript. Unfortunately, the full document has some front and back matter (including a table of contents) that have to be generated by hand and merged together.

    The Preview tool in MacOS X will convert my PostScript into PDF and leave me with a PDF with searchable and selectable text. What I'd like to be able to do (and did some preliminary poking around toward) is to directly generate a PDF, ideally with a real ToC that links to the appropriate places.

    The current C program treats the body of the document as having a nested structure of bits as it sets the type. The bits of stuff are:

    • Page
      • Column
        • Section
          • Row
            • Cell
              • Paragraph
                • Line
                  • Word
    I'm doing this off the top of my head, so I may have left out a level of detail here. Words are the smallest unit of stuff to set, being nominally indivisible strings.

    The input data gets grouped into Section chunks. When a Section is fully populated with the text and its formatting, the Section is poured into the Column, leaving a rump Section when it doesn't all fit. Keep pouring into Columns as necessary. When a Page fills up, the PostScript gets generated and sent to the output filehandle. Sections that break across a Column have a continuation header on subsequent columns, and Pages have headers in the manner of dictionaries.
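The pouring step can be sketched in a few lines. This is not the C program's logic, just an illustration under simplifying assumptions: every line has the same height, a column holds a fixed number of lines, and a section header (or continuation header) occupies one line. The function names pour and paginate are made up for the example.

```perl
use strict;
use warnings;

# Pour sections into columns of a fixed height (counted in lines).
# A section that does not fit leaves a rump, which continues in the
# next column under a continuation header.
sub pour {
    my ($sections, $column_height) = @_;
    die "A column must hold at least a header and one line\n"
        if $column_height < 2;

    my @columns = ([]);
    for my $section (@$sections) {
        my @lines  = @{ $section->{lines} };
        my $header = $section->{title};
        while (@lines) {
            my $room = $column_height - @{ $columns[-1] };
            # Need room for the header plus at least one line of text.
            if ($room < 2) {
                push @columns, [];
                next;
            }
            push @{ $columns[-1] }, $header, splice(@lines, 0, $room - 1);
            # Whatever is left is the rump; it gets a continuation header.
            $header = "$section->{title} (continued)";
        }
    }
    return @columns;
}

# Group the filled columns into pages of a fixed number of columns.
sub paginate {
    my ($columns_per_page, @columns) = @_;
    my @pages;
    push @pages, [ splice @columns, 0, $columns_per_page ] while @columns;
    return @pages;
}

my @columns = pour(
    [
        { title => 'A', lines => [qw(a1 a2 a3)] },
        { title => 'B', lines => [qw(b1 b2 b3 b4)] },
    ],
    4,
);
my @pages = paginate(2, @columns);

for my $p (0 .. $#pages) {
    for my $c (0 .. $#{ $pages[$p] }) {
        print "page $p, column $c: @{ $pages[$p][$c] }\n";
    }
}
```

In a streaming version, each page could be emitted as soon as its last column fills, which matches the way the description above sends finished pages to the output filehandle.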

    Typeface stuff is applied at the Word level. Each level includes positioning data that is relative to its container, and the higher-level elements have margins and sizes (being nested rectangles).

    I have a multi-tiered template in mind -- and I suspect that it could be expressed in CSS terms as well, just to confuse matters. At one point, I had worked up an XML-ish representation of the document but didn't go far with it before my attention span ran out.

    yours,
    Michael
Re: PDF::Template redesign - I want your ideas!
by radiantmatrix (Parson) on Dec 02, 2005 at 21:01 UTC

    Hey, as long as you're addressing PDF templating with a complete rewrite, I have a humble request (that is, one I'm not nearly good enough to implement myself if I *did* have the time). I'd like to see a PostScript::Template module, and then have the PDF::Template module implement the same interface for similar capabilities. Obviously, the PDF module would also have more features and options, because PDF can do a bit more than PS.

    This would make it really easy to write code that generates PS files for whatever reason (say as targets to a PostScript printer), and trivially cause that code to write a PDF instead. It would be as simple as, say:

    my $writer;
    if ($cgi->param('mode') eq 'pdf') {
        $writer = PDF::Template->new(@params);
    }
    elsif ($cgi->param('mode') eq 'ps') {
        $writer = PostScript::Template->new(@params);
    }
    else {
        ht_error("I don't know about mode '" . $cgi->param('mode') . "'");
    }

    I realize that's quite a bit more work. All I can offer is that I would use it almost immediately, and be a good source of test-driven bug reports.

    tilly and dragonchild both make excellent points below. I would still very much like a way to specify whether I'm dealing with PS or PDF inside a templating system. dragonchild's solution re. placement is probably for the best. tilly's solution works well for *NIX applications, but (a) I'm cautious about depending on external apps, and (b) there's never a guarantee that things work the same on Win32. All that said, this node's specific request is withdrawn.

    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
      The ps2pdf utility works well for converting postscript to pdf.

        Yes, it does. The particular thing I had in mind was a web app that currently supplies on-the-fly generated PDF files to its users. I have had a number of requests to give users the option of PS format (for various reasons which don't matter here).

        pdf2ps doesn't, AFAIK, exist... and even if it did, I'm not sure I want to deal with the disk caching such a thing would require (I hate temporary files, and I don't use them unless I must).

        So, despite the value of your comment, I'd still like a similar interface for creating PS or PDF files.

        <-radiant.matrix->
        A collection of thoughts and links from the minds of geeks
        The Code that can be seen is not the true Code
        "In any sufficiently large group of people, most are idiots" - Kaa's Law
      This kind of capability actually belongs in PDF::Writer, not PDF::Template. P::W is the module that P::T uses to abstract away the rendering engine. It currently provides an API over PDF::API2 and PDFlib, but there's no reason it couldn't do that for any paginated format, such as PostScript.

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: PDF::Template redesign - I want your ideas!
by bsb (Priest) on Dec 02, 2005 at 11:08 UTC
    With PDFs (and Excel), you also have to specify the fonts, colors, and layout.
    I don't know if it's applicable, but could you borrow from the CSS model?
Re: PDF::Template redesign - I want your ideas!
by VSarkiss (Monsignor) on Dec 02, 2005 at 15:44 UTC
Re: PDF::Template redesign - I want your ideas!
by xdg (Monsignor) on Jan 17, 2006 at 16:28 UTC

    I stumbled across an article related to this on A List Apart: Printing a Book with CSS: Boom!.

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
