arashi has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Fellow Monks,

Problem Background
My office and I have recently completed a new HTML template for use on our websites. We're using a very "modular" approach to our pages, with as much as possible written by a centrally located script. The template defines several variables, then the central script (currently in JavaScript) writes the layout of the page, leaving only the content for users to define. We don't have a Perl server at this time. We're still playing the "Political Game" with our superiors and need to force their hand, so we've written the template so that the page is just a little too wide to print nicely. Our hope is to write a Perl script that will take in the HTML page, strip what isn't needed, and return a "printer friendly" version to the user. Once our superiors get tired of hearing complaints about pages not printing nicely, they'll demand a solution, which we'll be all too happy to provide, if we get our server. I know it sounds rather underhanded, and I don't like it myself. I hate playing the "Political Game", but they've left us with little choice: development on a large website, IMHO, demands a CGI server of some sort.

I've searched Perl Monks for information about "printer friendly" scripts, but all I found were questions about making nodes on the site printer friendly.

The Questions
Are there any modules that can accomplish converting a page to a printer friendly version?
Does anyone know of any good resources that can point us in the right direction?
Has anyone written scripts that do similar things, and would be willing to give us suggestions or tips?
We wrote our template with this application in mind, so we have put markers in comment tags to indicate the different sections that could be parsed out in Perl. We also hope to use this script to make our pages Web Accessible (for Blind Browsers, or those with vision problems), by parsing out the different sections and rewriting them in more accessible ways.
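
A minimal sketch of how such comment markers might be parsed out, assuming (hypothetically) that each section is wrapped as <!-- BEGIN name --> ... <!-- END name -->. The marker syntax, the section names, and the extract_sections function are invented for illustration and are not taken from the actual template.

```perl
use strict;
use warnings;

# Hypothetical marker format: <!-- BEGIN name --> ... <!-- END name -->
# Returns a hash mapping each section name to the HTML between its markers.
sub extract_sections {
    my ($html) = @_;
    my %sections;
    while ($html =~ /<!--\s*BEGIN\s+(\w+)\s*-->(.*?)<!--\s*END\s+\1\s*-->/gs) {
        $sections{$1} = $2;
    }
    return %sections;
}

my $page = <<'HTML';
<html><body>
<!-- BEGIN nav --><ul><li>Home</li></ul><!-- END nav -->
<!-- BEGIN content --><p>Hello, world.</p><!-- END content -->
</body></html>
HTML

my %sections = extract_sections($page);

# A printer friendly page would keep only the content section.
print "<html><body>$sections{content}</body></html>\n";
```

The same hash could feed an accessibility rewrite, since each section is available on its own once the markers are parsed.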

Thank you for your help,

Replies are listed 'Best First'.
Re: Printer Friendly Pages
by chromatic (Archbishop) on Oct 09, 2001 at 01:34 UTC
    I did a bit of work writing a Slash plugin to display printable pages, but it uses the Template Toolkit.

    If you're convinced that stripping tricky tags out of HTML is efficacious, look into a module like HTML::Parser or tilly's Why I Like Functional Programming. The trick is to make three lists: tags that are allowed, tags that should be removed along with their content, and tags that should be dropped while leaving their content intact.
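
    A rough sketch of the three-list idea, using plain regexes so it runs without extra modules. This is a simplification and not HTML::Parser's API; a real parser copes far better with messy markup (for example, a ">" inside an attribute value will confuse this version). The tag lists and the filter_html name are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical policy lists; adjust them to the template in question.
my %keep   = map { $_ => 1 } qw(p h1 h2 ul ol li a b i);  # tag is allowed
my @remove = qw(script style);             # tag and its content are removed
# Every other tag (table, font, div, ...) is dropped, content left intact.

sub filter_html {
    my ($html) = @_;
    # Comments go first so markup inside them is not mistaken for tags.
    $html =~ s/<!--.*?-->//gs;
    # Removable elements: delete the tags and everything between them.
    for my $tag (@remove) {
        $html =~ s{<$tag\b[^>]*>.*?</$tag\s*>}{}gis;
    }
    # Remaining tags: keep those on the allowed list, drop the rest
    # while leaving the text around them untouched.
    $html =~ s{(</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>)}{ $keep{lc $2} ? $1 : '' }ge;
    return $html;
}

my $page = '<table><tr><td><font color="red"><p>Hello <b>world</b></p>'
         . '</font></td></tr></table><script>if (a < b) { go(); }</script>';
print filter_html($page), "\n";    # prints: <p>Hello <b>world</b></p>
```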

    If you have a good handle on what should and shouldn't be allowed, an afternoon or two should be sufficient.

    Does that help?

Re: Printer Friendly Pages
by Maclir (Curate) on Oct 09, 2001 at 02:12 UTC
    I have developed a "printer friendly" option on pages generated by scripts. In my case, I used EmbPerl on the server, but I guess the principle is the same.

    Place a "printer friendly" link somewhere on the page, and even put a printer icon next to it if you are super keen. The URL (the value of the "HREF" attribute) is the current URL with a parm of "print=yes" appended to it. If you are already passing arguments, separate it with an ampersand; otherwise use a ?.
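
    As an illustration of the separator rule, here is a small sketch; the printer_friendly_url name and the example URLs are hypothetical.

```perl
use strict;
use warnings;

# Append print=yes to a URL, using "&" if a query string is already
# present and "?" otherwise.
sub printer_friendly_url {
    my ($url) = @_;
    my $sep = $url =~ /\?/ ? '&' : '?';
    return $url . $sep . 'print=yes';
}

print printer_friendly_url('/news/story.html'), "\n";       # /news/story.html?print=yes
print printer_friendly_url('/news/story.cgi?id=42'), "\n";  # /news/story.cgi?id=42&print=yes
```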

    Then the server side script that generates the page looks for the presence of the "print=yes" parm. When the HTML is being generated, select a CSS style sheet that is a "bare bones" format: no background image, no fancy colors, all that stuff. You can also have some logic within the page generation script so that if print is on, it doesn't include other widgets (including the "printer friendly" link).
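
    The server side of this could look roughly like the sketch below. It is plain Perl rather than EmbPerl, and the stylesheet paths, the markup, and the function names are all assumptions made for illustration.

```perl
use strict;
use warnings;

# True if the query string carries print=yes.
sub wants_print {
    my ($query) = @_;
    return scalar grep { $_ eq 'print=yes' } split /[&;]/, $query // '';
}

# Build the page, switching the stylesheet and leaving out the extra
# widgets (including the "printer friendly" link) when print is on.
sub render_page {
    my ($query, $content) = @_;
    my $print = wants_print($query);
    my $css   = $print ? '/css/print.css' : '/css/screen.css';   # hypothetical paths
    my $html  = qq{<html><head><link rel="stylesheet" href="$css"></head><body>\n};
    $html .= qq{<div id="nav">site navigation</div>\n} unless $print;
    $html .= "$content\n";
    $html .= qq{<a href="?print=yes">Printer friendly</a>\n} unless $print;
    $html .= "</body></html>\n";
    return $html;
}

print "Content-type: text/html\n\n";
print render_page($ENV{QUERY_STRING}, '<p>Page content goes here.</p>');
```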

    What's that you say? You aren't using style sheets at all? (deep sigh). Go to the back of the class. Now read all the material on the W3C page to understand why you should be.

Re: Printer Friendly Pages
by Stegalex (Chaplain) on Oct 09, 2001 at 04:24 UTC
    A couple of thoughts:

    1. You could control much of the appearance of your page through proper control of CSS style attributes (unless you are in one of those retro-shops that ban all forms of modernity).
    or (and this is not for the weak of stomach)
    2. You could convert your HTML to PDF on the fly with a neat utility called ps2pdf. What I am thinking is that you could run your HTML through ps2pdf which will give you a PDF output file that you could then stick in a servable directory and then redirect the user to that page. Beware, this is a BIG pain in the ass (I know because I use it to provide customers with nicely formatted packing slips).

    Maybe you should focus on the political battle?
      SORRY!! I made a critical omission in the above advice.

      What I do is to run the HTML through a utility called "html2ps" and then I run the resulting output through "ps2pdf".

      Again, I must stress that this is far from fun or ideal. For one thing, anything but simple HTML causes problems. For another, don't even think about pushing anything with DHTML through. You will blow chunks. Also, I seem to remember that getting images through this process was not much fun (contrary to the docs).
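
      A sketch of how the two steps might be chained from Perl, assuming html2ps and ps2pdf are installed and on the PATH. The function names and the temporary file name are hypothetical, and the html2ps "-o" output option should be checked against the version you have installed.

```perl
use strict;
use warnings;

# Build the two commands for the HTML -> PostScript -> PDF pipeline.
# Each command is a list so that system() bypasses the shell.
sub pipeline_commands {
    my ($html_file, $pdf_file) = @_;
    my $ps_file = "$pdf_file.tmp.ps";    # hypothetical intermediate file
    return (
        [ 'html2ps', '-o', $ps_file, $html_file ],
        [ 'ps2pdf', $ps_file, $pdf_file ],
    );
}

# Run both steps in order and stop at the first failure.
sub html_to_pdf {
    my ($html_file, $pdf_file) = @_;
    for my $cmd (pipeline_commands($html_file, $pdf_file)) {
        system(@$cmd) == 0
            or die "@$cmd failed, status: " . $? . "\n";
    }
    unlink "$pdf_file.tmp.ps";
    return $pdf_file;
}

# Only run when an input and an output file name are given.
html_to_pdf(@ARGV) if @ARGV == 2;
```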
Re: Printer Friendly Pages
by ajt (Prior) on Oct 09, 2001 at 13:50 UTC
    I agree with Stegalex: unless you're stranded in the dark ages of page design, this calls for a CSS solution.

    As much as I like Perl, if you are using Perl to repurpose your data for a printer, then your underlying design is broken. Though Perl can easily do this and there are plenty of suggestions here, you shouldn't have to do this in the first place.

    Have a look at the EXCELLENT A List Apart web site, where they show you how to build a very complex visual display simply, and that prints without any complex reparsing.

    Properly written XHTML with modern CSS provides aural cues too. So unless you have legacy HTML, I would not use Perl to repurpose content.

Re: Printer Friendly Pages
by stefan k (Curate) on Oct 09, 2001 at 12:36 UTC
    just two remarks:
    1. The (German) Heise Newsticker does just this. When I look at the page source of a news message I find a tag which isn't part of HTML (<HEISETEXT>), and the content of such a tag (which will probably be ignored completely by browsers) should be available with the HTML::Parser (or HTML::TokeParser?) module(s).
    2. I don't think that you will be able to introduce Perl that way. First of all, you could do the same task in other languages, too. Secondly, probably no one will complain about pages not printing nicely, at least unless you're providing content which usually gets printed.

    Regards... Stefan
    you begin bashing the string with a +42 regexp of confusion