http://qs321.pair.com?node_id=51012

Introduction

Yes, I know the title is bad code. I wanted something memorable so we can point back to this node again and again.

We've seen it again and again. Everybody and their dog seems to have toyed with an alternative to CGI.pm at one time or another. If you think it's too bloated, try CGI::Lite, but don't go rolling your own. This node, and (hopefully) the resulting thread, is just something convenient to toss to newbies who aren't aware of the issues involved.

Common Problems with Alternatives

Here are some common reasons not to use alternatives to CGI.pm:

Those are some of the biggies. The following is a list of complaints that, while not directly related to the "hand-rolled" problem, tend to crop up in the code of those who insist upon doing it themselves.

Related Problems

If you want instant verification of this stuff, use Super Search and search for CONTENT_LENGTH in the text of articles. Not all are applicable, but there are some real doozies out there. Here's my favorite:
use CGI qw/:standard/;
read(STDIN, $formdata, $ENV{'CONTENT_LENGTH'});
@pairs = split(/\&/, $formdata);
foreach $pair (@pairs){
    ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;
    $value =~ s/%0D%0A/\n/g;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $FORM{$name} = $value;
}
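This person is using CGI.pm but still hand-parsing the data, and doing it incorrectly: parameter names are never URL-decoded, a repeated name silently overwrites the earlier values, and only POST data on STDIN is read, so anything in a GET query string is ignored.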
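For comparison, here is a minimal sketch (not from the original node) of letting CGI.pm do the parsing instead. The query string passed to new() is a made-up example so the snippet runs outside a web server; in a real CGI script you would call CGI->new with no arguments and let CGI.pm read GET or POST data itself. The use of multi_param() assumes CGI.pm 4.08 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical query string, for illustration only. In a real script,
# CGI->new() with no arguments reads GET or POST data on its own.
my $q = CGI->new('first%20name=Ovid&lang=perl&lang=c&msg=hello+world%21');

# Parameter names are URL-decoded too, unlike in the loop above.
my $name = $q->param('first name');

# Repeated names come back as a list instead of overwriting each other.
# multi_param() exists in CGI.pm 4.08+; older releases use param() in
# list context for the same thing.
my @langs = $q->multi_param('lang');

# '+' and %XX escapes in values are decoded.
my $msg = $q->param('msg');

print "name=$name\n";
print "langs=@langs\n";
print "msg=$msg\n";
```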

Benefits of CGI.pm

No sense in showing you the stick if I don't bother with the carrot.

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.


Footnotes

  1. Yeah, I know that I have an online CGI course, also. The course that japhy is preparing seems to be much more of a rigorous analysis than mine. Mine is targeted at a different audience. (read: japhy's Perl is way better than mine so I pander to the masses :-).
  2. Why should that be a security hole? If you only have one file with a given name, you won't be separating them with null bytes, right? Not necessarily. A wily cracker can simply add another parameter with the same name and your script will politely add a null byte for you. Of course, proper taint checking will stop this, but so will using CGI.pm.
  3. I don't know who first started the annoying habit of trying to strip out SSIs in the parameter processing routine, but here's the potential benefit: let's say you let users sign up at your site and create a home page. You use CGI to capture their home page data and write it to an HTML file, but you don't want to allow people to run SSIs (a huge security hole, if you're configured wrong). This code will strip out SSIs, HTML comments, and everything between them if you have more than one. See Death to Dot Star! if you're unfamiliar with that issue.
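Footnote 3 describes a greedy-match mistake. The sketch below assumes the pattern s/<!--(.|\n)*-->//g, which is common in hand-rolled parsers but is not quoted in this node, and uses an invented input string. It only illustrates the quantifier problem; the non-greedy version is still not a complete SSI filter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented user input: two SSI directives with ordinary text between them.
my $input = 'Hi <!--#include virtual="/a.html" --> there, <!--#echo var="DATE_LOCAL" --> bye';

# Greedy: (.|\n)* runs from the FIRST "<!--" to the LAST "-->",
# so the legitimate text " there, " is removed as well.
(my $greedy = $input) =~ s/<!--(.|\n)*-->//g;

# Non-greedy with /s: each "<!--" stops at the nearest following "-->".
(my $lazy = $input) =~ s/<!--.*?-->//gs;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```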

(ichimunki) re: use CGI or die;
by ichimunki (Priest) on Jan 11, 2001 at 17:41 UTC
    Okay. I need a place to vent on the topic of CGI and this seems like the perfect place.

    I've been doing some research on HTML4, parsing HTML, and related topics. I've recently been trying to build a browser in Perl (and yes, I use the modules when I know about them).

    As the very first parsing I did, I grabbed all the <h1> to <h6> tags. These tags, when used correctly, should give a good outline of what's on the page. But guess what? I checked a lot of major sites, like search engines, news sites, then some great discussion sites that are success stories for Perl, then a few random Monk home pages. Almost nobody uses these tags. I thought there was a problem with my program! Everybody is using the <font> tag instead of the classic header notations.
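A minimal sketch of that kind of heading outline, assuming HTML::TokeParser (from the HTML-Parser distribution) is installed; the sample markup is invented for illustration. Each h1 to h6 start tag is located with get_tag, and its text is collected up to the matching end tag, so markup such as <font> never shows up in the outline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Invented sample page: three real headings and one <font> "fake" heading.
my $html = <<'HTML';
<h1>Perl Monks</h1>
<p>Intro text</p>
<h2>Seekers of <b>Perl</b> Wisdom</h2>
<font size="+2">Not a heading</font>
<h3>Recent threads</h3>
HTML

my $p = HTML::TokeParser->new(\$html);
my @outline;

# get_tag with names returns the next matching START tag: [$name, $attr, ...]
while (my $tag = $p->get_tag(qw(h1 h2 h3 h4 h5 h6))) {
    my $name  = $tag->[0];                          # e.g. "h2"
    my $level = substr($name, 1);                   # e.g. 2
    my $text  = $p->get_trimmed_text("/$name");     # text up to </h2>
    push @outline, ('  ' x ($level - 1)) . $text;   # indent by level
}

print "$_\n" for @outline;
```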

    Then I got curious, so I headed to an HTML validator. I checked all the same pages again. I found two pages that were even close to "valid" compared to the standard. And one of those was the W3C's own home page. The other was missing a single alt attribute on a GIF.

    So here's my sore spot. Using CGI.pm is obviously recommended. I can't think of a single reason not to use something that comes with every default install of Perl. That would be like writing foreach loops to perform an action on list elements instead of using map. But even sites that I can't imagine aren't using CGI.pm (Slashdot or PerlMonks, for instance) do not generate valid HTML. Not even close, according to the validator's error report.

    While I've seen plenty of loud complaints when people roll their own form parsers, I am not seeing those same loud complaints when people (mis)use CGI to generate suboptimal HTML, or use it to parse the forms but then completely ignore it for generating HTML. It would appear that even those people who are using CGI.pm can't count on it to put in a default alt attribute for their GIFs when they forget: it creates well-formed HTML that may still be gibberish according to the standards.

    The module might as well be CGI::ParseForms and skip all the HTML building routines, for the ways it seems to be used in the wild. And frankly, given how much trouble I have with fonts that get too small, or pages that are completely unreadable in text-only mode (yes, I like to browse in Lynx sometimes just to get away from all the image rendering issues and time wasted waiting for them to download over the modem), I'd like to see us make stronger, more frequent recommendations to use CGI for building HTML and then to remember that using it is no guarantee of perfect HTML either.

    Update: Some of the above has been altered based on the responses below, though I'm not sure what I said that muddied my point. What I'm saying is simple. Feel free to keep harping away on the poor souls who roll their own parsing routines instead of using CGI.pm. But please, consider applying the same critical eye to people who use only 5% of the functionality of the module and continue to hand-code HTML (often hard-coding large chunks of it into their scripts), or who use the module to create crummy HTML by taking advantage of the fact that, while it writes well-formed HTML, it does not validate tags, attributes, or block/inline nesting.

      The problem is that many years ago it was decided by the powers that be that browsers would be lenient towards bad HTML. This is generally seen as a Bad Thing. As you've seen, the vast majority of the web is now made up of invalid HTML.

      Using the HTML shortcuts in CGI.pm helps in one way as a construction like:

      ul(li([1 .. 10]));

      will at least be well-formed. Unfortunately, it doesn't prevent you from doing something like:

      p(font({size=>'larger', color=>'red'}, 'Heading'));

      instead of

      h1('heading');

      and using CSS to handle the appearance.
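      A minimal sketch of that approach with CGI.pm's function-oriented shortcuts: a real h1 carrying a class, with the appearance left to an external stylesheet. The class name "warning" and the path /styles/site.css are hypothetical. Tag case in the output depends on the CGI.pm version (old releases emitted uppercase tags), and current CGI.pm documentation discourages the HTML shortcuts for new code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# Structure goes in the markup; looks go in the (hypothetical) stylesheet.
my $page = join "\n",
    start_html(
        -title => 'Demo',
        -style => { -src => '/styles/site.css' },  # emits a <link> to the CSS file
    ),
    h1({ -class => 'warning' }, 'Heading'),        # a real heading, not <font>
    end_html();

print "$page\n";
```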

      I haven't looked at a new version of CGI.pm for some time, but I'm hoping that it either has or will soon have an XHTML mode, but that still won't stop people from Doing The Wrong Thing :( You can't get away from the fact that it's the web page author's responsibility to create valid HTML.

      The only option is for browsers to suddenly stop working on invalid X?HTML, but the chances of that happening are approximately zero.

      Dave...
      (who tries to validate all of his web pages, but admits that a few errors do creep in)

      --
      <http://www.dave.org.uk>

      "Perl makes the fun jobs fun
      and the boring jobs bearable" - me

        Headings aren't in there to give you a convenient container for random style markup. Headings are in the HTML standard to allow you to break a document up into sections in an orderly fashion. People just abuse them to get nice giant bold text. If you want Nice Giant Bold Text and the text isn't a series of headings, please do use font or CSS on a <span> tag. H1-H6 tags should be used for headings so that they can be pulled into an outline of a document. Otherwise you have just found another way to abuse HTML.

        --
        $you = new YOU;
        honk() if $you->love(perl)

        > The only option is for browsers to suddenly stop
        > working on invalid X?HTML, but the chances of that
        > happening are approximately zero.

        Suddenly? No. Most of the web is still a non-wellformed mixture of HTML3, HTML4, and imaginary tags made up by specific browsers. However, current browsers do choke on non-wellformed markup if it is served with a content-type of text/xml, and that's a first step. As things like XSLT and RDF start to catch on, sites that want to harness the value of those things will have to be redone in wellformed XML, and that's that. (They won't necessarily have to provide and validate against Schemata, but we have to start someplace.)

        Incidentally, if CGI.pm is now improved to the point of being capable of producing anything that remotely resembles XHTML, maybe I should have another look at it; I've been avoiding it because of two things, and one was the execrable state of its output. If that has been shaped up, maybe the other thing (the tendency to obfuscate the Perl code) has been improved too, since I looked at it (which has been a bit), and I should have a second look.

         --jonadab

      The module might as well be CGI::ParseForms and skip all the HTML building routines, for the ways it seems to be used in the wild. And frankly, given how much trouble I have with fonts that get too small, or pages that are completely unreadable in text-only mode (yes, I like to browse in Lynx sometimes just to get away from all the image rendering issues and time wasted waiting for them to download over the modem), I'd like to see us make stronger, more frequent recommendations to use CGI for building HTML and then to remember that using it is no guarantee of perfect HTML either.
      There's one very nice thing about CGI.pm that hasn't yet been pointed out: if you had been generating valid HTML all along using the shortcuts (as opposed to "print"-ing your own), then in the most recent releases of CGI.pm, you are now generating valid XHTML! Yes yes yes! Thank you Lincoln!

      -- Randal L. Schwartz, Perl hacker
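      A quick way to see which mode an installed CGI.pm is in, as a sketch: in XHTML mode, empty elements such as br() come out self-closed and start_html() declares an XHTML DTD. The title string is an arbitrary example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# br() is an empty element; in XHTML mode it should be self-closed.
my $break = br();

# start_html() emits the DOCTYPE and <html> start tag; in XHTML mode the
# DOCTYPE names an XHTML DTD.
my $head = start_html(-title => 'XHTML demo');

print "$break\n$head\n";
```

      If you need the older HTML output instead, CGI.pm accepts a -no_xhtml pragma in its import list, e.g. use CGI qw(:standard -no_xhtml);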

      Try this:
      use CGI qw( glark yurp );
      my $q= CGI->new();
      print $q->h1( "This is not really HTML" );
      print glark( { flinge=>"worz", plutch=>"erff" } );
      print yurp( { huid=>"queez", urst=>"hmmph" } );
      print $q->font( { crypet=>"swoom", whalk=>"47" } );
      which produces
      <H1>This is not really HTML</H1> <GLARK FLINGE="worz" PLUTCH="erff"> <YURP URST="hmmph" HUID="queez"> <FONT WHALK="47" CRYPET="swoom">

        See! Stein is embracing and extending HTML! He's eeevil! ;-)

        (Score -1, Off-topic):
        Does anyone already have the address I_learned_to_read@hotmail.com ? It might be funny to have it.

        Or:

        They_still_use_BSD@hotmail.com :-)
      I hate hammers and screwdrivers. Well, not exactly hate them because they come with every toolbox it seems but I want to complain about how people use them. People are always using these tools to build things that are dangerous. The planks on the deck are loose, the shelves in the bookcase are wobbly, people try and open cans with them, etc. We should change their names to "nail-driver" and "threaded-metal-cylinder-turner" until we can fix these tools to alert the user when they are using them incorrectly or at least get the hammer to countersink, putty, and sand.

      No offense ichimunki, I'm in your camp on this, I just think that you shot at the wrong criminal. People write shitty HTML with any tool, CGI can't make things worse and frequently makes things better.

      That is a ++ to ichimunki in case I was equally unclear =)

      --
      $you = new YOU;
      honk() if $you->love(perl)

Re: use CGI or die;
by gildir (Pilgrim) on Jan 11, 2001 at 16:51 UTC
    For Apache's mod_perl there is an excellent CGI.pm alternative: libapreq. But it is only available under mod_perl, as it uses Apache's internals.
      I totally agree with you, gildir. This is the method I prefer, and it's a lot faster. We tested it on our module site, which is quick to load because it doesn't have to access CGI.pm every time. Apache::Request and Apache::Cookie are great too. =D
Re: use CGI or die;
by Maclir (Curate) on Jan 11, 2001 at 04:50 UTC
    There are also other tools / modules available. I have used Embperl http://perl.apache.org/embperl/index.html, which uses CGI.pm under the covers. I am not sure about HTML::Mason, but I would not be surprised if it also uses CGI.pm. There are probably other HTML generation / templating / web site generation tools that use CGI.pm.
      Just as an FYI, HTML::Mason does provide access to CGI.pm.

      You are advised to not access the structure directly, even though you can. Mason provides access through object references.

      Also, you are able to take advantage of the HTML constructs as documented in the Mason FAQ here

      I'm inclined to believe that using the CGI constructs is probably a "good thing", as you get the side benefit of CGI.pm spitting out XHTML, as merlyn states in Re: Re: use CGI or die;.

      My favourite for embedding Perl into *ML is Apache::ASP. The homepage is here. Last time I checked, it relied on CGI.pm for file upload and allowed you to freely mix CGI.pm and Apache::ASP calls in the same page.

      I find the Active Server Pages model (with Perl as a programming language) quite useful when writing Web applications.

      -- TMTOWTDI

Re: use CGI or die;
by ColonelPanic (Friar) on Jan 12, 2001 at 02:08 UTC
    Another huge debugging benefit of CGI is CGI::Carp
    use CGI::Carp qw(fatalsToBrowser);
    This is invaluable for figuring out a CGI problem. Not only do you see errors, but you can easily insert your own die(); statements to see what's going on, instead of printing your own header and HTML in several lines.

      But you don't need to be using CGI in order to make use of CGI::Carp.

      --
      <http://www.dave.org.uk>

      "Perl makes the fun jobs fun
      and the boring jobs bearable" - me

      I always use strict, -w and fatalsToBrowser when developing, but today this caused me a huge headache in a simple script (with no syntax errors) that merely reads a file, formats the contents, and displays the html.

      It runs fine from the command line (with a few uninitialized value warnings) and could save the output with

      perl foo.pl > foo.html
      but it would just whirl and give no output via CGI. I found that the code looks and behaves perfectly via CGI unless both -w and fatalsToBrowser are enabled! Shut either one off (leaving the other on) and it works fine.

      I found a certain loop in the program that seems to cause this by throwing pairs of =cut around, but the loop seems mundane and similar to the others.

      Is this odd interplay between -w and fatalsToBrowser documented?

      Update: I said I'd post the entire script to Craft in a day or two, but I am finding it difficult to abstract a simplified example. So I'll just suggest that if your error-free CGI script mysteriously hangs, turning off either -w or fatalsToBrowser may help.

      dws - Try this for a good html ™:

      &#153;
        Show us the code!

        Enquiring Monks Want to Know™

Re: use CGI or die;
by Anonymous Monk on Feb 04, 2013 at 11:05 UTC