Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Re: subparseform.lib

by Xxaxx (Monk)
on Apr 28, 2001 at 03:23 UTC ( #76304=perlquestion: print w/replies, xml ) Need Help??

Xxaxx has asked for the wisdom of the Perl Monks concerning the following question:

I'm basically self-taught in the world of Perl through the help of some excellent books (thanks for the recommendations) and through reading q&a on sites such as this and the newsgroups.

The significance of the above statement is that I haven't been hanging around on the webdeveloper's street corners -- at least until recently. Thanks to the world of spam I'm now receiving several unwanted email tutorials from web-experts -- lucky me :-(

Well, it has come to my attention that most of the webdeveloper's who are not using Matt's scripts are using a thing called "subparseform.lib" to parse input from webforms.

I took a look at this lib and was more than a bit annoyed at something this simplistic becoming the standard. I guess after Matt's similiar effect I shouldn't be surprise but...

In any case I decided to start contacting the so-called teachers training the new webdevelopers and giving them a slightly more secure form of their precious little 'subparseform.lib'. I figured getting them to convert to CGI was a bit too up hill a battle.

To that end I've added what I could to the existing 'subparseform.lib'. Before publishing this to the tutors as a better form of what they have I'm hoping to run it by the good monks here.

I know it's not CGI. But since I'm just trying to replace near total garbage with something that looks like the original garbage but might be a bit more secure this is the current tack.

Let me know if you see any glaring errors I've missed. Or better ways to handle this short of a total CGI conversion.


sub Parse_Form { use vars ('%formdata'); my @pairs = (); if ($ENV{'REQUEST_METHOD'} eq 'GET') { @pairs = split(/&/, $ENV{'QUERY_STRING'}); } elsif ($ENV{'REQUEST_METHOD'} eq 'POST') { read (STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); @pairs = split(/&/, $buffer); #### Removed as security risk #### Use hidden vars in stead #### don't mix methods if ($ENV{'QUERY_STRING'}) { #### don't mix methods @getpairs =split(/&/, $ENV{'QUERY_ST +RING'}); #### don't mix methods push(@pairs,@getpairs); #### don't mix methods } } else { print "Content-type: text/html\n\n"; print "<P>Use Post or Get"; } foreach my $pair (@pairs) { my ($key, $value) = split (/=/, $pair); $key =~ tr/+/ /; $key =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; ###=== Begin Security addition ====================== ## REMOVE poison NULL $key =~ s/\0//g; $value =~ s/\0//g; ## Clean characters to remove weird stuff my $allowedCHARS = 'a-zA-Z0-9\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\ +;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~'; $key =~ s/[^$allowedCHARS]//gi; $value =~ s/[^$allowedCHARS]//gi; $key =~s/<!--(.|\n)*-->//g; ###=== End Security addition ======================== ###=== Begin Cosmetic/Functionality addition ======== ## REMOVE LEADING BLANKS $key =~ s/^\s*//; ## REMOVE TRAILING BLANKS $key =~ s/\s*$//; ###=== End Cosmetic/Functionality addition ========== $value =~s/<!--(.|\n)*-->//g; if ($formdata{$key}) { $formdata{$key} .= ", $value"; } else { $formdata{$key} = $value; } } } return 1;

Replies are listed 'Best First'.
(Ovid - cargo-cult CGI) Re: Re: subparseform.lib
by Ovid (Cardinal) on Apr 28, 2001 at 03:58 UTC
    Well, being kind of a "CGI guy", you might expect that I would take the time to look over this code. As usual, it has many issues. You can see Lesson 2 of my online CGI course for more information, or check out use CGI or die;. You've made most of the common errors.
    if ($ENV{'REQUEST_METHOD'} eq 'GET') { @pairs = split(/&/, $ENV{'QUERY_STRING'}); } elsif ($ENV{'REQUEST_METHOD'} eq 'POST') { read (STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); @pairs = split(/&/, $buffer);
    Are you aware that the semicolon ';' is an alternate delimeter for name/value pairs? Also, what happens if there is a problem with $ENV{'CONTENT_LENGTH'} not matching the actual data length? You need to test for that or risk occassionally having corrupted data. Also, you may also want a test if the read is successful.
    ## REMOVE poison NULL $key =~ s/\0//g; $value =~ s/\0//g; ## Clean characters to remove weird stuff my $allowedCHARS = 'a-zA-Z0-9\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\ +;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~'; $key =~ s/[^$allowedCHARS]//gi; $value =~ s/[^$allowedCHARS]//gi;
    I see where you are going with removing ASCII zero and "dangerous" characters, but this limits the flexibility of your code. What if someone really needs these characters to be uploaded? What are their options?
    • Strip out this part of your code, which could affect the code of other programmers (this is an example of non-orthogonal code -- a big NO NO).
    • Replicate this code elsewhere and strip out the regex. Now we need to synchronize two virtually identical sections of code. That's another big NO NO!
    • Use or CGI::Lite. That's the proper way to go.
    $key =~s/<!--(.|\n)*-->//g;
    Aaagh!!!! I get tired of seeing this. The real purpose of this is to strip out SSIs from incoming data, in case this data gets written out to a Weg page that someone else might call up. The reality is, it's a horrible regex (dot star, alternation on single characters, and will slurp up multiple SSI's or HTML Comments and anything in between. Plus, what if someone wants HTML comments or SSI's to be submitted? Again, you have the non-orthogonal code issue. See list above.
    ###=== Begin Cosmetic/Functionality addition ======== ## REMOVE LEADING BLANKS $key =~ s/^\s*//; ## REMOVE TRAILING BLANKS $key =~ s/\s*$//; ###=== End Cosmetic/Functionality addition ==========
    No. What if someone wants the extra whitespace? Non-orthogonal.
    if ($formdata{$key}) { $formdata{$key} .= ", $value"; } else { $formdata{$key} = $value; }
    The intent of your code is to have them do something like this:
    my @values = split /, /, $formdata{ $somekey };
    Hmmm... what happens if some enters a value with a comma and space? That's right, they think they have an extra value.

    Of course, your code doesn't handle file uploads, either, but that's a whole 'nother ball of wax.

    I'm sorry, but this is terrible cargo-cult code. Your heart is in the right place, but this code is terrible.


    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

      Thanks for the heads up.

      As you may have noticed by my intro, most of the above code is not mine. And, I did miss several rather glaring misconceptions in the original. Mucho thanks for pointing those out. Doing a code audit of another's work is something very new to me.

      This "lib" appears to be used almost exclusively by folks trying to accept simple forms information. Which means almost zero cleanup happens after the fact. Hence anything passed back from this routine is used almost as is.

      And to tell you the truth I never expected anyone opening an upload (enctype="multipart/form-data") on the web would use such a lame lib as subparseform.lib. Surely one would jump to more sophisticated scripts by that time.

      My original intention was to see if a plug-n-play replacement for subparseform.lib could be made which would not break their existing scripts relying upon the lib.

      Is this helpling the weak stay weak? I'm not sure. I was just so annoyed to see the original subparseform.lib proliferating even further onto the net with such lame security.

      Maybe there's a way to replace subparseform.lib with a enabled version that also creates the %formdata hash. This might give them a version which will still feed into their existing code AND gives them the all important jump start into using

      I'm hoping something can be done short of leave the security hole ladden lib alone and leave these misguided webdevelopers to their own devices.

      I'll let ya know. And thanks again for the input.


        The following is just a rough stab at what you are looking for. The problem with handling the security in the "Parse_Form" routine is that security considerations will differ from application to application. Trying to embed the security directly into a form-parsing routine doesn't work. As security concerns change from application to application, the parsing routine would have to be adjusted. You either have multiple programs affected by the change or multiple copies of the form parsing routine.

        Because I understand where you're heading with this, I'll make a suggestion. I do not recommend the following code, but perhaps it would let people new to CGI programming see how easy it is to use Please note that the following code is untested.

        sub Parse_Form { use CGI qw/:standard/; foreach my $key ( param() ) { my @vals = param( $key ); if ( scalar @vals == 1 ) { $formdata{ $key } = $vals[0]; } elsif ( ! @vals ) { $formdata{ $key } = undef; } else { # This should be adjusted as necessary. $formdata{ $key } = \@vals; } } }
        Actually, one can get all of the form data with a map:
        %formdata = map { $_, [ param( $_ ) ] } param;

        The main problem there is how one gets a single value for the param "foo":

        my $val = $formdata{ 'foo' }->[0];

        I think too many programmers would balk at the awkward syntax.

        Note how "multiple values" are handled in my original routine. Properly, multiple values that map to a single key should be represented in an array. Unfortunately, as you've discovered, many insist upon joining with a comma/space or with an ASCII zero (\0). How that final one is handled depends upon how to preserve current functionality of their programs.

        Xxaxx asked:

        Is this helping the weak stay weak?

        Yes. Personally, I have strong objections to teaching people bad coding practices and then gradually bringing them up to speed. They wind up with many bad habits to unlearn. The reality, however, is that this is often the situation we have to deal with.

        I will confess that I am unsure about how you're going about this, but that's probably because I haven't given it a lot of thought. If you must take this approach, I think the snippet above is probably a good start. If you want to more accurately duplicate the functionality of the original scripts, perhaps you should consider adding a standard HTML parsing module to look for SSIs or HTML comments. Then, you can strip them out (don't do a regex - they're trickier than you might think). However, this is going back to forcing your notion of security on them. That's not good practice, as mentioned above.

        Good luck!


        Update: I forgot to mention some issues with your approach to security:

        my $allowedCHARS = 'a-zA-Z0-9\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\ +?\@\[\\\]\^\_\`\{\|\}\~';
        What the heck is that? Well, lets clean it up a bit.
        my $allowedCHARS = 'a-zA-Z0-9' . quotemeta( '!"#$%&\'()*+,-./:;<=>?@[\ +]^_`{|}~' );

        Well, that clears it up and shows some serious problems. If you are going to hold their hand, hoping they don't get burnt, why even allow them to play with firecrackers? I see escapes, backticks, pipes, and dots, all of which are extremely common sources of security holes. Depending upon what is done with the rest of the characters (and on what OS they are running), there are still plenty of other issues here. I do like the fact that you state specifically what is allowed -- that's much better than stating what isn't allowed -- but it's still extremely permissive, depending upon what you want to do.

        Security concerns simply vary too much from site to site. It is not possible to address them all in a form parsing routine. That's a limitation that you'll just have to accept and it is incumbent upon all of us to raise the awareness of these issues.

        Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

        FYI: has a backwards compatability layer to which can be invoked with
        use CGI; CGI::ReadParse();
        This creates the %in hash that (a deprecated lib) folks liked to see. If you still insist on making your own POST/GET parser, why not just cut and paste the CGI parsing routine found in "sub new{...}" and tweak as necessary?

        But why do you wish to propagate bad code? Just yell at them to use CGI! There is NO excuse!

        AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://76304]
Approved by root
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (5)
As of 2021-02-26 07:49 GMT
Find Nodes?
    Voting Booth?

    No recent polls found