Validating incoming CGI form data

by Anonymous Monk
on May 02, 2003 at 12:17 UTC ( [id://254971]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All

I have written a script that accepts input from a form that people fill out on my website. I am running with taint mode on, and I am not sure whether I even need to go to all of this trouble to validate people's input and scan it for nasties. It all looks very messy. Here is a snippet:

    if ($fullname !~ /^[-.\w\s]{2,20}$/) {
        $message = $message.'<p>Please Enter Your Full Name Between 2-20 Characters With No Symbols.</p>';
        $found_err = 1;
    }
    if ($username !~ /^[a-z][a-z0-9-._]{1,15}$/) {
        $message = $message.'<p>Please Enter Your Desired Username In Lower Case Only With No Symbols And Between 2-16 Characters.</p>';
        $found_err = 1;
    }
    if ($username =~ m/^admin|administrator|accounts|support|postmaster|webmaster| |technical|billing|sales|purchase|buy|misuse|assistance|mail$/) {
        $message = $message.'<p>Sorry, The Chosen Username Is Contained Within Our Dictionary. Please Choose Another Username.</p>';
        $found_err = 1;
    }
    if ($password !~ /^[a-zA-Z0-9-._]{5,29}$/) {
        $message = $message.'<p>Please Enter A Password Between 6-30 Characters With No Symbols. This Can Be Changed Later.</p>';
        $found_err = 1;
    }
    if ($mothers !~ /^[-.\w\s]{2,20}$/) {
        $message = $message.'<p>Please Enter Your Mothers Maiden Name Between 2-20 Characters With No Symbols.</p>';
        $found_err = 1;
    }
    if ($email !~ /^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9])+$/i) {
        $message = $message.'<p>Please Enter A Valid E-Mail Address. Your User Information Will Be Sent Here.</p>';
        $found_err = 1;
    }
    if ($referral !~ /^[-.\w\s]{3,60}$/) {
        $message = $message.'<p>Please Enter Where You Heard About XXX. This Must Be Between 3-60 Characters Long With No Symbols.</p>';
        $found_err = 1;
    }
    if ($conditions !~ m/^Yes$/) {
        $message = $message.'<p>Please Accept The Terms And Conditions To Register.</p>';
        $found_err = 1;
    }
    if ($quota !~ m/^0|2|4|6|8|10|12|14|16|18|20|30|40|50|60|70|80|90|100$/) {
        $message = $message.'<p>Your Chosen Mailbox Quota Does Not Appear To Be A Valid Option. Contact Support For Assistance.</p>';
        $found_err = 1;
    }
    if ($spam !~ m/^10|0$/) {
        $message = $message.'<p>Your Chosen Spam Filtering Option Does Not Appear To Be Vaild. Contact Support For Assistance.</p>';
        $found_err = 1;
    }
    if ($antivirus !~ m/^10|0$/) {
        $message = $message.'<p>Your Chosen Virus Filtering Option Does Not Appear To Be Vaild. Contact Support For Assistance.</p>';
        $found_err = 1;
    }
    if ($bandwidth !~ m/^15|0$/) {
        $message = $message.'<p>Your Chosen Bandwidth Option Does Not Appear To Be Vaild. Contact Support For Assistance.</p>';
        $found_err = 1;
    }
    if ($support !~ m/^2|0$/) {
        $message = $message.'<p>Your Chosen Support Option Does Not Appear To Be Vaild. Contact Support For Assistance.</p>';
        $found_err = 1;
    }

    if ($found_err) {
        &PrintError;
    }

    sub PrintError {
        print "Content-type: text/html\n\n";
        print $message;
        exit 0;
        return 1;
    }

edited: Fri May 2 16:16:19 2003 by jeffa - title change (was: There MUST be a better way to validate?)

Replies are listed 'Best First'.
Re: Validating incoming CGI form data
by robartes (Priest) on May 02, 2003 at 12:32 UTC
    I like to use a hash that links regular expressions and parameters. Like so:
        use strict;
        use CGI;

        my $cgi = CGI->new();

        # Map each expected parameter to the pattern its value must match.
        my %laundry = (
            'name'   => '(\w+)',
            'number' => '(\d+)',
            'choice' => '([ABC])',
        );

        my %laundered_params;
        foreach my $param (keys %laundry) {
            my $input = $cgi->param($param);
            die "Parameter $param not present\n" unless defined $input;
            my $regexp = $laundry{$param};
            # Test the match itself: $1 keeps a stale value when a match fails.
            die "Invalid input for $param\n" unless $input =~ /$regexp/;
            $laundered_params{$param} = $1;
        }
    The above code is untested and assumes the use of CGI.

    CU
    Robartes-

      Or maybe like this:
          while (my ($param, $rx) = each %laundry) {
              local $_ = $cgi->param($param)
                  or die "Parameter $param not present\n";
              ($laundered_params{$param}) = /$rx/
                  or die "Invalid input for $param\n";
          }
      Note that both snippets will behave in probably undesired ways if any of the parameters have multiple values.
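
      One way to handle multi-valued parameters is to launder every value and collect the rejects, rather than looking only at the first one. This is an untested-in-production sketch: the %submitted hash is a hypothetical stand-in for what $cgi->param($name) would return in list context, and launder_all is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the submitted form: each field maps to the
# list of values that $cgi->param($name) would return in list context.
my %submitted = (
    name   => ['alice'],
    choice => ['A', 'B', 'Z'],
);

# Same idea as %laundry above, but with anchored, precompiled patterns.
my %laundry = (
    name   => qr/^(\w+)$/,
    choice => qr/^([ABC])$/,
);

# Returns (\@clean_values, \@rejected_values) for one field.
sub launder_all {
    my ($rx, @values) = @_;
    my (@clean, @rejected);
    for my $value (@values) {
        if (defined $value and $value =~ $rx) {
            push @clean, $1;          # keep only the captured part
        }
        else {
            push @rejected, $value;
        }
    }
    return (\@clean, \@rejected);
}

for my $param (sort keys %laundry) {
    my ($clean, $rejected) =
        launder_all($laundry{$param}, @{ $submitted{$param} || [] });
    printf "%s: %d accepted, %d rejected\n",
        $param, scalar @$clean, scalar @$rejected;
}
```

      Whether a single bad value should reject the whole field is a policy decision; the sketch only reports both lists and leaves that choice to the caller.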

      Makeshifts last the longest.

Re: Validating incoming CGI form data
by zakb (Pilgrim) on May 02, 2003 at 12:35 UTC

    Your validation tests look reasonable. One optimisation would be to use an array for the error messages, like so:

        # untested, change to suit
        my @messages;

        if ( ....test.... ) {
            push @messages, "The error message";
        }

        # more tests

        # Parentheses are needed because the sub is defined further down.
        PrintError() if @messages;

        sub PrintError {
            # any initial html stuff, or use a template
            print join('<br>', @messages);
        }

    ...which means you don't need to maintain a flag.

    Looking on CPAN, I see CGI::Validate, which appears to be a Getopt-style validator, and Params::Validate, which may be useful.

    Update: Podmaster suggested Data::FormValidator, which is what I was trying to find.

    Update2: I left the & in there because that's what the supplicant had; it is now removed. For the reasons not to use it, see perlsub. Basically, with the &, the argument list to a sub is optional, and if you omit it the subroutine sees the @_ array that was visible at the time of the call.
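
    A small demonstration of that @_ behaviour, as a sketch with made-up sub names (show_args and caller_sub are not from the original script):

```perl
use strict;
use warnings;

sub show_args {
    return "got: @_";
}

sub caller_sub {
    # This sub is called with ('a', 'b'), so @_ here is ('a', 'b').
    my $with_amp    = &show_args;     # & and no parens: callee sees this @_
    my $with_parens = show_args();    # explicit empty argument list
    return ($with_amp, $with_parens);
}

my ($amp, $parens) = caller_sub('a', 'b');
print "$amp\n";       # got: a b
print "$parens\n";    # "got: " followed by nothing
```

    In the original script &PrintError; is called from the top level, where @_ is empty, so nothing goes wrong there; the surprise only shows up when such a call is made from inside another sub.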

Re: Validating incoming CGI form data
by Joost (Canon) on May 02, 2003 at 12:32 UTC
    Just a couple of suggestions:

    Don't use &subroutine; unless you're sure that you want to do that. You very likely want subroutine(); instead.

    As for cleaning the code a little: you could do something like:

    Warning: untested code ahead

        my %opts = (
            bandwidth => {
                err   => 'Your Chosen Bandwidth Option Does Not Appear To Be Valid. Contact Support For Assistance.',
                # (?:...) makes ^ and $ apply to both alternatives
                match => qr/^(?:15|0)$/,
            },
            support => {
                err   => 'Your Chosen Support Option Does Not Appear To Be Valid. Contact Support For Assistance.',
                match => qr/^(?:2|0)$/,
            },
            # etc etc.
        );

        # $q is assumed to be a CGI object created earlier
        my $error = '';
        while (my ($name, $check) = each(%opts)) {
            unless ($q->param($name) =~ $check->{match}) {
                $error .= "<p>$check->{err}</p>\n";
            }
        }
        if ($error) {
            print_error($error);
        }
    At least I think it looks a little better :-)
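
    One detail worth checking in patterns like the m/^15|0$/ used in the question: alternation binds more loosely than the anchors, so that pattern reads as "starts with 15, or ends with 0". A short sketch comparing it with a grouped version (the variable names $loose and $tight are only for illustration):

```perl
use strict;
use warnings;

my $loose = qr/^15|0$/;        # parsed as (^15) or (0$)
my $tight = qr/^(?:15|0)$/;    # anchors apply to both alternatives

for my $value (qw(0 15 150 900)) {
    printf "%-4s loose=%s tight=%s\n",
        $value,
        ($value =~ $loose ? 'match' : 'no'),
        ($value =~ $tight ? 'match' : 'no');
}
```

    Here 150 and 900 get past the loose pattern but not the grouped one. The same applies to the quota, spam, antivirus, support and reserved-username patterns.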
    -- Joost
    downtime n. The period during which a system is error-free and immune from user input.
Re: Validating incoming CGI form data
by jasonk (Parson) on May 02, 2003 at 14:10 UTC

    From a code standpoint it doesn't look too bad, but from a logical standpoint you have a lot of issues. First, you exclude a whole lot of valid email addresses by making assumptions about what characters are allowed in an email address (pretty much anything is allowed on the left side of the @). You have also made some big assumptions about characters that will not appear in your customers' names, unless you don't want any customers named O'Hara or O'Malley. From a purely nit-picky point of view, the reserved-username message will probably lead people to believe their username must be a non-dictionary word.
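
    As a rough sketch of looser checks (the sub names and the exact character sets are assumptions, not a recommendation for any particular site): allow apostrophes in names, and only check the overall shape of an email address instead of restricting the local part.

```perl
use strict;
use warnings;

# Assumption: word characters, spaces, hyphens, periods and apostrophes
# are acceptable in a name, with the same 2-20 length limit as before.
sub name_ok {
    my ($name) = @_;
    return 0 unless defined $name;
    return $name =~ /^[-.'\w\s]{2,20}$/ ? 1 : 0;
}

# Assumption: only the shape something@host.tld is checked. This accepts
# characters such as + in the local part, and it does not prove that the
# address exists.
sub email_looks_plausible {
    my ($email) = @_;
    return 0 unless defined $email;
    return $email =~ /^[^\s\@]+\@[^\s\@]+\.[^\s\@]+$/ ? 1 : 0;
}

print name_ok("O'Hara")                               ? "ok\n" : "rejected\n";
print email_looks_plausible('first+tag@example.com')  ? "ok\n" : "rejected\n";
```

    For stricter address checking, a CPAN module such as Email::Valid is probably a better bet than a hand-rolled pattern.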


    We're not surrounded, we're in a target-rich environment!
Re: Validating incoming CGI form data
by hardburn (Abbot) on May 02, 2003 at 14:11 UTC

    I used an approach like this in CGI::Search, which could be adapted to your situation.

    A series of subroutines are used to validate the input. The subs take in the data to validate and return a list of three values. The first value is a boolean indicating whether the data validated or not. The second is the data that was validated, but in untainted form (see perlsec). The third is a string that can be used as an error message if the data didn't validate.

    To perform the validation, you make a hash-of-arrays with three elements in the array portion. The key of the hash is the name of the field. The zeroth element of the array is the data to validate. The first element is a reference to a validation subroutine. The second element is a boolean indicating whether the given data is required or not. If that second element is false, then the field is allowed to be blank. If true, then the data still has to pass the validator.

    Tying it all together:

        use CGI qw(:standard);

        # Validation subroutines defined elsewhere

        # scalar() keeps a missing parameter from collapsing the list:
        # in list context param() returns nothing for an absent field.
        my %DATA = (
            field1 => [ scalar param('field1'), \&INTEGER, 1 ],
            field2 => [ scalar param('field2'), \&EMAIL,   1 ],
            field3 => [ scalar param('field3'), \&WORD,    0 ],
        );

        sub do_validation {
            foreach my $key (keys %DATA) {
                if ($DATA{$key}[2]) {
                    my @result = $DATA{$key}[1]->($DATA{$key}[0]);
                    die "Didn't validate: $result[2]" unless $result[0];
                }
            }
        }

    The advantage of this compared to a regex is flexibility. A subroutine can do whatever checks it wants to the data. For instance, a credit card validator can run Business::CreditCard on the data.
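
    For illustration, here is roughly what two of those validation subs could look like under the three-value convention described above. This is a sketch, not the code from CGI::Search; the patterns and error strings are assumptions.

```perl
use strict;
use warnings;

# Each validator returns (ok, untainted_value, error_message).
sub INTEGER {
    my ($data) = @_;
    if (defined $data and $data =~ /^(-?\d+)\z/) {
        return (1, $1, '');    # $1 is the regex capture, hence untainted
    }
    return (0, undef, 'expected a whole number');
}

sub WORD {
    my ($data) = @_;
    if (defined $data and $data =~ /^(\w+)\z/) {
        return (1, $1, '');
    }
    return (0, undef, 'expected a single word');
}

my ($ok, $clean, $error) = INTEGER('42');
print $ok ? "clean value: $clean\n" : "error: $error\n";
```

    Returning the captured value rather than the original input is what makes the result usable under taint mode.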

    WWW::Form does something similar, but adds the ability to put the subroutines in an array. This means that a single piece of data must pass more than one validation sub.
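
    The general idea of running several validation subs against one value can be sketched like this. To be clear, this is not WWW::Form's actual API; run_checks and the three small validators are hypothetical names, and each validator here returns just (ok, error_message).

```perl
use strict;
use warnings;

# Hypothetical validators; each returns (ok, error_message).
sub not_blank {
    my ($v) = @_;
    return (defined($v) && length($v)) ? (1, '') : (0, 'must not be blank');
}

sub max_len_16 {
    my ($v) = @_;
    return length($v) <= 16 ? (1, '') : (0, 'must be at most 16 characters');
}

sub lowercase {
    my ($v) = @_;
    return $v eq lc $v ? (1, '') : (0, 'must be lower case');
}

# Run every check and collect all error messages for the value.
sub run_checks {
    my ($value, @checks) = @_;
    my @errors;
    for my $check (@checks) {
        my ($ok, $error) = $check->($value);
        push @errors, $error unless $ok;
    }
    return @errors;
}

my @errors = run_checks('Bob', \&not_blank, \&max_len_16, \&lowercase);
print "$_\n" for @errors;
```

    Collecting every failure, instead of stopping at the first, lets the form report all problems with a field in one pass.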

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

Re: Validating incoming CGI form data
by CountZero (Bishop) on May 05, 2003 at 20:04 UTC

    A fine distinction should be made between validating the input and untainting it.

    Untainting should be more concerned with the security of your web site (making sure the user does not slip in some executable code or other bad things), whereas validating has more to do with obtaining valid data from the user.

    It seems you are more concerned with validating here and in order to do that you need to make a model of the data you are expecting and check the input against that model (ask yourself: what is a valid date, name, password, ...).

    Whether you do this checking through multiple if tests and regexes or any of the other suggested methods is less important than finding/setting the rules which define valid input.
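
    A small sketch of the difference, using the quota field from the question. The sub names are made up and the "safe" character set is an assumption; the allowed quota values are the ones listed in the original pattern.

```perl
use strict;
use warnings;

# Untainting: keep only input made of characters from a known-safe set.
# Under -T, the captured value is what Perl treats as untainted.
sub untaint_word {
    my ($raw) = @_;
    return unless defined $raw;
    my ($clean) = $raw =~ /^([\w.-]+)\z/;
    return $clean;    # undef when the input contains anything else
}

# Validating: the model says a quota is one of a fixed set of values
# (0, 2, 4, ... 20, then 30, 40, ... 100).
my %valid_quota = map { $_ => 1 }
    (map { $_ * 2 } 0 .. 10), (map { $_ * 10 } 3 .. 10);

sub quota_ok {
    my ($quota) = @_;
    return (defined $quota && exists $valid_quota{$quota}) ? 1 : 0;
}

my $quota = untaint_word('15');
print "safe to handle\n" if defined $quota;
print "but not a valid quota\n" unless quota_ok($quota);
```

    The value 15 passes the untainting step because it contains nothing dangerous, yet fails validation because it is not one of the quotas on offer.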

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
