http://qs321.pair.com?node_id=799797

grashoper has asked for the wisdom of the Perl Monks concerning the following question:

I was trying to come up with a regex to fix er block(sql injection) not sure how to write this properly. should I be doing this in the form validation code (since this is a loginbox() process or is it better practice to fix it in the sql itself? I am really horrendous with regexes
#want/need to add something to $user to test if its invalid #input #next if $User(/^"*^';&<>()/); #$User.'.'.'; #$Response->Write("Invalid Input"); my $sql = "SELECT Name, UserID, Passwd, Class FROM Users WHERE UserID='$User';";"

Replies are listed 'Best First'.
Re: regex challenged
by moritz (Cardinal) on Oct 07, 2009 at 19:55 UTC
    You should just use placeholders and avoid escaping of the values totally:
    my $sth = $dbh->prepare('SELECT Name, UserID, Passwd, Class FROM Users WHERE UserID=?'); $sth->execute($User);

    The database substitutes the question mark with the value from $User without interpreting it as SQL, so you're as safe as you can get that way.

    When you print out the user name again someday, you have to HTML-Escape it. Good template systems like HTML::Template::Compiled can do that by default with the default_escape => 'HTML' option.

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: regex challenged
by jrtayloriv (Pilgrim) on Oct 07, 2009 at 20:48 UTC
Re: regex challenged
by redgreen (Priest) on Oct 07, 2009 at 20:21 UTC

    Besides using placeholders as moritz said, it is always easier to test for characters you want allowed (a known set) in a regex, rather then trying to come up with what isn't allowed (an unknown large set).

    if ( $User =~ /^[a-z0-9]+$/i ) { # Allowed }
Re: regex challenged
by halfcountplus (Hermit) on Oct 07, 2009 at 20:14 UTC
    What Moritz says about using a placeholder is definitely the way to go vis. avoiding SQL injection.

    If you want a simple way to make it HTML safe, you can do that before you put it in the DB or after, when you take it out and want to use it in a page (depending what else is done with the data). There are lots of modules, etc, for doing this but the major issue is the < and >, and ' if you use javascript:
    $string =~ s/</&lt;/g; $string =~ s/>/&gt;/g; $string =~ s/'/&#39;/g;
    Hopefully you recognize what that is for. There is a chart of all HTML "escape" codes at http://www.lookuptables.com/
      How is your advice regarding HTML safety poor? Let me count the ways:
      1. It's best to clean up your data both before and after. Before storing the data, you need to clean it up enough to make it safe to store. (In many cases, this can be skipped in favor of using parametrized queries (placeholders).)

        Cleanup for display needs to be done immediately prior to display because, if you only clean up the HTML before storing it and a new exploit is discovered next week, the data already in your database may still contain that exploit. Doing this cleanup on display is the only way to ensure that all current cleanup will be performed on older data. (Pre-cleaning before storage isn't a bad thing, but it is not sufficient by itself.)

      2. < and > are major issues even if you don't use javascript. <iframe src='http://rogue.com/path/to/exploit.html'></iframe>, for example.
      3. Your set of suggested regexes take a blacklisting approach ("block these three specific characters") which, by its very nature, is susceptible to letting potential dangers slip through. It's much better to go with whitelisting ("this set of characters are known (or at least believed) to be safe; block everything else") in the general case or to use a proper HTML escaping function in the specific case of handling HTML output.