http://qs321.pair.com?node_id=417625


in reply to Re: Security techniques every programmer should know
in thread Security techniques every programmer should know

Your code will call anything with whitespace an unsafe string. While that's much better than no checking, how about:

$string =~ s/[^\w\s]+//g; ## add other allowed chars as needed
That will sanitize all strings to contain only letters, digits, the underscore and whitespace. A more complete regex (which would still not include Unicode or international chars) would be:
$string =~ s/[^\w\s\!\@\#\$\%\^\&\*\(\)\\\`\~\-\+\=\,\.]+//g;
(Yes, there's more escaping there than strictly necessary.) Suddenly, that transliteration is looking a lot easier to maintain. If your allowed set is "everything but nulls and control chars", then you're better off explicitly excluding the known control-char set.
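
To make the "exclude the known control-char set" idea concrete, here is a minimal sketch. It assumes the unwanted set is NUL plus the other ASCII control characters (0x00-0x1F and 0x7F); the helper name strip_control_chars is made up for illustration. Note that this range also removes tabs and newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: delete NUL and the other ASCII control characters
# (0x00-0x1F and 0x7F) and leave every other character untouched.
sub strip_control_chars {
    my ($string) = @_;
    $string =~ tr/\x00-\x1F\x7F//d;   # /d deletes the listed characters
    return $string;
}

print strip_control_chars("abc\0def\n"), "\n";   # prints "abcdef"
```

If tabs or newlines should survive, the range would have to be split around them, which is still a short, fixed list.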

Denying all, then allowing is a good general rule of thumb. But, in this case, the "dangerous" items are a fixed set while the "safe" items are much more variable -- so it makes sense to simply remove that which is dangerous.

Update: Aristotle reminded me that, since \s includes \n, these regexes will not strip newlines, so strings sanitized with them are still unsafe if passed to a shell (e.g. system("$string");). This further shows that, in this case, inclusion-matching isn't as good as simply stripping the "bad" data out.
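
A quick sketch of the newline problem, assuming the allow-list is written as a negated character class. Both helper names are hypothetical. The first keeps \s in the allowed set; the second swaps it for a literal space.

```perl
use strict;
use warnings;

# Allow-list using \s: newlines survive, because \s matches \n.
sub allow_word_and_space_class {
    my ($string) = @_;
    $string =~ s/[^\w\s]+//g;
    return $string;
}

# Allow-list using a literal space instead of \s: newlines are removed.
sub allow_word_and_literal_space {
    my ($string) = @_;
    $string =~ s/[^\w ]+//g;
    return $string;
}

my $input = "ls\nrm file";
print "with \\s:      ", (allow_word_and_space_class($input)   =~ /\n/ ? "newline kept" : "newline stripped"), "\n";
print "literal space: ", (allow_word_and_literal_space($input) =~ /\n/ ? "newline kept" : "newline stripped"), "\n";
```

The first line reports the newline as kept and the second as stripped, which is the difference that matters once the string reaches a shell.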

Anima Legato
.oO all things connect through the motion of the mind

Re^3: Security techniques every programmer should know
by kutsu (Priest) on Dec 27, 2004 at 21:36 UTC

    \w matches different things depending on your locale (when use locale is in effect). If you have a German locale, for instance, it will match ß.

    That's the danger of using Perl's shortcut character classes, as DrHyde pointed out to me.
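
One way to pin \w down is sketched below. It assumes Perl 5.14 or later, where the /a regex modifier restricts \w (and \d, \s) to ASCII and /u requests Unicode rules; neither modifier existed when this thread was written, and the helper names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers. Both require Perl 5.14+ for the /a and /u modifiers.

# \w under /a matches only [A-Za-z0-9_].
sub is_ascii_word {
    my ($string) = @_;
    return $string =~ /\A\w+\z/a ? 1 : 0;
}

# \w under /u follows Unicode rules, so letters such as ß (U+00DF) match.
sub is_unicode_word {
    my ($string) = @_;
    return $string =~ /\A\w+\z/u ? 1 : 0;
}

my $string = "Stra\x{DF}e";   # "Straße", written with an escape to avoid source-encoding issues
print "Unicode rules: ", is_unicode_word($string), "\n";   # prints 1
print "ASCII rules:   ", is_ascii_word($string),   "\n";   # prints 0
```

Spelling the class out as [A-Za-z0-9_] gives the same ASCII-only behaviour on older Perls.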

    "Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce

Re^3: Security techniques every programmer should know
by Aristotle (Chancellor) on Dec 28, 2004 at 23:41 UTC

    Are you sure you want to use \s? That includes \n, you know.

    Makeshifts last the longest.