http://qs321.pair.com?node_id=11133815


in reply to When not to use taint mode

A long time ago I first came across taint mode and decided it was far too difficult to understand. I've since looked again, and it no longer seems anywhere near as mystical as it once did. That's what happens when one improves, of course.

Taint mode is actually quite simple:

First, data fetched from external sources is marked as tainted. The exact list is found in perlsec:

All command line arguments, environment variables, locale information (see perllocale), results of certain system calls (readdir(), readlink(), the variable of shmread(), the messages returned by msgrcv(), the password, gcos and shell fields returned by the getpwxxx() calls), and all file input are marked as "tainted".

Expect that list to grow over time.

Second, passing tainted data to some critical functions causes a runtime error. The description is again found in perlsec:

Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes, with the following exceptions:

  • Arguments to print and syswrite are not checked for taintedness.
  • Symbolic methods
    $obj->$method(@args);
    and symbolic sub references
    &{$foo}(@args); $foo->(@args);
    are not checked for taintedness. This requires extra carefulness unless you want external data to affect your control flow. Unless you carefully limit what these symbolic values are, people are able to call functions outside your Perl code, such as POSIX::system, in which case they are able to run arbitrary external code.
  • Hash keys are never tainted.
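One way to "carefully limit what these symbolic values are" is to never call by name at all and look the name up in a fixed table instead. The following is only a sketch; the action names and the dispatch() helper are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical handlers; the names are illustrative only.
my %ACTIONS = (
    list => sub { return "listing @_" },
    show => sub { return "showing @_" },
);

# Look the requested name up in a fixed table instead of calling
# $obj->$name() or &{$name}() with a name that came from outside.
sub dispatch {
    my ($name, @args) = @_;
    my $handler = defined $name ? $ACTIONS{$name} : undef;
    die "Unknown action\n" unless $handler;
    return $handler->(@args);
}

print dispatch('list', 'users'), "\n";
```

Anything not listed in %ACTIONS, such as POSIX::system, makes dispatch() die instead of being called.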

Third, almost all operations on tainted data will also taint the result. The exceptions are the ternary conditional operator ?: (see perlsec for the rationale), and references to subpatterns from a regular expression match. Again quoting perlsec:

But testing for taintedness gets you only so far. Sometimes you have just to clear your data's taintedness. Values may be untainted by using them as keys in a hash; otherwise the only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression match. Perl presumes that if you reference a substring using $1, $2, etc. in a non-tainting pattern, that you knew what you were doing when you wrote that pattern. That means using a bit of thought--don't just blindly untaint anything, or you defeat the entire mechanism. It's better to verify that the variable has only good characters (for certain values of "good") rather than checking whether it has any bad characters. That's because it's far too easy to miss bad characters that you never thought of.
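A minimal sketch of that "good characters only" approach follows. The helper name untaint_filename() and the exact rules (ASCII letters, digits, dot, underscore, hyphen, at most 64 characters) are my assumptions for illustration; pick rules that fit your own data:

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a plain file name made of ASCII
# letters, digits, dot, underscore and hyphen, at most 64 characters,
# and not starting with a dot or a hyphen.
# Returns the captured (untainted) copy, or undef if the value is rejected.
sub untaint_filename {
    my ($value) = @_;
    return unless defined $value;
    # Explicit character classes rather than \w, anchored with \A and \z
    # so that nothing can hide before or after the match.
    if ($value =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]{0,63})\z/) {
        return $1;
    }
    return;
}

my $name = untaint_filename($ARGV[0] // 'report.txt');
die "Refusing suspicious file name\n" unless defined $name;
print "Safe to use: $name\n";
```

The pattern lists what is allowed, so path separators, shell metacharacters, and trailing newlines are rejected without having to enumerate them.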

So, these three facts combined prevent you from passing untrustworthy data to functions relevant to system integrity. You explicitly have to check all relevant incoming data. If you forget to check at one point, your script will die instead of doing dangerous things.

Some modules also handle taint mode. DBI optionally taints most data read from the database, and optionally checks for tainted parameters. CGI explicitly taints all multi-part form data; all other input is already tainted by perl. File::Find can automatically untaint directory names. Other modules may or may not handle taint mode.

Taint mode is usually enabled by passing the -T argument to the interpreter. Perl automatically enables taint mode when it detects it is running with differing real and effective user or group IDs. perlsec explains when to enable taint mode:

[Taint mode] is strongly suggested for server programs and any program run on behalf of someone else, such as a CGI script.

It does not hurt to enable taint mode in more situations. Running a perl script as root is a strong indicator that you want to enable taint mode, as is almost any script handling data from the network or other non-trustworthy sources. If in doubt, enable taint mode.
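To see what taint mode does with your data, a small script like this may help. It is just a sketch: describe() is a made-up helper, while tainted() comes from the core module Scalar::Util and ${^TAINT} is the built-in variable that reflects the taint setting:

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Run as:  perl -T script.pl some-argument
# ${^TAINT} is 0 when taint mode is off, 1 with -T, and -1 with -t.
print "Taint mode is ", (${^TAINT} ? "on" : "off"), "\n";

# Hypothetical helper that reports whether a value carries the taint flag.
sub describe {
    my ($label, $value) = @_;
    return sprintf "%s is %s", $label,
        tainted($value) ? "tainted" : "not tainted";
}

my $literal = "hello";                        # comes from the source code
my $derived = "prefix-" . ($ARGV[0] // '');   # built from external input

print describe('$literal', $literal), "\n";
print describe('$derived', $derived), "\n";
```

Run with -T and an argument, $derived is reported as tainted because it was built from a command line argument. Without -T, nothing is ever tainted.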

Taint mode for DBI is an interesting question: If only your script interacts with the database, and if you use placeholders everywhere, the database could be considered trustworthy. I would not trust the database if it is reachable from the network. Again, if in doubt, enable taint mode and make DBI taint data from the database and check DBI method calls for tainted data (attributes Taint, TaintIn, TaintOut).
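A connection set up that way could look roughly like this. The DSN is an assumption (it presumes DBD::SQLite is installed); substitute your own DSN and credentials. The attributes only have an effect when perl itself runs in taint mode:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; any other DSN works the same way.
my $dbh = DBI->connect(
    'dbi:SQLite:dbname=:memory:', '', '',
    {
        RaiseError => 1,
        TaintIn    => 1,   # check arguments of DBI method calls for taint
        TaintOut   => 1,   # mark most data fetched from the database as tainted
        # Taint => 1 is a shortcut that sets both of the above.
    },
);
```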

Update:

Note that $ENV{'PATH'} is tainted, and so any attempt to run external programs will fail with Insecure $ENV{PATH}. You need to set $ENV{'PATH'} to an untainted value, and you should delete some other critical environment variables. perlsec recommends:

delete @ENV{qw(IFS CDPATH ENV BASH_ENV)}; # Make %ENV safer

A sane value for $ENV{'PATH'} is usually /bin:/usr/bin, but that depends on the target system.
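Put together, the start of a script that runs external programs might look like this sketch. The PATH value is an assumption about the target system, and echo merely stands in for whatever program you need to run:

```perl
use strict;
use warnings;

# Assumption: the programs we need live in /bin or /usr/bin.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer

# List form of system(): the arguments go to the program directly,
# not through a shell.
my $status = system('echo', 'environment cleaned');
die "echo failed\n" if $status != 0;
```

Do this before the first call to system, exec, backticks, or a piped open.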

Also note that if use locale is in effect, the pattern \w depends on the locale. Locale data is tainted, so the subpatterns captured by any regular expression containing \w are tainted as well, and untainting with such a regular expression won't work.

And finally the big fat warning: Taint mode does not magically make your script secure. It just prevents some common errors. You still need to think about what your code does. You need to be paranoid about the data you are working with. Taint mode just helps you find common problems.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)