These have all been mentioned numerous times, but many programmers still don't understand the risks of using power tools. Because I think everyone knows why it is important to program with security in mind, I'll just begin without any further introduction.

Know your environment

Perl is written in C and doesn't prevent you from shooting your own feet. This has some very dangerous implications that not every Perl programmer is aware of. Even though the language you use is Perl, you still need some basic knowledge of C to create secure programs.

The platform perl runs on is also important. Linux asks for different security measures than Win32. Even the filesystem that is used can be important: are FooBar and foobar the same file, or not? If you program for one specific platform, make it work only on that one.
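Whether FooBar and foobar are the same file can be probed rather than guessed. The following is a minimal sketch using only core modules; the helper name fs_is_case_insensitive is made up for this illustration, and it only tells you about the filesystem that holds the directory you pass in.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: returns 1 if the filesystem holding $dir treats
# file names case-insensitively, 0 otherwise.
sub fs_is_case_insensitive {
    my ($dir) = @_;
    my $mixed = File::Spec->catfile($dir, 'FooBar');
    my $lower = File::Spec->catfile($dir, 'foobar');

    # Create only the mixed-case name...
    open my $fh, '>', $mixed or die "Cannot create $mixed: $!";
    close $fh or die "Cannot close $mixed: $!";

    # ...then ask whether the lower-case name refers to an existing file.
    my $insensitive = -e $lower ? 1 : 0;

    unlink $mixed or die "Cannot remove $mixed: $!";
    return $insensitive;
}

my $dir = tempdir(CLEANUP => 1);
printf "case-insensitive: %s\n", fs_is_case_insensitive($dir) ? 'yes' : 'no';
```

The answer can differ between two directories on the same machine, so probe the place where your data actually lives.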

Read documentation!

Your program will be used by others

Or maybe not. But always assume the worst. If not because use by others is a probable future, then to keep yourself focussed on the important issues.

Security comes first

Your boss may tell you that the first priority is that everything works, but it is your job to tell him he's wrong. If something is wrong, make sure the program dies before more goes wrong. It's better to have a program that does nothing at all than to have a program that does everything that is expected from it and provides a backdoor for evil-doers as a free bonus feature.
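Dying early mostly comes down to checking every return value and refusing to continue on failure. A small sketch, assuming a hypothetical helper called slurp_or_die and a file name invented for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return the file's content, or die with a reason.
# It never returns partial or made-up data.
sub slurp_or_die {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $content = do { local $/; <$fh> };
    close $fh or die "Cannot close $path: $!\n";
    return $content;
}

# A path that is guaranteed not to exist (the directory is freshly created).
my $missing = File::Spec->catfile(tempdir(CLEANUP => 1), 'missing.conf');

my $content = eval { slurp_or_die($missing) };
print defined $content ? "loaded\n" : "refused: $@";
```

The eval is only there to show the failure; in a real program you would usually let the die propagate and stop everything.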


Encode and escape!

Knowing your environment includes knowing all the protocols and file formats used. If you are a web programmer, then you should know HTTP, HTML and probably CSS and JavaScript too.

Of every string that comes in, you should know the character set and, more importantly, its encoding. Before doing anything with the data, it's best to convert it to Perl's internal format.

For example, if your input is %-encoded UTF-8, use:

    use Encode qw(decode);
    use URI::Escape qw(uri_unescape);
    my $string = decode 'utf-8' => uri_unescape $input;

If you don't know exactly how to unescape the incoming data, use a module, like CGI (or its faster equivalent, CGI::Simple) and let that handle it for you.

The reverse is also true. Before outputting a string, make sure it is in the right format. For example, to output the $string we just decoded in a UTF-8 encoded HTML document, we can use:

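    use Encode qw(encode);
    use HTML::Entities qw(encode_entities);
    # Escape the character string first, then encode the result to octets.
    # The other way around would turn every octet of a multi-byte character
    # into its own entity.
    my $output = encode 'utf-8' => encode_entities $string;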

So even though input and output are utf-8, we still explicitly decode and encode it from and to Perl's internal format. If the output was part of a URL, we'd also unescape and then re-escape the data. This is to make sure no strange octet (byte) slips through. Another benefit is that in between, you have a string that normal Perl functions can manipulate without needing to have special facilities to handle a certain character set or encoding.

(Note: Perl's internal format is often a UTF-8 variant too, but you should never assume this. Always explicitly decode and encode!)
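To see what the explicit round trip buys you, here is a sketch that uses only the core Encode module. The helper names octets_to_string and string_to_octets are invented for this example, and the strict checking flags are my choice, not something the text above requires.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode UTF-8 octets into a character string. FB_CROAK makes malformed
# input die instead of slipping through; LEAVE_SRC keeps the argument intact.
sub octets_to_string {
    my ($octets) = @_;
    return decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Encode a character string back into UTF-8 octets for output.
sub string_to_octets {
    my ($string) = @_;
    return encode('UTF-8', $string, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

my $input  = "caf\xC3\xA9";                 # "café" as five UTF-8 octets
my $string = octets_to_string($input);      # four characters
my $output = string_to_octets(uc $string);  # uc works on characters here

printf "%d octets in, %d characters inside, %d octets out\n",
    length($input), length($string), length($output);

# Malformed input is rejected at the border.
my $bad = eval { octets_to_string("\xFF") };
print "malformed input rejected\n" unless defined $bad;
```

In between decode and encode, length, uc and regexes all see characters, which is exactly the benefit mentioned above.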

If you don't escape properly, your program is prone to injection attacks. These include, but are not limited to:

  • SQL injection
  • HTML/JavaScript injection (cross-site scripting, XSS)
  • open injection (to avoid, use 3-arg open, not 2-arg)
  • Shell command line injection
  • SMTP injection (don't let others abuse your machine as a spam gateway!)

Every output format requires its own escaping. Even better than escaping data, though, is preventing interpolation when possible by using placeholders (DBI) or a list variant of a function (system, exec, open). This skips escaping and unescaping by using a more direct mode of communication. If internally it is still implemented as escaping+unescaping (DBD::mysql), at least you know knowledgeable people take care of it.
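The list variants can be tried safely by letting Perl start a copy of itself. This is a sketch, not a recipe: the hostile string and the one-line child program are made up for the demonstration, and list-form piped open needs a reasonably recent perl on Win32.

```perl
use strict;
use warnings;

# Input that would run a second command if it ever reached a shell.
my $hostile = 'x; echo INJECTED';

# The child is perl itself ($^X); it reports how many arguments it got
# and what the first one was.
my @command = ($^X, '-e', 'print scalar(@ARGV), ":", $ARGV[0]', $hostile);

# List form of a piped open: no shell is started, so nothing in $hostile
# is ever interpreted. The same holds for system(@command) and exec(@command).
open my $pipe, '-|', @command or die "Cannot run $command[0]: $!";
my $seen = do { local $/; <$pipe> };
close $pipe or die "Child process failed: status $?";

print "$seen\n";   # one argument, containing the whole hostile string
```

With DBI the same idea looks like prepare('... WHERE name = ?') followed by execute($name): the value travels next to the statement instead of being pasted into it.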

Null bytes are scary

Several control characters are scary, because they often have special meaning in certain string formats, but the null byte is the most scary of all. In C, a null byte (\0) indicates where a string ends. However, in Perl, it's just a normal character. This has advantages and disadvantages. The disadvantages are more important to be aware of. Many of Perl's functions are implemented using C functions, and in general, you can (and SHOULD!) assume they're not removing the null bytes for you.

Suppose you have written a CGI-script that does nothing more than display a page from the current directory. Storing data in the working directory is often a mistake in itself, but for this contrived example, let's ignore that.

    #!/usr/bin/perl -w
    # this is page.cgi
    use strict;
    use CGI::Simple;
    use File::Slurp qw(read_file);

    my $cgi  = CGI::Simple->new;
    my $page = $cgi->param('page');
    die if $page =~ m[/];  # Disallow pages from other folders
    print "Content-Type: text/html\n\n";
    print read_file "$page.html";

You disallow anything that has a slash in it, and ".html" is used in the read_file call, so only .html-files from the current directory can be read, right?

Wrong. Poisoning the data with a null byte is enough to evade the .html restriction. URI-encoded, a null byte is %00, so the attacker requests something like page.cgi?page=page.cgi%00blah!

The underlying function is a C function. It thinks the string ends where the null byte is. So it opens page.cgi and ignores the "\0blah!.html" part. But wasn't File::Slurp a pure Perl module? Yes, it is. But it uses sysopen internally! Don't let the "sys" part fool you: open uses the same internal C function.
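The mismatch between the two views of the same string can be shown without opening any file. In this sketch the regex merely imitates what a C string function would consider "the string"; it is an illustration, not what perl does internally.

```perl
use strict;
use warnings;

# The file name the script builds from the poisoned parameter.
my $name = "page.cgi\0blah!" . ".html";

# Perl counts the null byte and everything after it.
printf "Perl sees %d characters\n", length $name;

# A C string function stops at the first null byte.
my ($c_view) = $name =~ /\A([^\x00]*)/;
print "A C string function sees: $c_view\n";
```

Everything the script carefully appended, including the .html suffix, sits behind the null byte and is simply not there for the C side.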

Instead of reading through every module and Perl's source to find out what it uses, just remove all null bytes unless you have a good reason to keep them around. While you're at it, remove other control characters as well.

$string =~ tr/\x00-\x09\x0b\x0c\x0e-\x1f//d;

I skipped 0x0a and 0x0d because they are LF (line feed) and CR (carriage return), used for line endings. Depending on the application you write, you may want to keep more characters, such as horizontal and vertical tabs and form feeds, by leaving them out of the list.
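Wrapped in a function, the same tr can be applied to every incoming string. A minimal sketch; the name strip_control_chars is made up here.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without control
# characters, keeping only LF (0x0a) and CR (0x0d).
sub strip_control_chars {
    my ($string) = @_;
    $string =~ tr/\x00-\x09\x0b\x0c\x0e-\x1f//d;
    return $string;
}

my $dirty = "page.cgi\0blah!\tmore\r\n";
my $clean = strip_control_chars($dirty);

printf "%d characters before, %d after\n", length($dirty), length($clean);
```

Note that the tab is removed along with the null byte, because 0x09 is inside the first range.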

Taint mode

A good way to make sure you test each string before using it externally is to use Perl's taint mode. It is invoked with -T. The previous example would only need one small change.

    #!/usr/bin/perl -wT
    ...
    my ($page) = $cgi->param('page') =~ /^(\w+)\z/ or die;
    print "Content-Type: text/html\n\n";
    print read_file "$page.html";

Note that you should NEVER blindly use . or [^...] in your untaint regex. Whitelisting is much safer than blacklisting, and should have preference. For example ^(.+)\z and ^([^/.]+)\z still allow the dangerous null byte. I use \z instead of $, because $ allows \n (newline) just before the end. Know your tools, so learn to use regexes properly!
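These claims are easy to check for yourself. The sketch below runs the patterns against two hostile strings; checked_page_name is a hypothetical helper that wraps the whitelist regex, and nothing here needs -T to run.

```perl
use strict;
use warnings;

# Hypothetical helper: return the page name if it consists of \w
# characters only, undef otherwise.
sub checked_page_name {
    my ($input) = @_;
    return undef unless defined $input;
    my ($name) = $input =~ /^(\w+)\z/;
    return $name;
}

my $with_newline = "index\n";
my $with_null    = "page\0x";

print "dollar accepts a trailing newline\n" if $with_newline =~ /^\w+$/;
print "\\z rejects a trailing newline\n"    if $with_newline !~ /^\w+\z/;
print "dot accepts a null byte\n"           if $with_null =~ /^(.+)\z/;
print "negated class accepts a null byte\n" if $with_null =~ /^([^\/.]+)\z/;
print "whitelist rejects a null byte\n"
    unless defined checked_page_name($with_null);
```

All five lines are printed: the two blacklist-style patterns let the null byte through, and only the whitelist with \z refuses both inputs.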


Please, add your own generic security related advice below. Preferably with examples of how easy it is to get wrong. There is much more than I have just mentioned. If you know relevant PM nodes or external URLs, link to them. Let's have all the important information in one place.

But realise that knowing what you're doing, and thus reading documentation, is much better than reading only about the risks involved.

Juerd

Considered by kutsu: "Edit: Move to tutorials" Vote: 4/10/0
Unconsidered by davido: Juerd knows where to post tutorials. He chose to post this as a Meditation. Let's respect the author's decision. Juerd's reaction: the original idea was to consider it for a move, or to post a new node, after having received lots of additional sections. However, I expected much more response than I got. Making this node a tutorial in its current state may give some people the impression that all important security issues are discussed, which is far from true.