http://qs321.pair.com?node_id=125496

Cobo has asked for the wisdom of the Perl Monks concerning the following question:

It has been a while since i last made something with Perl so I decided for a brief refresher to make a simple guestbook. I was just wondering if any one had any comments or if any one could see any huge security holes.
#!/use/bin/perl -w use strict; use CGI ':standard'; print header(); print start_html(); print "<html><head><title>Thanks for Signing!</title></head>"; print "<body bgcolor=black text=white>"; my $q=new CGI; my $name =$q->param('name'); my $mail =$q->param('mail'); my $message =$q->param('message'); print "<center><p>Thanks for signing my guestbook, your message +has been posted! $name!</center></p>"; # REMOVE THIS COMMENT TO ACTIVATE CENSOR if ($message =~/badword/i or +$message =~/badword2/i or $message =~/badword3/i) {$message =~ s/badw +ord/bleep/ig; $message =~ s/badword2/bleep/ig; $message =~ s/badword3 +/bleep/ig;}; if ($message =~/</ or $message =~/>/) {$message =~ s/</&lt;/g; $messag +e =~ s/>/&gt;/g;}; if ($name =~/</ or $name =~/>/) {$name =~ s/</&lt;/g; $name =~ s/>/&gt +;/g;}; if ($mail =~ /</ or $mail =~/</) {$mail =~ s/</&lt;/g; $mail =~ s/</&g +t;/g;}; if ($message =~ /\(b\)/i or $message =~ /\(i\)/i or $message =~ /\(\/i +\)/ or $message =~ /\(\/b\)/) {$message =~ s/\(b\)/<b>/ig; $message = +~ s/\(i\)/<i>/ig; $message =~ s/\(\/b\)/<\/b>/ig; $message =~ s/\(\/i +\)/<\/i>/ig;}; if ($message =~ /\(red\)/i or $message =~ /\(\/red\)/i) {$message =~ s +/\(red\)/<font color=red>/ig; $message =~ s/\(\/red\)/<\/font>/ig;}; if ($message =~ /\(red\)/i and $message =! /\(\/red\)/i) {$message=$me +ssage."</font>"}; print "Name: $name <br> Email: $mail <br> Message: $message"; open HTML, ">>../gbook.html" or die $!; print HTML "<i>Name:</i> $name <br> <i>E-Mail: </i>$mail<br> <i>Messag +e: </i>$message <p>"; close HTML; print "</BODY></HTML>";
Any comments would be appreciated, thanks.

Replies are listed 'Best First'.
Re: Looking for feed back on a guestbook
by wog (Curate) on Nov 15, 2001 at 07:47 UTC
    Your script has a race condition. That is, two copies of your script can be running at the same time and they both can try to write a new entry to the guestbook, resulting in it being curropted. To fix this problem you can use flock. (Just use Fcntl qw(:flock); ... flock HTML, LOCK_EX; would probably be enough, see the docs for details.)

    A more minor issue with your script is that print start_html() will print out <html> and a <head> section for your document. You probably shouldn't use it, unless you plan to use CGI to generate all your HTML. (Especially considering that CGI will output XHTML in newer versions, making it so browsers would be welcome to reject your apparently well-formed HTML.)

    I notice that you're testing variables for a pattern and then preforming subtitutions for that pattern. It would be clearer, and probably just as fast, to just try the subtitution -- if the pattern's not found, nothing will happen. Personally, I would write your repetitive substitutions of <, etc. for the appropriate entities with a foreach loop:

    foreach ($name,$mail,$message) { s/</&lt;/g; s/>/&gt;/g; }

    A final potential problem in your script is that you might want to think of a better way of handling situtations where certain input fields are blank.

    In your use of CGI.pm you are mixing the object-oriented ($q->something) and function-based (something()) interfaces. It would be clearer if you stuck with only one.

    update: I notice you use #!/use/bin/perl -w. I suspect you meant #!/usr/bin/perl -w

      Ok, thank you very much. and about the typo, can you tell I'm running windows at the moment? lol ( and if any one is interested (which they aren't) I have a duel boot, I'm running windows at the moment because i can't get any multi-messengers to work and told somebody I'd talk to them on msn)
Re: Looking for feed back on a guestbook
by jarich (Curate) on Nov 15, 2001 at 09:42 UTC
    Most of what I'd say has been commented on already. However I just wanted to remind you of the escapeHTML function. It's quite good at what it does so you could replace some of your code with:
    foreach my $in ($message, $name, $mail) { $in = $q->escapeHTML($in); # and of course you can do your subs here to: $in =~ s/\(b\)/<b>/ig; # ... }
    The escapeHTML function is of course provided by CGI.pm. It doesn't come with the :standard set use CGI qw/:standard/; so if you want to use it that way you have to include it use CGI qw/:standard escapeHTML/;. This of course is not an issue if you're using CGI.pm with its object interface (as I have assumed above).

    Remember that that $in in the foreach loop is an alias to the elements in the list we've created. So for the first iteration of the loop, $in is the same as $message and everything you do to it is done to $message.

    You also might be interested in looking at Damian Conway's Regexp::Common module as it has a very elaborate smut filter that would save you having to write one of your own.

Re: Looking for feed back on a guestbook
by George_Sherston (Vicar) on Nov 15, 2001 at 12:16 UTC
    A few more pence in the pound:

    (1) I'd use a module for stripping those tags, especially since you want to allow some and disallow others. I've not looked at this before, but there's bound to be one somewhere... searches CPAN for HTML stuff... well, just from glancing at the docs, looks like HTM::TagFilter would do just what you want. Looks as though it wd let you be pretty flexible abt what html you allowed your users, in an easy to maintain kind of way.

    (2) Wog alluded to this one in a different context: you cd do your censor in one regex: $message =~ s/badword2|badword3|badword/bleep/ig;Although n.b. you have to be a little careful with the order of your search words, if some of them "contain" others. You notice I have
    badword2|badword3|badword
    instead of
    badword|badword2|badword3
    because the latter wd turn "badword2" into "bleep2".

    (3) About printing out, I agree with the monks who say use CGI or die;, and also those who say that if you do
    use CGI qw/:standard/;
    You don't need then to do
    my $q = new CGI; # and all the $q-> method calling
    My preference in a short script (or in any script, because I have only a limited understanding of OO programming) is the former. Then you could do all your printing right at the end, after munging around your $message etc variables, with these lines which (and I agree beauty is in the eye of the beholder) I think are elegant and easy to read:
    print header, start_html, p( {-align => center}, "Thanks for signing my guestbook, your message has been posted +! $name!", ), "Name: $name", br, "Email: $mail", br, "Message: $message", # ok, line above is a *bit* CGI-obsessive end_html;
    Hope that's some use.

    Please do post your revised code when you're finished with it, as it makes the thread a more complete learning resource for future generations.

    § George Sherston
Re: Looking for feed back on a guestbook
by drinkd (Pilgrim) on Nov 15, 2001 at 18:49 UTC
    use Taint mode.

    I realize you run regexes on everything currently, so it won't hurt you, and you never know when somebody will come in and edit your code sometime and take out one of the censoring checks without realizing the problem.

    drinkd

Censoring...
by joealba (Hermit) on Nov 15, 2001 at 18:35 UTC
    Another great way to censor -- let TheDamian decide which words are bad! :)

    Regexp::Common

    use Regexp::Common qw(RE_profanity); $message =~ s/$RE{profanity}/bleep/msg;
Re: Looking for feed back on a guestbook
by orkysoft (Friar) on Nov 15, 2001 at 17:58 UTC

    I agree with the other wise comments in this thread, but have one more point to add:

    You can let your users use 'real' HTML tags (only those you allow, of course) instead of having them use other brackets or parentheses in an HTML-like language. Just update your regexps accordingly, it's really quite simple. Make sure you do filter out all non-allowed HTML tags, of course.

    (When I saw the 'headline' on a Slashdot nodelet box, I just knew it was one of your threads :-) .)

Re: Looking for feed back on a guestbook
by Cobo (Scribe) on Nov 15, 2001 at 20:42 UTC
    It was asked that I post my alterations, Well I haven't tested it yet but heres my second draft.
    #!/usr/bin/perl -w use strict; use CGI ':standard'; use Regexp::Common qw(RE_profanity); print header(); print start_html(); print "<html><head><title>Thanks for Signing!</title></head>"; print "<body bgcolor=black text=white>"; my $q=new CGI; my $name =$q->param('name'); my $mail =$q->param('mail'); my $message =$q->param('message'); print "<center><p>Thanks for signing my guestbook, your message +has been posted! $name!</center></p>"; # REMOVE THIS COMMENT TO ACTIVATE CENSOR $message =~ s/$RE{profanity}/ +bleep/msg; foreach ($name,$mail,$message) { s/</&lt;/g; s/>/&gt;/g; }; foreach ($message,$mail,$name){ s/\(b\)/<b>/ig; s/\(i\)/<i>/ig; s/\(\/b\)/<\/b>/ig; s/\(\/i\)/<\/i>/ig; }; foreach ($message,$mail,$name) { s/\(red\)/<font color=red>/ig; s/\(\/red\)/<\/font>/ig; }; if ($message =~ /\(red\)/i and $message =! /\(\/red\)/i) {$message=$me +ssage."</font>"}; print "Name: $name <br> Email: $mail <br> Message: $message"; open HTML, ">>../gbook.html" or die $!; print HTML "<i>Name:</i> $name <br> <i>E-Mail: </i>$mail<br> <i>Messag +e: </i>$message <p>"; close HTML; print "</BODY></HTML>";
    I've used a couple of the suggestions so far, when I get home I'll mess around with it more. And thank you all for the comments, I found them very helpful.
      : #!/usr/bin/perl -w : use strict; : use CGI ':standard';
      For some reason, center is not included in :standard. If we lead an item with *, we can use start_ and end_ (*html is included in :standard.)
      use CGI qw/:standard center *font *i *b/; : use Regexp::Common qw(RE_profanity);
      No need to load this unless we're using it. (Moved down page.)
      : print header(); : print start_html(); : : print "<html><head><title>Thanks for Signing!</title></head>"; : print "<body bgcolor=black text=white>";
      This can be combined into:
      print header, start_html( -bgcolor => 'black', -text => 'white', -title => 'Thanks for Signing!'); : my $q=new CGI; : my $name =$q->param('name'); : my $mail =$q->param('mail'); : my $message =$q->param('message');
      This uses CGI.pm as an object. We called CGI in the function oiented style. Most everyone agrees - don't mix the two styles.
      my $name = param('name'); my $mail = param('mail'); my $message = param('message'); : print "<center><p>Thanks for signing my guestbook, your", : "message has been posted! $name!</center></p>";
      With center defined:
      print center( p( 'Thanks for signing my guestbook, your message ', "has been posted! $name!" )); : # REMOVE THIS COMMENT TO ACTIVATE CENSOR # use Regexp::Common qw(RE_profanity); : # $message =~ s/$RE{profanity}/bleep/msg; : : foreach ($name,$mail,$message) { : s/</&lt;/g; : s/>/&gt;/g; : }; : : foreach ($message,$mail,$name){ : s/\(b\)/<b>/ig; : s/\(i\)/<i>/ig; : s/\(\/b\)/<\/b>/ig; : s/\(\/i\)/<\/i>/ig; : }; : : foreach ($message,$mail,$name) : { : s/\(red\)/<font color=red>/ig; : s/\(\/red\)/<\/font>/ig; : };
      These foreach blocks can be combined. I used function calls for the HTML and variables for (red) and (/red) to improve readability (TIMTOWTDI).
      my ($red, $sl_red) = qw|\(red\) \(/red\)|; foreach ($name, $mail, $message) { s/</&lt;/g; s/>/&gt;/g; s|\(b\)|start_b|ieg; s|\(i\)|start_i|ieg; s|\(/b\)|end_b|ieg; s|\(/i\)|end_i|ieg; s/$red/start_font({-color => 'red'})/iego; s/$sl_red/end_font/iego; } : if ($message =~ /\(red\)/i and $message =! /\(\/red\)/i) { : $message=$message."</font>" : };
      This is useless since we have already replaced all occurrences of (red) and (/red). Perhaps we could count successful matches or place this if block before the foreach block.: print    "Name: $name <br> Email: $mail <br> Message: $message";Let's move this down and combine it with the ending.
      : open HTML, ">>../gbook.html" or die $!; : print HTML "<i>Name:</i> $name <br> <i>E-Mail: </i>$mail<br> ", : "<i>Message: </i>$message <p>";
      Or perhaps:
      print HTML i('Name: '), $name, br, i('E-Mail: '), $mail, br, i('Message: '), p($message); : close HTML; : print "</BODY></HTML>";
      Let's add the print from above and use it with CGI.pm.
      print "Name: $name", br, "Email: $mail", br, "Message: $message", end_html; __END__



      HTH,
      Charles K. Clarkson
      I find s!\(/b\)!</b>!ig marginally easier to read than s/\(\/b\)/<\/b>/ig;. If you need to escape the / within a regexp, it's often best to use a different delimiter.
Re: Looking for feed back on a guestbook
by orkysoft (Friar) on Nov 16, 2001 at 02:45 UTC

    I still have this lying around, a subroutine to convert 'AAAtml' (in use at the boards at The Alien Adoption Agency (which is powered by Perl (though the db seems pretty flaky))). It's not 100% bug-free (it hiccups when you incorrectly write links), but otherwise, it's pretty good.

    #!/usr/bin/perl -wT use strict; use CGI; # Convert AAAtml to safe and valid HTML sub atml { sub processtext { my $text = shift; $text = CGI::escapeHTML($text); $text =~ s/\n/<br>/g; return $text; } my $atml = shift; my $html = ""; my @colournames = qw( white green red blue orange yellow purple pink gray lightgrey lightblue yellowgreen turquoise steelblue plumb olive brown gold lawngreen tomato black ); my %NESTED = ( b => 0, # Flag i => 0, # Flag u => 0, # Flag colour => 0, # Flag list => 0, # Flag link => 0, # Flag ); # Obviously, you'll need to change this $link_prefix: my $link_prefix = "http://this-is-just-something-i-made-up/countcl +ick?"; my $link_location = ""; my $link_description = ""; while($atml =~ m{ ( \[.*?\] | # A tag `` | `\d\d? | `\#[a-f\d]{6};? | `br | `li | [^[`]+ # or a non-[-tag, non-`-tag string ) # Capture it! }gxi ) { my $this_element = $1; # print "<li>Element: $this_element</li>"; if($this_element =~ m{^\[} ) { +# It's a tag my $tag = ($this_element =~ m{^\[(.*?)\]$})[0]; $tag = lc $tag; if($tag =~ m{^(/?[biu])$}) { # +Starting or closing BIU tag my $biutag = $1; my $close = ($biutag =~ m{^/}) ? 1 : 0; + # Close or open tag? my $tagtype = substr $biutag, -1, 1; if($close == $NESTED{$tagtype}) { # Is the +tag open, and to be closed, or vice versa? $NESTED{$tagtype} = $close == 0 ? 1 : 0; + # Open / close tag $html .= "<$tag>"; } # Else, ignore ne +sting error } elsif($tag eq "/") { # Cl +osing link tag if($NESTED{link}) { $NESTED{link} = 0; my @i = ($link_description =~ /\w/g); if(scalar @i) { $link_description = processtext $link_descript +ion; } else { # No link desc +ription $link_description = processtext $link_location +; } $html .= qq{ <a href="$link_prefix$link_location"> +$link_description</a> }; } } else { # It must be a link lo +cation or text with a [ if($tag =~ m{^(http://|mailto:)}) { + # Valid-looking link $link_location = $tag; $NESTED{link} = 1; $link_description = ""; } else { if($NESTED{link}) { $link_description .= processtext $this_element +; # Link description } else { $this_element = processtext $this_element; + # Text if($NESTED{list} == 1) { # Are +we in a list? if($this_element =~ s!<br>!</li></ul>!) { + # Yep, end it at the end of line $NESTED{list} = 0; # Now we +'re not in a list anymore } } $html .= $this_element; } } } } elsif($this_element =~ m{^ # Anchor at start of $ +this_element ( `` | # `` - restore default colour `\d\d? | # `9 or `99 - set new colour no `\#[a-f\d]{6};? | # `#89abCD - new method to + set colour `br | # REMOVED WORD COLOUR, but kept RG +B `li ) # Capture it! }xi) { my $tag = $1; if($tag eq "`br") { $html .= "<br>"; } elsif($tag eq "`li") { if($NESTED{list} == 1) { # Are we in a list already +? $html .= "</li></ul>"; # End it first } $NESTED{list} = 1; # We're in a list, $html .= "<ul><li>"; # so start it } elsif($tag eq "``") { if($NESTED{colour} == 1) { $html .= "</font>"; } $NESTED{colour} = 0; } elsif($tag =~ /^`(\d\d?)$/) { # Tag is one or two + digits my $colournr = $1; if($NESTED{colour} == 1) { # Already a colour in ef +fect. $html .= "</font>"; # Terminate it. } else { $NESTED{colour} = 1; } if($colournr < scalar @colournames) { # Valid 1 or +2 digit code my $colour = $colournames[$colournr]; $html .= qq{<font color="$colour">}; } elsif(int($colournr / 10) < scalar @colournames) { + # First digit valid? my $colour = $colournames[int($colournr / 10)]; + # Get colour for first digit (valid) my $digit = $colournr % 10; # Print se +cond digit $html .= qq{<font color="$colour">$digit}; } } elsif($tag =~ /^`#([a-f\d]{6});?$/i) { if($NESTED{colour} == 1) { $html .= "</font>"; } else { $NESTED{colour} = 1; } $html .= qq{<font color="#$1">}; } } else { # It's a piece of text +. if($NESTED{link}) { $link_description .= $this_element; # Link +description } else { $this_element = processtext $this_element; # Te +xt if($NESTED{list} == 1) { # Are we in a +list? if($this_element =~ s!<br>!</li></ul>!) { # Yep +, end it at the end of line $NESTED{list} = 0; # Now we're not +in a list anymore } } $html .= $this_element; } } } # Close all open tags. The above code should make sure no tags are + nested, like <b><b>. $html .= "</font>" if $NESTED{colour}; $html .= "</li></ul>" if $NESTED{list}; $html .= "</b>" if $NESTED{b}; $html .= "</i>" if $NESTED{i}; $html .= "</u>" if $NESTED{u}; if($NESTED{link} == 1) { # This code is the same as the link generating code up there. # This is used to handly unterminated links $NESTED{link} = 0; my @i = ($link_description =~ /\w/g); if(scalar @i) { $link_description = processtext $link_description; } else { # No link description $link_description = processtext $link_location; } $html .= qq{ <a href="$link_prefix$link_location">$link_descri +ption</a> }; } return $html; } ############ ### MAIN ### ############ # This is a simple sample CGI program to demonstrate and debug the thi +ng up there. my $q = new CGI; my $atml = $q->param('atml'); print qq{Content-type: text/html <html><head><title>AAAtml parser</title></head><body>}; if($atml) { print "\n\n<table border=1 align=center><tr><th>Your text</th></tr +><tr><td>\n\n"; print atml $atml; print "\n\n</td></tr></table>"; } print qq{<form method=get><h3>Enter AAAtml:</h3><textarea name=atml co +ls=40 rows=8>$atml</textarea><input type=submit></form> <hr><h3>Explanation</h3><pre> [http://location/]clickhere[/] - link (also [mailto:user\@host.tld]ema +il me[/] ) [b][i][u] - they nest correctly, last past the end of line, and co-ope +rate with colours `0 .. `20 - predefined colours as in <a href="http://www.alienaa.com/h +elp/colors.html">the AAA help files</a> `br - force new line, actually superfluous `li - bullet list item, lasts until your newline (not your `br) `#12aaFF - hexcolour, much like you can use in HTML &lt;font color=#1 +2aaFF&gt; - <b>Will not appear in the AAA for some reason</b> `red; - descriptive color, as interpreted by your browser - <b>Can +celled</b> (this feature might get disabled, so don't count on it) [text] - this is possible, but stray [ and ` characters will be ign +ored for now. I'm hungry, so gimme <a href="mailto:mapster\@pop3free.com">feedback</ +a>! (Because it is a GET form rather than a POST form, the maximum size of the stuff you can enter is about 1k or so. This is not a bug.) </pre></body></html>};