PerlMonks  

Re^2: Removing malicious HTML entities (now with more questions!)

by techcode (Hermit)
on Aug 16, 2008 at 17:38 UTC (id://704710)


in reply to Re: Removing malicious HTML entities (now with more questions!)
in thread Removing malicious HTML entities (now with more questions!)

This is how I do it; it's part of my usual CGI::Application framework, but you can also get Vars() from CGI.pm or CGI::Simple. The other options let you skip some or all fields (say, when building a CMS where you don't want the encoding to mangle any HTML code inside a field).
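use HTML::Entities qw(encode_entities);

sub form {
    my $self   = shift;
    my %params = @_;

    # skip_fields: a single field name or an array ref of field names
    my $fields = $params{skip_fields} || [];
    my %skip   = map { $_ => 1 } ( ref $fields ? @$fields : $fields );

    my $q    = $self->query();    # CGI.pm or CGI::Simple object
    my %vars = $q->Vars();

    unless ( $params{dont_encode_fields} ) {
        foreach my $name ( keys %vars ) {
            next if $skip{$name};    # don't encode fields in the skip list
            # encode only <, >, & and " as HTML entities
            $vars{$name} = encode_entities( $vars{$name}, '<>&"' );
        }
    }
    return \%vars;
}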
PS. Later I found out about the grep trick for checking whether a value is in an array; I should change this ...
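For comparison, here is a minimal sketch of the grep trick next to a hash lookup like the one form() uses for its skip list. The field names and the helper names in_list_grep and in_list_hash are made up for illustration.

```perl
use strict;
use warnings;

my @skip_fields = qw(body summary);    # hypothetical skip list

# grep in scalar context returns the number of matches, so this is true
# when $name appears in @list. It scans the whole list on every call.
sub in_list_grep {
    my ( $name, @list ) = @_;
    return scalar grep { $_ eq $name } @list;
}

# Building a hash once gives constant-time lookups afterwards.
my %skip = map { $_ => 1 } @skip_fields;

sub in_list_hash {
    my ($name) = @_;
    return exists $skip{$name} ? 1 : 0;
}

for my $field (qw(title body)) {
    printf "%s: grep=%d hash=%d\n", $field,
        in_list_grep( $field, @skip_fields ), in_list_hash($field);
}
```

With only a handful of skip fields the speed difference is negligible; the hash mainly wins when the same list is checked for every submitted field.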

Have you tried freelancing? Check out Scriptlance - I work there. For more info about Scriptlance and freelancing in general check out my home node.

Re^3: Removing malicious HTML entities (now with more questions!)
by Jenda (Abbot) on Aug 16, 2008 at 21:16 UTC

    I don't think it's a good idea to escape the values when reading them. What if you need them raw? What if you need them URL-escaped, or escaped for inclusion in a JavaScript string literal, or, or, or...?

    Besides, not all data will come into your script from the form/query, so you'll have to either escape everything no matter where it comes from, or keep track of what is and what is not escaped.

    Escape when you output, not when you input, because only at output time do you really know how the data needs to be escaped.
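To make the point concrete, here is a minimal sketch showing that the same raw value needs different escaping in three output contexts. The helpers escape_html, escape_url and escape_js are hypothetical hand-written functions using core Perl only; in real code HTML::Entities (encode_entities) and URI::Escape (uri_escape) cover the first two, and escape_js is illustrative rather than a complete JavaScript escaper.

```perl
use strict;
use warnings;

# Escape for an HTML text node or a double-quoted attribute value.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Percent-encode every byte outside the RFC 3986 unreserved set.
# Assumes the input is a byte string, not a wide-character string.
sub escape_url {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9._~-])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Escape for a quoted JavaScript string literal (illustrative, not exhaustive).
sub escape_js {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;       # backslashes first
    $s =~ s/(['"])/\\$1/g;   # quotes
    $s =~ s/\n/\\n/g;
    $s =~ s/\r/\\r/g;
    $s =~ s{</}{<\\/}g;      # avoid closing a <script> element early
    return $s;
}

my $raw = q{Tom & "Jerry" <b>};
print escape_html($raw), "\n";    # Tom &amp; &quot;Jerry&quot; &lt;b&gt;
print escape_url($raw),  "\n";    # Tom%20%26%20%22Jerry%22%20%3Cb%3E
print escape_js($raw),   "\n";    # Tom & \"Jerry\" <b>
```

A value stored already HTML-escaped would have to be decoded first before it could be fed correctly to escape_url or escape_js.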

      I thought that too at first, but there are so many different ways to output things (read from the DB, directly from the form in case of a form error, ...) and so many different modules are involved (Template Toolkit, DBD::mysql, Data::FormValidator, HTML::FillInForm, ...) that I couldn't find an easy, bulletproof way to encode everything automatically on output.
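One way to get a single output-side choke point is sketched below, assuming every value reaches the template through one flat hash. escape_for_view and html_escape are hypothetical helpers, not part of Template Toolkit or CGI::Application; Template Toolkit's own html filter (`[% value | html %]`) is the per-variable alternative.

```perl
use strict;
use warnings;

# Single-pass escaping of the same four characters the form() code encodes.
my %HTML_ESC = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );

sub html_escape {
    my ($s) = @_;
    $s =~ s/([&<>"])/$HTML_ESC{$1}/g;
    return $s;
}

# Hypothetical helper: every value bound for the HTML view passes through
# here, so escaping happens at exactly one output point. Field names listed
# in $raw_ok (e.g. WYSIWYG HTML from a CMS) are passed through untouched.
# Only a flat hash of scalars is handled; nested DB rows would need recursion.
sub escape_for_view {
    my ( $vars, $raw_ok ) = @_;
    my %raw = map { $_ => 1 } @{ $raw_ok || [] };
    my %out;
    for my $key ( keys %$vars ) {
        my $v = $vars->{$key};
        $out{$key} = ( $raw{$key} || !defined $v ) ? $v : html_escape($v);
    }
    return \%out;
}

# Usage: escape everything except the CMS body, then hand it to the template.
my $view = escape_for_view(
    { title => '<b>Hi</b>', body => '<p>ok</p>' },
    ['body'],
);
print "$view->{title}\n";    # &lt;b&gt;Hi&lt;/b&gt;
print "$view->{body}\n";     # <p>ok</p>
```

The trade-off is that anything bypassing this helper (for instance HTML::FillInForm refilling a form) still needs its own escaping.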

      Either way, 90% of the time I need the data in escaped (safe) form for printing as part of web pages, forms and similar. I store it that way in the DB and just print it out as-is. Actually, I'd say close to 100%: I either need it escaped, or, in the special cases I mentioned (such as a WYSIWYG editor in a CMS), not escaped at all. Either way it stays in the same format, and I very rarely need to undo the escaping.
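For the rare case where stored, escaped text must be turned back into raw text, the decoding has to mirror the encoding exactly. A minimal sketch, assuming only the four characters the form() code encodes; escape_basic and unescape_basic are hypothetical names, and HTML::Entities' decode_entities is the full-featured option that understands every named and numeric entity.

```perl
use strict;
use warnings;

my %ENC = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
my %DEC = reverse %ENC;    # '&amp;' => '&', '&lt;' => '<', ...

# Escape once, before storing.
sub escape_basic {
    my ($s) = @_;
    $s =~ s/([&<>"])/$ENC{$1}/g;
    return $s;
}

# Undo it when the raw text is needed. This is a single pass, so
# "&amp;lt;" becomes "&lt;" rather than being decoded twice into "<".
sub unescape_basic {
    my ($s) = @_;
    $s =~ s/(&(?:amp|lt|gt|quot);)/$DEC{$1}/g;
    return $s;
}

my $typed  = q{Use &lt; for "<"};
my $stored = escape_basic($typed);
print "$stored\n";                      # Use &amp;lt; for &quot;&lt;&quot;
print unescape_basic($stored), "\n";    # Use &lt; for "<"
```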

      So this is a fire-and-forget approach (as we say in the ex-Yugoslav languages: "Sipas i ne mislis!", roughly "pour it in and don't think about it").

      Performance-wise it's also better: as with anything else you can precompute, you escape once instead of over and over. You can also think of it as taint mode: everything is "protected" by default, and you have to explicitly untaint anything you need raw, so there's no way to forget to escape something. And forgetting is quite easy in the web world, where you add new fields and change old ones like socks.
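The "escaped by default, explicit opt-out" property behind the taint-mode analogy (Perl's real taint mode is perl -T) can be sketched as a small wrapper. SafeVars is a hypothetical class, not a CPAN module, and unlike the store-escaped approach described above it keeps raw values and escapes on access.

```perl
use strict;
use warnings;

package SafeVars;    # hypothetical class for illustration

my %ESC = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );

sub new {
    my ( $class, %vars ) = @_;
    return bless { vars => {%vars} }, $class;
}

# Default accessor: always HTML-escaped, so a new field cannot be
# printed unescaped by accident.
sub get {
    my ( $self, $name ) = @_;
    my $v = $self->{vars}{$name};
    return undef unless defined $v;
    $v =~ s/([&<>"])/$ESC{$1}/g;
    return $v;
}

# Explicit opt-out, the analogue of untainting: callers must ask for raw data.
sub raw {
    my ( $self, $name ) = @_;
    return $self->{vars}{$name};
}

package main;

my $p = SafeVars->new( comment => '<script>x</script>' );
print $p->get('comment'), "\n";    # &lt;script&gt;x&lt;/script&gt;
print $p->raw('comment'), "\n";    # <script>x</script>
```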


