PerlMonks  

Hazardous characters filter

by Anonymous Monk
on Mar 24, 2014 at 18:23 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I am trying to filter out some bad characters, while still allowing them when they are part of specific patterns, like this:

    ...
    my $test = "this is not OK < but this is ok <url>";
    my $res = rep( $test );
    print "\n\n$res\n\n";

    sub rep {
        if ( $_[0] ) {
            $_[0] =~ s/<|>//g unless $_[0] =~ /(\<\burl\b\>)/i;
            return $_[0];
        }
        else {
            return '';
        }
    }
    ...

Instead, it's replacing all "<" and ">". Thanks for looking!

Replies are listed 'Best First'.
Re: Hazardous characters filter
by Kenosis (Priest) on Mar 24, 2014 at 19:33 UTC

    Perhaps the following, which uses a negative lookahead and a negative lookbehind, will be helpful:

        use strict;
        use warnings;

        my $test = "this is not OK < but this is ok <url>";
        print rep($test);

        sub rep {
            $_[0] or return $_[0];
            $_[0] =~ s/<(?!url)|(?<!url)>//g;
            return $_[0];
        }

    Output:

    this is not OK but this is ok <url>

    Your sub returns '' if $_[0] evaluates to false. This means that 0 and !defined $_[0] would convert to '', and perhaps that's your intent. In these cases, the above just returns what was sent (if anything), so you may need to adjust it.
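
As a hedged variation on the substitution above: `(?!url)` also spares the `<` in text such as `<urlencode`, and `(?<!url)` spares the `>` in `curl>`. The sketch below tightens both lookarounds to the full `<url>` token, works on a copy of the argument, and tests `defined` rather than truthiness so that 0 passes through. The name `rep_strict` and the choice to turn undef into '' are assumptions, not part of the reply.

```perl
use strict;
use warnings;

# Hypothetical variant of rep(): strips < and > except in the literal token <url>.
sub rep_strict {
    my ($text) = @_;
    return '' unless defined $text;    # assumption: undef becomes ''

    # Drop '<' unless it starts "<url>", and '>' unless it ends "<url>".
    $text =~ s/<(?!url>)|(?<!<url)>//g;
    return $text;
}

print rep_strict("this is not OK < but this is ok <url>"), "\n";
print rep_strict("curl> and <urlencode"), "\n";
```

Deleting the lone `<` leaves the spaces on both sides of it in place, so the printed result has two spaces between "OK" and "but".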

      Lookahead/lookbehind is a good suggestion. The reason I didn't suggest it is that I expected the OP to respond that the < > should surround an actual url, not the literal string "url".
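
If the brackets are meant to surround an actual URL rather than the literal string "url", a sketch along the following lines may be closer. The URL pattern is a deliberately crude assumption (http or https followed by characters that are neither whitespace nor brackets), and `strip_except_urls` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: keep <http://...> or <https://...>, drop other brackets.
sub strip_except_urls {
    my ($text) = @_;

    # A bracketed URL is captured and put back unchanged;
    # any other lone bracket is replaced with nothing.
    $text =~ s{(<https?://[^<>\s]+>)|[<>]}{ defined $1 ? $1 : '' }ge;
    return $text;
}

print strip_except_urls("bad < but fine <http://example.com/a?b=1>"), "\n";
```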
Re: Hazardous characters filter
by Laurent_R (Canon) on Mar 24, 2014 at 18:35 UTC
    It does not seem to do any replacement for me:
        DB<1> $_ = "this is not OK < but this is ok <url>";
        DB<2> s/<|>//g unless /(\<\burl\b\>)/i;
        DB<3> p $_
        this is not OK < but this is ok <url>
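
A minimal sketch of why the statement leaves this string untouched: the `unless` modifier guards the entire substitution, so a single `<url>` anywhere in the string skips every replacement. The sub name `strip_unless_url` is hypothetical, and it works on a copy of its argument.

```perl
use strict;
use warnings;

# Hypothetical helper: the original logic, applied to a copy of the input.
sub strip_unless_url {
    my ($text) = @_;

    # The statement modifier applies to the whole s///, not to each match.
    $text =~ s/<|>//g unless $text =~ /<\burl\b>/i;
    return $text;
}

print strip_unless_url("not OK < but ok <url>"), "\n";   # unchanged
print strip_unless_url("not OK < and > here"), "\n";     # brackets removed
```
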
Re: Hazardous characters filter
by no_slogan (Deacon) on Mar 24, 2014 at 18:41 UTC

    How about:

    s/(<url>)|<|>/$1||''/eg

    Also, note that by modifying $_[0], you're also changing $test. And may I ask in what context < and > are "hazardous"?
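
A small sketch of both points: the alternation keeps `<url>` by capturing it and substituting it back, and because `@_` aliases the caller's variables, editing `$_[0]` changes `$test` itself. The sub names are hypothetical, and the `/r` flag (non-destructive substitution) assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Edits the caller's variable through the @_ alias.
sub rep_in_place { $_[0] =~ s/(<url>)|<|>/$1||''/eg; return $_[0] }

# Returns a cleaned copy; /r leaves the argument untouched (Perl 5.14+).
sub rep_copy { return $_[0] =~ s/(<url>)|<|>/$1||''/egr }

my $test = "not OK < but ok <url>";
my $copy = rep_copy($test);
print "after rep_copy:     $test\n";    # still contains the lone '<'

rep_in_place($test);
print "after rep_in_place: $test\n";    # the lone '<' is gone
```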

      Both this
      s/(<url>)|<|>/$1||''/eg
      and this
      $_[0] =~ s/<|>//g
      will replace the "<" and ">", but what I am trying to do is to replace "<" and ">" only if this pattern "<url>" is not found in the string. At the end the string should be:

      From this: "This is not ok < but this is ok <url>".
      To: "This is not ok but this is ok <url>".

      Thanks!

        what I am trying to do is to replace "<" and ">" only if this pattern "<url>" is not found in the string. At the end the string should be:

        From this: "This is not ok < but this is ok <url>".

        To: "This is not ok but this is ok <url>".

        There is a contradiction between your description of what you want and the example you provide: in the example, the "<url>" pattern is found, so according to your stated rule no substitution whatsoever should occur. Your example probably shows what you want more clearly than your description does. If you can't state the requirement in plain English, then you probably don't yet fully understand what you are trying to do, and that might be the key to your problem.
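
One way to pin the requirement down is to write the example itself as an executable check. The sketch below uses the core Test::More module; `clean` is a hypothetical stand-in for whichever substitution is being tried, and the double space in the expected string is what remains once the lone `<` is deleted.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical stand-in for the filter under test.
sub clean {
    my ($text) = @_;
    $text =~ s/(<url>)|[<>]/defined $1 ? $1 : ''/ge;
    return $text;
}

is( clean("This is not ok < but this is ok <url>"),
    "This is not ok  but this is ok <url>",
    'lone bracket removed, <url> kept' );
is( clean("no brackets here"), "no brackets here", 'plain text untouched' );
```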

Re: Hazardous characters filter
by MidLifeXis (Monsignor) on Mar 25, 2014 at 17:36 UTC

    You may also want to consider whether you want to remove bad characters or allow good characters. There is a subtle difference. The first requires that you keep a list of all bad characters, and missing one lets a bad character through. The second requires that you keep a list of good characters, and missing one rejects a good input. Which failure mode is worse will help you make your decision.

    --MidLifeXis
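
The two approaches can be sketched with `tr///`. The sub names and the allowed character set (letters, digits, space, period, comma, hyphen) are arbitrary assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Deny list: delete only the characters known to be bad.
sub strip_denied {
    my ($text) = @_;
    $text =~ tr/<>//d;
    return $text;
}

# Allow list: delete everything except the characters known to be good.
# The /c flag complements the set, so /cd removes whatever is NOT listed.
sub keep_allowed {
    my ($text) = @_;
    $text =~ tr/a-zA-Z0-9 .,-//cd;
    return $text;
}

my $input = qq{5 < 6 & "quoted"};
print strip_denied($input), "\n";    # & and " slip through
print keep_allowed($input), "\n";    # only allowed characters survive
```

With the deny list, the unlisted `&` and `"` pass straight through; with the allow list, they are dropped along with anything else that was not anticipated.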

Re: Hazardous characters filter
by jellisii2 (Hermit) on Mar 25, 2014 at 11:33 UTC
    Beware writing your own filters with regex, particularly if you're feeding it to something that balks at input. There are many filters that are prebuilt and well tested to do such things. HTML::Entities is one.
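
A minimal sketch using `encode_entities` from HTML::Entities (a CPAN module, not part of core Perl). It escapes the characters instead of deleting them, which assumes the output is headed for an HTML page; that assumption is not confirmed anywhere in the thread. Note that it also escapes the brackets in `<url>`.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my $test = "this is not OK < but this is ok <url>";

# Escape only the characters named in the second argument.
my $escaped = encode_entities($test, '<>&"');
print "$escaped\n";
```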

Node Type: perlquestion [id://1079575]
Approved by GotToBTru