PerlMonks  

prohibiting certain strings

by keiusui (Monk)
on Dec 28, 2005 at 23:15 UTC ( [id://519664]=perlquestion )

keiusui has asked for the wisdom of the Perl Monks concerning the following question:

I want a Perl program to quit when a user enters the word, "badword". Therefore I have the following conditional:

if($input =~ /badword/i){exit 1;}

This works for most users, but some clever users have bypassed the conditional by submitting strings such as "bad.word", "b.a.d.w.o.r.d", "b/a/d/w/o/r/d" and "b a.d*w/o r-d".

So, my question is:

How can I modify my conditional statement so that if a user enters the string "badword" with ANY non-alphanumeric characters in between each letter, then the program quits?

Thank you so much.

Replies are listed 'Best First'.
Re: prohibiting certain strings
by atcroft (Abbot) on Dec 29, 2005 at 00:48 UTC

    As there is often more than one way to do it, here is what I mentioned in the CB (for your reference):

    12/28 at 18:19:49 <atcroft> keiusui: exit if ($input =~ m/b\W*a\W*d\W*w\W*o\W*r\W*d/i);, maybe? (note: untested)

    You may also want to look at Regexp::Common::profanity as another possibility.

    HTH.

      Thank you so much for all the insight and help. I will be using atcroft's solution: m/b\W*a\W*d\W*w\W*o\W*r\W*d/i
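For longer word lists, the pattern can be generated from the word instead of typed by hand. A minimal sketch, assuming a hypothetical helper named build_pattern (not from the thread). It uses [^[:alnum:]]* as the separator rather than \W*, because \W does not match the underscore and "b_a_d_w_o_r_d" would otherwise slip through:

```perl
use strict;
use warnings;

# Hypothetical helper (an illustration, not atcroft's code): build a
# case-insensitive pattern that allows any run of non-alphanumeric
# characters between the letters of $word.
sub build_pattern {
    my ($word) = @_;
    my $body = join '[^[:alnum:]]*', map { quotemeta($_) } split //, $word;
    return qr/$body/i;
}

my $re = build_pattern('badword');

for my $input ('badword', 'b.a.d.w.o.r.d', 'b a.d*w/o r-d', 'b_a_d_w_o_r_d', 'hello') {
    printf "%-16s %s\n", $input, ($input =~ $re ? 'blocked' : 'ok');
}
```

Digits count as alphanumeric here, so "badw0rd" is still not caught by this pattern.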
Re: prohibiting certain strings
by phaylon (Curate) on Dec 28, 2005 at 23:29 UTC
    If you want to develop a bad-words filter for a commenting system or the like, I'm afraid my experience is that no automatic filter can stop people from using unwanted language. I found functions that make moderation and approval easier much more successful.

    Ordinary morality is for ordinary people. -- Aleister Crowley
Re: prohibiting certain strings
by dimar (Curate) on Dec 29, 2005 at 00:01 UTC

    Not to mention the fact that if you get too clever with your 'badword' filter, you will inevitably wind up filtering legitimate words. This is often more annoying than not having a filter at all.

    Can you guess why *all* of the following lines might not pass a badword test?:

    A: who reads the news?
    B: if they *publish* it, flip reads it
    A: flip is so well-read
    B: yup, pen is too, but he's a bit cocky
    A: probably cause he drives a FiretrUCK, You!

    "bad words" are everywhere, if you look hard enough. If you look too hard, you will wind up irritating the polite people, and giving the naughty potty mouths one more way to mock you, and your 'clever' filter.

    =oQDlNWYsBHI5JXZ2VGIulGIlJXYgQkUPxEIlhGdgY2bgMXZ5VGIlhGV
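The same effect shows up with the thread's placeholder word: a gap-tolerant pattern also fires on innocent text where the letters merely line up across a word break. A small sketch; the sample sentences and the word-boundary variant are assumptions added for illustration:

```perl
use strict;
use warnings;

# Gap-tolerant pattern of the kind discussed in this thread.
my $loose = qr/b\W*a\W*d\W*w\W*o\W*r\W*d/i;

# Assumed variant: additionally require a word boundary at both ends.
my $anchored = qr/\b$loose\b/;

for my $text ('That was bad wording on my part.',
              'He chose a bad word.',
              'b.a.d.w.o.r.d') {
    printf "%-34s loose=%d anchored=%d\n",
        $text,
        ($text =~ $loose    ? 1 : 0),
        ($text =~ $anchored ? 1 : 0);
}
```

The boundary check rescues "bad wording" but still flags "bad word" written as two ordinary words, so it narrows the problem without removing it.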
Re: prohibiting certain strings
by diotalevi (Canon) on Dec 28, 2005 at 23:18 UTC

    Insert something like (?s:.*) between each character.

    /b(?s:.*)a(?s:.*)d(?s:.*)w(?s:.*)o(?s:.*)r(?s:.*)d/i

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Not quite. The sentence "She hid in the closet" would be flagged as a badword.

        I considered that but figured that I'd rather be restrictive than permissive.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
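The over-matching is easy to reproduce with the literal "badword" pattern as well. A small sketch comparing the "anything in between" pattern with one that only allows non-word characters in the gaps; the sample sentence is made up for illustration:

```perl
use strict;
use warnings;

# Pattern from the reply above: anything at all may sit between letters.
my $anything = qr/b(?s:.*)a(?s:.*)d(?s:.*)w(?s:.*)o(?s:.*)r(?s:.*)d/i;

# Narrower alternative: only non-word characters may sit between letters.
my $gap_only = qr/b\W*a\W*d\W*w\W*o\W*r\W*d/i;

my $sentence = 'Be a dear and wash your old dishes.';   # made-up harmless input

print "anything: ", ($sentence =~ $anything ? "flagged" : "ok"), "\n";
print "gap_only: ", ($sentence =~ $gap_only ? "flagged" : "ok"), "\n";
```

The first pattern flags the sentence because the letters b, a, d, w, o, r, d occur in that order somewhere in it; the second does not.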

Re: prohibiting certain strings
by bart (Canon) on Dec 28, 2005 at 23:31 UTC
    Drop the nonword characters, and try again.
    my $input = "There's the b/a/d/w/o/r/d.";
    # Copy the input, then strip every non-word character from the copy.
    (my $test = $input) =~ s/\W+//g;
    if ($test =~ /badword/) {
        print "You sneaky devil!\n";
    }

      Of course, that still leaves 'badw0rd' and 'b_a_d_w_o_r_d', but it is a start. I would be more restrictive:

      (my $test = $input) =~ s/[^A-Z]//ig;

      This will still permit 'baaadwooord', unfortunately.

      I'm inclined to agree with phaylon, in that there is no substitute for moderation. :)

      Update: forgot the ^ character. <blush>
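The gaps mentioned above ('badw0rd', 'b_a_d_w_o_r_d', 'baaadwooord') could be narrowed by normalizing the input a little further before testing it. A rough sketch, assuming a hypothetical normalize() helper and a small, made-up table of look-alike digits; none of this comes from the replies themselves:

```perl
use strict;
use warnings;

# Hypothetical normalizer (an illustration only): lowercase the text,
# map a few look-alike digits to letters, drop everything that is not
# a letter, then squeeze runs of a repeated letter down to one.
sub normalize {
    my ($text) = @_;
    my $t = lc $text;
    $t =~ tr/0134/oiea/;   # assumed look-alike table: 0->o, 1->i, 3->e, 4->a
    $t =~ s/[^a-z]//g;     # remove digits, underscores, punctuation, spaces
    $t =~ tr/a-z//s;       # squeeze repeats: "baaadwooord" -> "badword"
    return $t;
}

for my $input ('badw0rd', 'b_a_d_w_o_r_d', 'baaadwooord', 'good word') {
    my $verdict = normalize($input) =~ /badword/ ? 'blocked' : 'ok';
    print "$input: $verdict\n";
}
```

Squeezing has a cost: a forbidden word that itself contains a doubled letter has to be squeezed the same way before comparing, and every extra normalization step makes false positives on legitimate text more likely.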
Re: prohibiting certain strings
by TedPride (Priest) on Dec 29, 2005 at 03:18 UTC
    The problem is that any script restrictive enough to catch everything will also flag legitimate posts (as mentioned above). Probably the best thing to do is score users on the number of bad words they use, and if the count goes over a certain total number and a certain average number per post, then have them banned automatically. This avoids the problem of people seeing their post has been blocked and then making creative variations (since they won't know you're counting the bad words until the count slays them), and it should be pretty easy to look up people who are high on the bad word ranking but haven't yet been officially banned by you (as opposed to auto-banned) and check for genuine bad words in their posts.

    Needless to say, an exact match for a bad word would be scored high, whereas a match that only appears once characters have been removed (or added, since symbols are often used in place of letters) would be scored lower. One might argue that a swear word creative enough to defeat all algorithms is not as damaging as a regular swear word anyhow.
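A rough sketch of the scoring idea, to make the two thresholds concrete. Every name and number here is an assumption chosen for illustration, and for brevity it scores at most one hit per post rather than counting every occurrence:

```perl
use strict;
use warnings;

# Assumed weights and thresholds; tune to taste.
my $EXACT_SCORE     = 3;     # plain "badword"
my $DISGUISED_SCORE = 1;     # found only after stripping non-letters
my $MAX_TOTAL       = 10;    # total score a user may accumulate
my $MAX_AVERAGE     = 1.5;   # average score per post a user may reach

my %stats;   # user name => { total => score so far, posts => post count }

# Score a single post: exact matches weigh more than disguised ones.
sub score_post {
    my ($text) = @_;
    return $EXACT_SCORE if $text =~ /badword/i;
    (my $stripped = $text) =~ s/[^A-Z]//ig;
    return $DISGUISED_SCORE if $stripped =~ /badword/i;
    return 0;
}

# Record a post silently; return true once the user exceeds BOTH the
# total and the per-post average, i.e. should be auto-banned.
sub record_post {
    my ($user, $text) = @_;
    my $s = $stats{$user} //= { total => 0, posts => 0 };
    $s->{total} += score_post($text);
    $s->{posts}++;
    return $s->{total} > $MAX_TOTAL
        && $s->{total} / $s->{posts} > $MAX_AVERAGE;
}
```

Because record_post() never rejects a post, the user gets no feedback to tune variations against, and sorting %stats by total gives the list of users worth checking by hand.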
