Re: Including File in array

by TedPride (Priest)
on Jun 25, 2006 at 23:14 UTC

in reply to Including File in array

Thing is, you'll want to block some words only when they match exactly, others when a word starts with them, and others still if a word contains them anywhere. What you really need is a series of regexes and perhaps a scoring system:
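
    use strict;
    use warnings;

    # Each entry is [pattern, score]. The trailing \w* also catches
    # longer words that start with the banned term.
    my @ban = (
        ['bleep\w*', 1],
        ['bloop\w*', 1],
        ['blark\w*', 2],
        ['blank',    1],
    );

    my $text = join '', <DATA>;
    print check($text), "\n";

    sub check {
        my $text  = lc shift;
        my $score = 0;
        $text =~ s/\W+/ /g;    # collapse punctuation and whitespace to single spaces
        for my $word (@ban) {
            # each pattern counts once, no matter how often it matches
            $score += $word->[1] if $text =~ /\b$word->[0]\b/;
        }
        return $score;
    }

    __DATA__
    Bleeping blooper! Blark you! Blank!
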
You will probably also want to replace any bad words with substitutes (maybe CENSORED) if the score is more than 0 but less than whatever your cut-off is, but I'll leave that part up to you.
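
If it helps, here is one way the replacement step could look. Treat it as a rough sketch rather than a finished filter: the $cutoff value and the score, censor and moderate subs are names I made up for illustration, and the list just mirrors the @ban structure above.

```perl
use strict;
use warnings;

# Hypothetical ban list in the same [pattern, score] form as @ban above.
my @ban = (
    ['bleep\w*', 1],
    ['bloop\w*', 1],
    ['blark\w*', 2],
    ['blank',    1],
);
my $cutoff = 4;    # assumed threshold; tune it for your board

# Score the text: each pattern counts once, weighted by its score.
sub score {
    my $text  = lc shift;
    my $total = 0;
    for my $word (@ban) {
        $total += $word->[1] if $text =~ /\b$word->[0]\b/;
    }
    return $total;
}

# Replace every banned word with CENSORED, leaving other text intact.
sub censor {
    my $text = shift;
    for my $word (@ban) {
        $text =~ s/\b$word->[0]\b/CENSORED/gi;
    }
    return $text;
}

# Returns the text to publish, or nothing if the post should be rejected.
sub moderate {
    my $text  = shift;
    my $score = score($text);
    return $text if $score == 0;          # clean post, publish as is
    return       if $score >= $cutoff;    # too many hits, reject
    return censor($text);                 # in between, publish censored
}

print moderate('What a bleeping mess.'), "\n";
```

Posts that score 0 go through untouched, posts at or above the cut-off come back as undef, and anything in between is published with the bad words swapped out.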

My advice, however (as someone who used WWWBoard for years and was constantly trying to devise a system to stop spammers) is to forget trying to solve a problem that's impossible to solve. Instead of blocking posts based on their contents, just add a visual verification system that requires people to type in a code they see in a graphic. So long as the source graphics are your own, and not from some popular system that spammers have already cracked, you should end up reasonably spam-free. Just give people a cookie after the first verification that allows them to post for a few hours without further verifications, and add a system to block bots that try to brute force your verification. This way regular users won't be inconvenienced much, and spam bots won't be able to post at all.
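
To make the cookie and brute-force parts a bit more concrete, here is a minimal sketch using only the core Digest::SHA module. Everything in it is an assumption for illustration: the $SECRET value, the three-hour window, the five-failure limit, the sub names, and the in-memory %failures hash (a real CGI script would need to keep that in a file or database between requests). Generating the graphic itself is left out.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $SECRET       = 'change-me';    # hypothetical server-side secret
my $PASS_SECONDS = 3 * 60 * 60;    # "a few hours", assumed here to be three
my $MAX_FAILURES = 5;              # assumed limit on wrong guesses per address

my %failures;                      # address => failed attempts (in-memory only)

# Build a cookie value: an expiry time plus an HMAC so it can't be forged.
sub make_pass {
    my ($now) = @_;
    my $expires = $now + $PASS_SECONDS;
    return join ':', $expires, hmac_sha256_hex($expires, $SECRET);
}

# A pass is valid if its signature matches and it has not expired.
sub pass_is_valid {
    my ($pass, $now) = @_;
    my ($expires, $sig) = split /:/, ($pass // ''), 2;
    return 0 unless defined $sig && $expires =~ /^\d+$/;
    return 0 unless $sig eq hmac_sha256_hex($expires, $SECRET);
    return $now < $expires ? 1 : 0;
}

# Compare the typed code with the expected one.
# Returns a new pass on success, nothing on failure or when blocked.
sub verify {
    my ($addr, $typed, $expected, $now) = @_;
    return if ($failures{$addr} // 0) >= $MAX_FAILURES;    # blocked
    if (lc $typed eq lc $expected) {
        delete $failures{$addr};
        return make_pass($now);
    }
    $failures{$addr}++;
    return;
}

my $pass = verify('192.0.2.1', 'x7k2p', 'X7K2P', time);
print "verified\n" if pass_is_valid($pass, time);
```

The idea is that a poster holding a valid pass skips the graphic, while an address that keeps guessing wrong stops getting chances. The plain eq comparison on the signature is not timing-safe, so a production version would want something more careful there.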

Of course, there may be a few real people who post obscene things, so you'll still want to replace bad words with CENSORED.
