PerlMonks  

Stopping spam AHHHHH!!!!!

by james28909 (Deacon)
on Jun 10, 2015 at 17:11 UTC ( [id://1129897]=perlquestion )

james28909 has asked for the wisdom of the Perl Monks concerning the following question:

Howdy monks :)
I seek wisdom on a certain subject. I am a moderator at a certain forum, and I have had a heated discussion with the sysadmin about how they are going about stopping the spam. Before I go further into my proposal, let me post the content that is "spam".

Title: !! back!! 7073347120 FREE VaShiKaRaN SpeCiaLisT IN Allahabad
Actual post: 70733471207073347120707334712070733471207073347120 7073347120

Now, my proposal was to strip all non-digit characters (letters, spaces, and special characters), which would leave only the digits.

use strict;
use warnings;

my @num;
while (my $string = <DATA>) {
    # Remove everything that is not a digit (this also removes the newline)
    $string =~ s/\D+//g;
    push @num, $string;
}
# Print one stripped line per input line
print "$_\n" for @num;

__DATA__
Title: !! back!! 7073347120 FREE VaShiKaRaN SpeCiaLisT IN Allahabad
Post: 707334712070733471207073347120707334712070733471207073347120

Then simply count the digits and check the user's post count:

if ($num_count >= 7 && $post_count <= 5) { stop_spam(); }
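
A minimal sketch of the check described above, assuming the forum software can pass a hook the post title, the body, and the author's post count. The sub name `looks_like_phone_spam` and its parameters are hypothetical; the thresholds are the ones from this post, not from any real forum API.

```perl
use strict;
use warnings;

# Hypothetical helper: flag a post when title and body together contain at
# least $min_digits digits and the author has at most $max_posts posts.
sub looks_like_phone_spam {
    my ($title, $body, $post_count, $min_digits, $max_posts) = @_;
    $min_digits //= 7;    # digit threshold from the proposal
    $max_posts  //= 5;    # "new user" cutoff from the proposal

    my $text = "$title $body";
    # tr/0-9// with an empty replacement list counts digits without changing $text
    my $num_count = ($text =~ tr/0-9//);

    return ($num_count >= $min_digits && $post_count <= $max_posts) ? 1 : 0;
}

my $title = '!! back!! 7073347120 FREE VaShiKaRaN SpeCiaLisT IN Allahabad';
my $body  = '7073347120' x 6;

# Same content, once from a brand-new account and once from an established one
print looks_like_phone_spam($title, $body, 0)   ? "blocked\n" : "allowed\n";
print looks_like_phone_spam($title, $body, 250) ? "blocked\n" : "allowed\n";
```

Counting with `tr` avoids building the stripped string at all. Note that this counts every digit in the post, so a new user who mentions several short numbers would also be caught.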


What are your thoughts on this subject? Any input is appreciated.

Replies are listed 'Best First'.
Re: Stopping spam AHHHHH!!!!!
by atcroft (Abbot) on Jun 10, 2015 at 18:28 UTC

    I agree with the earlier comment that it is an arms race. Forum spam is in many ways similar to email spam, so the first things I thought of were something along the lines of SpamAssassin, or a Bayesian or other type of filter. Depending on the forum software, there may already be tools to do something similar. However, I would not discount the lower-hanging fruit:

    • Is there a pattern in usernames that are making these posts?
    • Is there a particular block/range of IP addresses they are posted from?
    • Does the forum software have the ability to block certain strings, and if so, are there other string literals that appear in multiple of the posts that can be filtered on? (Understanding that this may only serve as a temporary barrier.)
    • Are there other commonalities in the posts (user-agent, etc.) that can be examined for patterns to the postings?
    • Is there some form of approval system required for posting in certain areas, or for posters of a certain age/post count that could be enabled? (Understanding that major changes in a forum may spark protests among some users unless handled tactfully.)

    Also, have you looked for other resources (generic or specific to your forum software), such as http://www.stopforumspam.com/?

    Hope that helps, and good luck.

    Update: 2015-06-10
    Added note regarding blocking of fixed strings.

    Update: 2015-06-10
    Added link to additional resources.

      1. Is there a pattern in usernames that are making these posts?
      2. Is there a particular block/range of IP addresses they are posted from?
      3. Does the forum software have the ability to block certain strings, and if so, are there other string literals that appear in multiple of the posts that can be filtered on? (Understanding that this may only serve as a temporary barrier.)
      4. Are there other commonalities in the posts (user-agent, etc.) that can be examined for patterns to the postings?
      5. Is there some form of approval system required for posting in certain areas, or for posters of a certain age/post count that could be enabled? (Understanding that major changes in a forum may spark protests among some users unless handled tactfully.)

      1. No, it is different each time, but the post is almost always the same; only the names of cities and the format of the telephone number change.
      2. Nope, they use a proxy.
      3. Yes sir, the forum software does have filtering, but this is the only type of spam that gets around it. It always has a telephone number (7 or more digits) and is from new users.
      4. As mentioned earlier, the telephone number is always 7 or more digits. The posts contain city names, but they change per post.
      5. Well, new users can't post links, so I don't see why it would be so bad if it didn't allow new users to post strings with 7 or more digits in the title or post.
        2. Is there a particular block/range of IP addresses they are posted from?
        2. nope, they use a proxy.

        Who is "they"? All users, some users, or just the spammers? A single proxy or several?

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Stopping spam AHHHHH!!!!!
by Old_Gray_Bear (Bishop) on Jun 10, 2015 at 17:38 UTC
    This will work for a few minutes. Then the SPAMmer will change the format on you. The contest between a forum admin and the junk-mailer is a never-ending arms race. Been there, done that, and the t-shirt was too small.

    ----
    I Go Back to Sleep, Now.

    OGB

      Thanks for your input Mr.Bear ;)
      Well, what it amounts to so far is... telephone numbers, always 7 or more digits. This would at least stop any NEW posts if the user's post count is less than 5 or 10 and the post contains 7 or more digits. This proposal would stop him in his tracks, and at least the spammer(s) won't be able to spam new posts with telephone numbers.
      The spammer always starts with a few special characters, usually followed by a word, then a string of digits that is always 7 or more in length, then some more words, usually a jumble. This is the only format of spam we get lol, and it ALWAYS contains telephone numbers.
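
The pattern described above (a run of 7 or more digits posted by a user with a low post count) could be sketched with a regex. The sub names `has_phone_run` and `should_hold_post`, the separator set, and the cutoff of 5 posts are assumptions for illustration. Matching a run of digits, rather than counting every digit in the post, is meant to reduce hits on posts that merely mention a few short numbers, though dates such as 2015-06-10 would still match.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains at least $min digits in a row,
# allowing up to two separator characters (whitespace, dot, dash, parens)
# between neighbouring digits, so "707-334-7120" counts as one number.
sub has_phone_run {
    my ($text, $min) = @_;
    $min //= 7;
    my $rest = $min - 1;    # digits required after the first one
    my $re = qr/\d(?:[\s.()-]{0,2}\d){$rest,}/;
    return $text =~ $re ? 1 : 0;
}

# Hypothetical policy: only posts from users with 5 or fewer posts are checked.
sub should_hold_post {
    my ($title, $body, $post_count) = @_;
    return 0 if $post_count > 5;
    return (has_phone_run($title) || has_phone_run($body)) ? 1 : 0;
}

for my $case (
    [ '!! back!! 7073347120 FREE VaShiKaRaN', '', 1 ],
    [ 'Call 707-334-7120 now', '', 0 ],
    [ 'Perl 5.20 question', 'I have 3 files and 12 lines', 2 ],
) {
    my ($title, $body, $posts) = @$case;
    printf "%-40s %s\n", $title,
        should_hold_post($title, $body, $posts) ? 'hold' : 'ok';
}
```

Holding the post for moderator approval instead of rejecting it outright would limit the damage from false positives.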
Re: Stopping spam AHHHHH!!!!!
by Anonymous Monk on Jun 10, 2015 at 20:02 UTC
    Try using Bogofilter on each of your messages. Now that I have built a representative database, it classifies all new spam messages on my forum as spam (>0.9), and most of the non-spam messages receive a 0.50 score ("unsure").
Re: Stopping spam AHHHHH!!!!!
by flexvault (Monsignor) on Jun 11, 2015 at 14:15 UTC

    james28909,

    Have you tried MailScanner? (It's written in Perl)

    I have fought spam since 1997, and at one time we were receiving more than 1,000,000 spams per day per mail server, 7 days a week. I started using MailScanner and it worked perfectly for about the first 2 years, and then it seemed to max out at handling about 30K emails per day. We wrote a pre-processor in Perl that looks for the obvious spam fingerprints, and we have held our own ever since.

    Perl's 'index' function saved the day. Using 'index' and known spam lists developed by us, we could keep up with 1MM spams on a single-core server. We use multi-core servers now, so we have plenty of capacity.
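
A minimal sketch of the kind of `index`-based pre-filter described above. The fingerprint list and the sub name `find_fingerprint` are made up for illustration; the actual lists and pre-processor are not shown in this thread. `index` does a plain substring search with no regex involved, which keeps each check cheap.

```perl
use strict;
use warnings;

# Hypothetical fingerprint list, lower-cased once up front.
my @fingerprints = map { lc } (
    'vashikaran specialist',
    '7073347120',
);

# Return the first fingerprint found in $text, or undef if none match.
sub find_fingerprint {
    my ($text) = @_;
    my $haystack = lc $text;    # case-insensitive comparison
    for my $needle (@fingerprints) {
        # index returns -1 when the substring is absent
        return $needle if index($haystack, $needle) >= 0;
    }
    return;
}

my $hit = find_fingerprint(
    '!! back!! 7073347120 FREE VaShiKaRaN SpeCiaLisT IN Allahabad'
);
print defined $hit ? "spam fingerprint: $hit\n" : "clean\n";
```

A fixed list like this only catches strings already seen, so it would need regular updates as the spam changes.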

    You don't mention numbers, so my comments are for sizing consideration. YMMV

    Good Luck!

    Regards...Ed

    "Well done is better than well said." - Benjamin Franklin

Re: Stopping spam AHHHHH!!!!!
by cavac (Parson) on Jun 11, 2015 at 17:48 UTC

    So far, the only way I have found to at least slow things down a bit in the spam department is to make sure search engines don't index your site and to state so clearly on every page.

    Spammers want their spam to show up in search engines. So suddenly your forum seems much less interesting. Of course there are a lot of downsides, and it won't work against automated bots.

    I'm currently also experimenting with ways to detect and eliminate spam bots. But I probably have it way easier, since the site I'm doing this on is neither a) of any significant relevance (so it doesn't matter if it's down for a few hours while I play with it) nor b) running any widely used forum/blog/whatever software (I wrote my own webserver). And since the site also isn't commercially relevant in any way (just my personal blog, where I want to add a discussion system), I have no bad conscience when "playing with my food", either.

    If you are trying to stop spam on an existing, well-known forum with lots of users, it's much harder. You don't want to experiment too much (downtime=bad), it uses well-documented software that is easy to automate from the client side, and it's probably a valuable target for spammers (well known, google-indexed, high reputation because many sites link to it). For very big forums, even hiring people to manually type in spam can be profitable, so it gets harder and harder to detect spammers...

    "For me, programming in Perl is like my cooking. The result may not always taste nice, but it's quick, painless and it get's food on the table."

Node Type: perlquestion [id://1129897]
Approved by davies
Front-paged by cavac