
Re: Re: Re: Enough is Enough - Taking the fight back to the Internet scammers

by tachyon (Chancellor)
on Oct 28, 2003 at 11:29 UTC


in reply to Re: Re: Enough is Enough - Taking the fight back to the Internet scammers
in thread Enough is Enough - Taking the fight back to the Internet scammers

We do Bayesian stats work as part of a different project and have a filter based on that (proprietary, I'm afraid), although there is popmail Popfile on SourceForge, which is OK.

Bayesian stat analysis is probably one step past SpamAssassin but still has the following inherent problems. These apply to all forms of spam filters. First, if the filter is publicly available (as it must effectively be to be used) then you can craft spam and test it against the filter(s). Regardless of what they are looking for and how they rate spam, messages in the form:

    Dear Name

    RE: Your recent blah blah blah

    Thanks for your enquiry. Blah blah blah. Please take the time to have a look at:

    http://blah.com/cgi-bin/special_offer?name=Name&code=AGERSDGFTGER

    I wish you all the best in your endeavour.

    Kind Regards

    John Smith
    Director
    Blah.com
    Street Address
    Phone Number
    Fax Number
    Mobile Number

    BLAH
    Making it happen
    http://blah.com
    foo@blah.com

    The information transmitted may be confidential, is intended only for the person to which it is addressed, and may not be reviewed, retransmitted, disseminated or relied upon by any other persons. If you received this message in error, please contact the sender and destroy any paper or electronic copies of this message. Any views expressed in this email communication are those of the individual sender, except where the sender specifically states otherwise. Blah does not represent, warrant or guarantee that the communication is free of errors, virus or interference.

are statistically next to impossible to pick. The problem with the basic mail protocol is that you can forge headers, i.e. there is no way to validate the sending server. Given this you can more or less craft your emails so they will pass any spam filter.

Messages like this are the new face of spam. Still spam but crafted to look like a standard valid (perhaps corporate) reply. It will be next to impossible to stop mail in this form.
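To make the point concrete, here is a minimal sketch of how a naive Bayes style filter might rate a message. It is an illustration only, not the proprietary filter mentioned above and not how Popfile or SpamAssassin work internally. The class name `NaiveBayesFilter`, the tokenizer, and the smoothing constants are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase tokens (hypothetical, deliberately crude)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveBayesFilter:
    """Toy two-class filter: counts in how many messages each token appears."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each token once per message, whatever its frequency.
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Start from the (smoothed) prior odds of spam versus ham.
        log_odds = math.log((self.messages["spam"] + 1) /
                            (self.messages["ham"] + 1))
        for tok in set(tokenize(text)):
            # Laplace smoothing so unseen tokens never give zero probability.
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep exp() well inside floating point range.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Trained on a handful of obvious spam and ordinary business mail, a filter like this rates "click here for free viagra offer" as almost certainly spam, while a message padded with polite corporate boilerplate collects mostly ham-like tokens and scores low. That is the weakness being described: the body alone gives the statistics very little to work with.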

As a result the challenge response/whitelist passthrough is probably the way it will end up in the medium term. Then of course the spammers will implement respond bots and the cycle will continue.
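A rough sketch of the challenge response/whitelist idea follows, assuming an in-memory gate that holds mail from unknown senders until they answer a one-time token. `ChallengeResponseGate` and its methods are hypothetical names for illustration. A real system would also need to send the challenge mail, expire tokens, and persist the whitelist.

```python
import secrets


class ChallengeResponseGate:
    """Toy whitelist passthrough: unknown senders must answer a challenge."""

    def __init__(self):
        self.whitelist = set()
        self.pending = {}  # token -> (sender, held message)

    def receive(self, sender, message):
        # Known senders pass straight through.
        if sender in self.whitelist:
            return ("deliver", None)
        # Unknown senders get a random token; their mail is held meanwhile.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        # A valid reply releases the held message and whitelists the sender.
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        sender, message = entry
        self.whitelist.add(sender)
        return message
```

Note that `confirm` cannot tell a human reply from an automated one, which is exactly why respond bots would defeat it: anything that can receive the challenge and echo the token gets whitelisted.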

What is needed is a modification to the underlying protocol so that there is an inbuilt challenge response or security key of some form so that the recipient server can query the supposed sending server to see if it was really the source of the message. If you can do that you can work blacklists of spam servers far more effectively.
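The kind of check meant here could look something like the sketch below. It simulates the idea in memory only: `SendingServer`, `did_you_send`, `verify_origin`, and the `directory` lookup are hypothetical stand-ins, not part of any existing mail protocol, and the dictionary takes the place of whatever network lookup a real protocol would define.

```python
class SendingServer:
    """Toy mail server that remembers the IDs of messages it really sent."""

    def __init__(self, domain):
        self.domain = domain
        self.sent_ids = set()

    def send(self, message_id):
        # Record the ID so a later query from the recipient can be answered.
        self.sent_ids.add(message_id)
        return {"from_domain": self.domain, "message_id": message_id}

    def did_you_send(self, message_id):
        # The hypothetical query a recipient server would make.
        return message_id in self.sent_ids


def verify_origin(message, directory):
    """Ask the claimed sending server whether it was the real source."""
    server = directory.get(message["from_domain"])
    return server is not None and server.did_you_send(message["message_id"])


# Usage: a genuine message verifies, a forged header does not.
blah = SendingServer("blah.com")
directory = {"blah.com": blah}
genuine = blah.send("msg-001")
forged = {"from_domain": "blah.com", "message_id": "msg-999"}
print(verify_origin(genuine, directory))  # True
print(verify_origin(forged, directory))   # False
```

With a check like this in place, a message either traces back to a server that admits sending it or gets rejected, so a blacklist entry for a spam server can no longer be dodged just by forging the headers.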

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print


Replies are listed 'Best First'.
Re: Re: Re: Re: Enough is Enough - Taking the fight back to the Internet scammers
by revdiablo (Prior) on Oct 28, 2003 at 17:51 UTC

    Actually, I find it interesting to note that the Bayesian spam filter I use catches these types of emails All Day Long (tm). It seemed a curious thing to me, wondering how it was picking these out from more legitimate email. I started analyzing the emails and realized the highest spam words were being grabbed from the headers. Sure, headers can easily be modified, but most spammers apparently aren't that sophisticated. They use common tools with standard headers (usually advertising the tool they are using), which are very easy for the filter to catch.

    As the spammers realize this and start using tools that are harder to catch, there will still be things like MTA versions and hostnames added to the emails along the path. Perhaps certain MTAs with bad default options will begin to stand out as likely spam targets. Perhaps certain IP blocks will begin to stand out, also. Who knows? The great thing is I won't have to think about this. The filter will figure it out automatically.

      As you say, the headers are extremely valuable (but only at the moment). Because the protocol LETS you forge them, eventually this will become the norm; then, with a suitably crafted body, even Bayes won't cut it anymore.

      When we look at some of the tokens that our Bayes widgets work with and find significant we often go 'huh?' The fact that we don't really understand WHY some of these tokens exist does not matter one whit. They are statistically significant and thus, at the end of the day, Just Work.

      The problem is that as Bayes gets more popular the spammers will employ people to analyse how the filters are working, and it is fairly easy to find chinks in the armour to slip a knife through, to put it in medieval terms. At that stage I suspect we will be up for a new protocol.

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Re: Re: Enough is Enough - Taking the fight back to the Internet scammers
by Roy Johnson (Monsignor) on Oct 28, 2003 at 18:05 UTC
    Tachyon said:
    As a result the challenge response/whitelist passthrough is probably the way it will end up in the medium term. Then of course the spammers will implement respond bots and the cycle will continue.

    But the beauty of that is that they can no longer hide their mail address. It has to be valid. Then you can blacklist it. Setting up numerous real respondbots is much more onerous than just formulating fake return addresses.

    The thing that gets me is: what are they thinking? If someone is trying to filter out their offers, how likely is it that that person will decide to become a customer when their efforts are thwarted?

    Earthlink has a rather ingenious system: they set up some fake accounts expressly to attract spam. When those accounts receive it, they analyze it and filter it out of clients' mailboxes. It works very well. In addition to that, there is whitelisting.

      Roy asked: The thing that gets me is: what are they thinking? If someone is trying to filter out their offers, how likely is it that that person will decide to become a customer when their efforts are thwarted?
      1. Most of the filtering is done by clueful sysadmins; the spammers want to reach clueless users.
      2. Apparently the necessary success rate to keep genuine advertising spammers in business is just over one in a million.
      3. Most spam nowadays has nothing to do with advertising dubious products, even if it pretends to be. It is a big pyramid scheme where they sell each other lists of addresses (and hopefully the whole thing will implode real soon). In these emails, all they want is for the sucker to view it in an HTML-aware mail client, to pull in a web bug and confirm the address is live. It doesn't matter if it's immediately deleted. In fact I'm increasingly seeing ones where the "click here" links don't even resolve.
Re: Re: Re: Re: Enough is Enough - Taking the fight back to the Internet scammers
by Zero_Flop (Pilgrim) on Oct 28, 2003 at 18:45 UTC
    tachyon

    I think that you are talking about PopFile, not PopMail, and I am definitely a convert. I was recently slammed with the last Win bug and PopFile was able to capture every email that came in. Due to a system upgrade I lost my training, but was able to retrain PopFile to >95% accuracy in less than a week.

    As a side note, you can also use PopFile to organize your email. For example, I have folders for family, spam, and newsgroups.

    One other thing to be aware of concerning PopFile: it classifies emails based on the email body as well as the headers, so altering the header does not fool it.

    I would recommend it to anyone interested in filtering email. It is also one of the non-CGI Perl programs I use to show people that Perl is more than a CGI language.
