PerlMonks  

•Re: Hiding mail addresses in mailto: with JavaScript

by merlyn (Sage)
on Oct 25, 2003 at 12:47 UTC ( [id://302054]=note )


in reply to Hiding mail addresses in mailto: with JavaScript

Researchers demonstrated just a few months ago that all you have to do is encode some part of your string as an entity, and that suffices to foil all known spam scrapers. Plus it doesn't break on non-JavaScript browsers, as your example does.

Please don't use this JavaScript solution. You're solving a nonexistent problem.

And it'll be a long time before spammers go to the trouble of decoding entities on scraped pages. After all, there are already millions of addresses in "XXX@yyy.ZZZ" form on the web that don't require any CPU to decode, and they're after numbers, not quality or cleverness.

It also suffices to have at least one unusual character in your email address: my email address of <fred&barney@stonehenge.com> has never been spammed, despite appearing in numerous Usenet posts and web pages. Yes, <barney@stonehenge.com> has gotten numerous hits almost from the first day the full address appeared, but the whole thing never has.

In summary, write your mailto links like this:

<a href="mailto:merlyn&#64;stonehenge.com"> Send mail to <tt>merlyn&#64;stonehenge.com</tt>!</a>
and it not only looks right, it acts right, and yet the spammers don't see it. Don't use JavaScript.
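
For anyone generating such links from a script instead of typing them by hand, here is a minimal sketch in Python. The function name `entity_mailto` and the pattern `NAIVE_EMAIL` are assumptions made up for illustration; the pattern only stands in for a simple scraper and is not taken from any real tool.

```python
import html
import re

# Hypothetical pattern standing in for a naive scraper; not from any real tool.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def entity_mailto(address: str, label: str = "Send mail to") -> str:
    """Return a mailto link with every '@' written as the entity &#64;."""
    # Escape HTML-special characters first ('&' becomes '&amp;'), then hide '@'.
    hidden = html.escape(address).replace("@", "&#64;")
    return f'<a href="mailto:{hidden}">{label} <tt>{hidden}</tt>!</a>'


link = entity_mailto("merlyn@stonehenge.com")
print(link)
# No literal '@' is left in the page source for the naive pattern to match.
print(NAIVE_EMAIL.findall(link))
```

A browser decodes the entity before displaying the text or following the link, which is why the result still looks and acts like a normal mailto link.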

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: •Re: Hiding mail addresses in mailto: with JavaScript
by tilly (Archbishop) on Oct 25, 2003 at 14:15 UTC
    I wouldn't be so convinced that spammers are inevitably incompetent. Just look at how some of them have been studying tools like SpamAssassin and figuring out how to get around the filter. Given that the maintainers of various popular mailing-list-to-web gateways read the same research that you did and are using HTML encoding to hide addresses, it is only a matter of time before that becomes tempting enough for spammers to add a couple of new regular expressions to their web scrapers and catch either &#64; or @ in email addresses.

    Your fred&barney trick is likely safe for a long, long time. There aren't enough people with & in their email addresses to make it worth the spammers' while to change their behaviour. The same won't remain true of HTML encoding @.
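
The "couple of new regular expressions" point can be sketched in Python as follows. This is only an illustration under stated assumptions: the `EMAIL` pattern and the `scrape` function are hypothetical, since nothing in this thread documents how real scrapers work. The sketch decodes entities with the standard library's `html.unescape` before matching, instead of adding a second pattern.

```python
import html
import re
from typing import List

# Hypothetical address pattern; real scrapers are not documented in this thread.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrape(page: str, decode_entities: bool = False) -> List[str]:
    """Return address-like strings, optionally decoding HTML entities first."""
    text = html.unescape(page) if decode_entities else page
    return EMAIL.findall(text)


encoded = '<a href="mailto:merlyn&#64;stonehenge.com">mail</a>'
odd = "Contact: fred&barney@stonehenge.com"

print(scrape(encoded))                        # entity form is missed
print(scrape(encoded, decode_entities=True))  # one extra decoding step finds it
print(scrape(odd, decode_entities=True))      # '&' is outside the pattern: only the tail matches
```

Under these assumptions, the entity trick costs a scraper one extra call, while the `&` address still yields only the `barney@...` tail, which matches the behaviour described for that address above.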

      I didn't say anything about spammers' incompetence. I'm talking about the ratio of low-hanging fruit to hidden fruit. As long as there are 10,000 times as many "foo@bar.com" in web pages as there are encoded addresses, spammers have no motivation to change.

      The fact that smart spammers are working around SpamAssassin is actually a testimony to the market penetration of such tools, especially at large mail targets like AOL, Hotmail, and Earthlink. So, we're probably seeing them worry about 10% of their addresses being undeliverable, not 1/10000 of their addresses not even appearing in the first place. (I could even make the argument that an address that is hard to scrape is also likely to be trapped in other ways as well, so there's really no point in sending to it.)

      Thus, for the moment I will continue to recommend only some HTML-entity protection, until someone shows me an actual spam scrape that defeats it.

      -- Randal L. Schwartz, Perl hacker
