PerlMonks  

Re: Another way to get around automated bots

by andyf (Pilgrim)
on May 17, 2004 at 09:01 UTC ( [id://353897]=note )


in reply to Another way to get around automated bots

I think that's jolly inventive, even if it's not entirely practical. Of course, plain ASCII art is a similar tactic.

Interestingly, I looked at the complementary problem last year for a rabble of grubby greyhat dotcommers in the next office to me: you guessed it, OCR for noisy .gifs (they actually did perfectly legitimate deeplinking searches).

I used Image::Magick to read, normalise, greyscale, blur and threshold the image, then took the highest weighted sum of the AND with each test image. Read: nasty brute-force OCR.
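
A minimal sketch of that matching step, in Python rather than Perl and on tiny hand-made bitmaps. The weighting (dividing by the template's ink count), the 3x3 templates and all function names are assumptions made for illustration, not the original code:

```python
# Hypothetical sketch of brute-force template OCR on a thresholded glyph.
# Images are small lists of rows; every name here is illustrative.

def threshold(gray, level=128):
    """Turn a greyscale image (0-255, dark ink) into a 0/1 bitmap (1 = ink)."""
    return [[1 if px < level else 0 for px in row] for row in gray]

def and_score(bitmap, template):
    """Weighted sum of the AND of two same-sized bitmaps.

    The weight (an assumption) is 1 / ink pixels in the template, so a
    template that is mostly ink does not win by default.
    """
    ink = sum(sum(row) for row in template) or 1
    hits = sum(b & t
               for brow, trow in zip(bitmap, template)
               for b, t in zip(brow, trow))
    return hits / ink

def best_match(bitmap, templates):
    """Return the label of the template with the highest score."""
    return max(templates, key=lambda label: and_score(bitmap, templates[label]))

TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}

# A noisy greyscale "L": low values are ink, with one stray dark pixel top right.
noisy = [[20, 230, 90],
         [40, 250, 240],
         [10, 30, 60]]

print(best_match(threshold(noisy), TEMPLATES))
```

The brute force lies in scoring every candidate template against every glyph; the blur and normalise steps mentioned above are left out here.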

Eventually they replaced my code with a far faster C++ implementation that finds minimum distances between FFTs of the images, which quite frankly laughs at Perl (speedwise).
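
I never saw the details of that C++ code, so the following is only a guess at the general idea, sketched in Python: compare the magnitude spectra of two images and pick the template at the smallest Euclidean distance. The naive DFT, the distance measure and the 4x4 test shapes are all assumptions; a real implementation would use an FFT library.

```python
import cmath
import math

def dft2_magnitude(img):
    """Magnitude of the 2D discrete Fourier transform (naive, for tiny images)."""
    rows, cols = len(img), len(img[0])
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += img[y][x] * cmath.exp(1j * angle)
            out_row.append(abs(total))
        out.append(out_row)
    return out

def spectrum_distance(a, b):
    """Euclidean distance between the magnitude spectra of two images."""
    fa, fb = dft2_magnitude(a), dft2_magnitude(b)
    return math.sqrt(sum((p - q) ** 2
                         for ra, rb in zip(fa, fb)
                         for p, q in zip(ra, rb)))

def nearest(img, templates):
    """Label of the template whose spectrum is closest to the image's."""
    return min(templates, key=lambda label: spectrum_distance(img, templates[label]))

TEMPLATES = {
    "bar":  [[0, 1, 0, 0]] * 4,                     # vertical stroke
    "dash": [[0, 0, 0, 0], [1, 1, 1, 1],
             [0, 0, 0, 0], [0, 0, 0, 0]],           # horizontal stroke
}

shifted_bar = [[0, 0, 1, 0]] * 4                    # same stroke, moved one pixel
print(nearest(shifted_bar, TEMPLATES))
```

One property that makes spectra attractive here: the magnitude of the DFT does not change when the image is shifted circularly, so the moved stroke still sits at distance (almost exactly) zero from its template.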

However, last I heard they still had plenty of problems. That is to say, done properly, obfuscated images can be computationally VERY hard to OCR, but it can be done.

Regardless of methodology there is a deeper principle at play here, which connects with what Merlyn has to say... eventually you are going to make life so difficult for your end user that any perceptual impairment they have will make reading almost impossible. My (dyslexic) sister has a damn hard time reading those obfuscated .gifs.

My hypothesis, then: if you are prepared to throw enough cycles at the problem, with a good enough algorithm, the machine will always be able to filter the info from a noisy image _better_ than a human can. Hence the general method is flawed if its sole objective is to defeat bots.

A better method is to rely on questions from current events news. Make it multiple choice, and make it so that 3 wrong answers out of 5 blocks the IP for an hour.
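
A rough sketch of the blocking rule, in Python. It reads "3 wrong answers out of 5" as "3 wrong among the last 5 answers from one IP", which is an assumption, and the class and constant names are made up for the example:

```python
from collections import defaultdict, deque

WINDOW = 5            # most recent answers considered per IP (assumed reading)
MAX_WRONG = 3         # wrong answers within the window that trigger a block
BLOCK_SECONDS = 3600  # one hour

class AnswerGate:
    """Hypothetical in-memory tracker; a real site would need persistent storage."""

    def __init__(self):
        self.history = defaultdict(lambda: deque(maxlen=WINDOW))
        self.blocked_until = {}

    def is_blocked(self, ip, now):
        return now < self.blocked_until.get(ip, 0)

    def record(self, ip, correct, now):
        """Record one answer; return True if the IP is still allowed."""
        if self.is_blocked(ip, now):
            return False
        answers = self.history[ip]
        answers.append(correct)
        if list(answers).count(False) >= MAX_WRONG:
            self.blocked_until[ip] = now + BLOCK_SECONDS
            answers.clear()
            return False
        return True

gate = AnswerGate()
for second, correct in enumerate([False, False, True, False]):
    allowed = gate.record("192.0.2.1", correct, now=second)
print(allowed, gate.is_blocked("192.0.2.1", now=1800))
```

Time is passed in as a plain number of seconds so the rule can be exercised without waiting an hour.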

Even something like

Which dictator has no moustache?
1) Adolf Hitler
2) Augusto Pinochet
3) Saddam Hussein
4) Josef Stalin
5) George W Bush
6) George Palpadopoulos
7) Francois "Papa Doc" Duvalier


would fool pretty much any AI :)

Andy

Replies are listed 'Best First'.
Re^2: Another way to get around automated bots
by adrianh (Chancellor) on May 17, 2004 at 09:31 UTC
    A better method is to rely on questions from current events news. Make it multiple choice, and make it so that 3 wrong answers out of 5 blocks the IP for an hour.

    In these days of proxies, using an IP blocking approach is pretty much a dead end. Blocking IPs will mean that you'll kill off groups of people using proxies, and they're so easy to fake only the technically dull bad people will be affected.

    Without a blocking mechanism it then just comes down to a question of odds.

    I also think you'll be surprised at the high false-negative rate you'll get, with real humans getting the questions wrong :-)

      Blocking IPs [...], and they're so easy to fake only the technically dull bad people will be affected.

      Wow. It is easy for you to fake an IP and have the results sent back to you? You'll have to explain that before I believe you.

      If you are using IP for security, then the only risk from faking IPs is that someone can send you data with a forged IP in hopes of getting you to act on it. Simply requiring a minimal dialogue that includes repeating hard-to-predict data is enough to make such an attack extremely unlikely to succeed.
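
      One way such a dialogue could look, sketched in Python. The token layout, the HMAC construction and the function names are assumptions for illustration: the server sends a token to the claimed address and only acts once the same token comes back, which a sender with a forged source IP never sees.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical per-server key

def _tag(ip, nonce):
    """Bind the random nonce to the address it was issued to."""
    return hmac.new(SERVER_SECRET, f"{ip}|{nonce}".encode(), hashlib.sha256).hexdigest()

def issue_challenge(ip):
    """Create hard-to-predict data that is sent back to the claimed address."""
    nonce = secrets.token_hex(16)
    return f"{nonce}.{_tag(ip, nonce)}"

def verify_challenge(ip, token):
    """Accept only a token that was issued for this address."""
    nonce, _, tag = token.partition(".")
    return hmac.compare_digest(tag, _tag(ip, nonce))

token = issue_challenge("192.0.2.7")
print(verify_challenge("192.0.2.7", token))     # the real client can echo it
print(verify_challenge("192.0.2.7", "aa.bb"))   # a blind guess fails
```

      Because the tag is keyed, the server does not need to remember outstanding nonces to check them; expiry and replay protection are left out of this sketch.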

      An attacker having control over a block of IP addresses is a separate issue.

      - tye        

        Wow. It is easy for you to fake an IP and have the results sent back to you? You'll have to explain that before I believe you.

        Lack of clarity on my part. I meant that faking a proxy is easy.

        You pretend to be a legitimate caching proxy and fake the Via and X-Forwarded-For headers. Mix in a bot browsing the site with a few legitimate accounts and it becomes almost impossible to tell the difference between good and evil proxies (unless you start hammering the site with thousands of registrations.)

        So you're either faced with blocking proxy IPs, which is bad for legitimate proxy users, or blocking the IPs delivered by the fake proxy headers which will have no effect.
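
        A small Python simulation of the second case. The header values, the addresses (taken from the documentation ranges) and the function names are invented for the example; the point is only that a blocklist keyed on the claimed address never catches up with a bot that makes up a new one per request.

```python
def claimed_client_ip(peer_ip, headers):
    """Naive lookup: believe the first X-Forwarded-For entry if present."""
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip

def run_bot(requests, blocklist):
    """Count forged requests that get past a blocklist keyed on the claimed IP."""
    allowed = 0
    for i in range(requests):
        # The bot pretends to be a caching proxy and invents a client address.
        headers = {"Via": "1.1 fake-cache",
                   "X-Forwarded-For": f"198.51.100.{i + 1}"}
        ip = claimed_client_ip("203.0.113.9", headers)
        if ip not in blocklist:
            allowed += 1
        blocklist.add(ip)  # block every address the bot claims, after the fact
    return allowed

print(run_bot(50, set()))
```

        Every request is let through, because each blocked address is one the bot has already stopped using.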

        If you are using IP for security, then the only risk from faking IPs is that someone can send you data with a forged IP in hopes of getting you to act on it.

        Yup.

        A denial of service attack is an especially annoying form of this if one of your possible acts is automated IP blocking. EvilPerson sends bad requests using the faked IP addresses of legitimate users. Legitimate users get banned.

        Adrian & Tye,
        you chaps are quite right, I had forgotten about the whole unreliability thing with IPs for a moment there, it creates a lot of issues. And the danger of blocking proxies, very tricky. Hmmm. What I was attempting to address is the possibility of simply brute-forcing the form by selecting all the options sequentially. Or one could just exhaust the list of questions and pay someone to handball the results </blackhat> Hmm, OK. Let's say we immediately change to another question, but we also present the choices in a random order each time, that's an improvement.
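
        The random-order part is easy to sketch (Python here; the function name, the dictionary layout and the sample question are made up for the example). The correct answer is tracked by value and its position is recomputed after every shuffle, so "always pick option 2" stops working:

```python
import random

def present(question, choices, correct, rng=random):
    """Return the question with its choices in a fresh random order."""
    shuffled = rng.sample(choices, k=len(choices))
    return {
        "question": question,
        "choices": shuffled,
        "answer_index": shuffled.index(correct),  # position changes per call
    }

form = present("Which of these is a Perl module?",
               ["Image::Magick", "libpng", "zlib", "curl"],
               correct="Image::Magick")
print(form["choices"], form["answer_index"])
```

        A bot can still guess, of course; with n choices it is right about one time in n, which is the question of odds mentioned earlier in the thread.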

        @ Nkuvu, Sorry, it was a flippant example, in reality we would choose something far simpler. Besides it was a trick question, there is one there that you _KNOW_ doesn't have a moustache, but he's not a real dictator. Also all the others were democratically elected in a valid vote at least once in their political careers and _then_ went crazy :)

        I have heard other ideas too, such as getting the client to solve a costly puzzle (in code) so that a bot wouldn't be able to get up much speed. Unfortunately this makes the presumption that the client will let the server instruct it to execute arbitrary code, which is obviously bad.
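
        For illustration, a hashcash-style version of such a puzzle in Python. The challenge string, the difficulty and the function names are assumptions: the client has to search for a nonce whose SHA-256 hash falls below a target, which takes many tries, while the server checks the result with a single hash.

```python
import hashlib
from itertools import count

def _value(challenge, nonce):
    """Interpret SHA-256(challenge:nonce) as a 256-bit integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")

def solve(challenge, bits):
    """Client side: brute-force a nonce; costs about 2**bits hashes on average."""
    target = 1 << (256 - bits)
    for nonce in count():
        if _value(challenge, nonce) < target:
            return nonce

def verify(challenge, nonce, bits):
    """Server side: a single hash checks the client's work."""
    return _value(challenge, nonce) < (1 << (256 - bits))

nonce = solve("node-353897", bits=12)
print(verify("node-353897", nonce, bits=12))
```

        This does not remove the objection above: the client still has to run the solver the server asks for, and slow legitimate machines pay the same price as bots.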

        I think what I am trying to say is, as a general principle you need to find a puzzle that humans can easily solve but a machine cannot. In the end, if you make too many hoops for users to jump through they will just go to another site as Nkuvu says.
Re^2: Another way to get around automated bots
by Nkuvu (Priest) on May 17, 2004 at 17:20 UTC
    If you do implement that multiple question thing, let me know so I can avoid the website, OK? From your list of seven dictators, there are three names I don't recognize, and four names that I wouldn't be able to associate a face with...

    Update: Note that this was a flippant response to your flippant example. :)
