Re: technical with IPs
by samtregar (Abbot) on Oct 24, 2006 at 18:12 UTC
|
"AFAIK, two people with the same three IP octets being two unique people would be 1 in 1 million chances so I assume I can rest assured it's someone's bot trying to scrape my pages."
This is an invalid assumption. Any two people from the same ISP (AOL, Time Warner, etc.) will be quite likely to have the same first three octets in their IP. I'm not sure why you're trying to determine if two people from the same IP-block are on your site at the same time - this seems unrelated to bots scanning your site.
Instead, I think you should look into more generic rate-limiting techniques. For example, if you're using CGI::Application you can use CGI::Application::Plugin::RateLimit to limit how fast people can access your site.
-sam
Re: technical with IPs
by Callum (Chaplain) on Oct 24, 2006 at 18:17 UTC
|
If you want to block IPs then block them at the webserver level rather than as the page is served.
Even if you block a specific IP address rather than a range you're potentially blocking "legit sources", though obviously you're more likely to block legit traffic if you're blocking a range.
Your assumption that two people sharing the first three octets of their IP address are the same (to 1 in a million) is highly flawed -- most people's IP addresses come from their ISP, company, university etc, and many will be coming through a proxy server -- blocking even a single IP is potentially going to hit "innocent" users.
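To make the point about shared octets concrete, here is a minimal sketch (in Python, purely for illustration; the helper name `same_first_three_octets` is hypothetical and the addresses come from the reserved documentation ranges, not from real visitors). Sharing the first three octets only means two addresses sit in the same /24 block, which holds 256 addresses that an ISP, company, or university may hand out to unrelated people.

```python
import ipaddress


def same_first_three_octets(a: str, b: str) -> bool:
    """Return True when two IPv4 addresses fall in the same /24 block."""
    # strict=False lets us build the /24 network from any host address in it.
    block = ipaddress.ip_network(f"{a}/24", strict=False)
    return ipaddress.ip_address(b) in block


# A /24 block (fixed first three octets) contains 256 distinct addresses.
block = ipaddress.ip_network("192.0.2.0/24")
print(block.num_addresses)

# Two different hosts in the same block share their first three octets.
print(same_first_three_octets("192.0.2.10", "192.0.2.200"))

# Hosts in different blocks do not.
print(same_first_three_octets("192.0.2.10", "198.51.100.10"))
```

Matching on the first three octets therefore identifies a network, not a person, and behind NAT or a proxy even a full four-octet match does not identify one.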
Re: technical with IPs
by blue_cowdawg (Monsignor) on Oct 24, 2006 at 18:24 UTC
|
"AFAIK, two people with the same three IP octets being two unique people would be 1 in 1 million chances so I assume I can rest assured it's someone's bot trying to scrape my pages."
As samtregar points out, that's a very bad assumption. I'm thinking of NAT'ed systems behind a firewall as well. If you had two people coming from the same university, for instance, there's a good chance they will have the same IP address, never mind the first three octets.
Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
|
This is very common in the business world as well. Each of the last three companies I worked for has had one, two, or three proxy servers that are the source of all the HTTP requests; one company had over five thousand users behind a single HTTP proxy address.
Re: technical with IPs
by chargrill (Parson) on Oct 24, 2006 at 18:42 UTC
|
For a little more discussion (that also has some Apache configuration ideas), see: blocking site scrapers
--chargrill
s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; =
qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
Re: technical with IPs
by ikegami (Patriarch) on Oct 24, 2006 at 18:38 UTC
|
nimdokk provided very good advice.
Additionally or independently, using a honeypot is simple and effective. In your pages, place a link no user will ever click on (or even see). Anyone who follows that link is a robot. Any further request from that IP or session can be redirected to an error page.
In case the user uses a web accelerator that prefetched the honeypot, the error page should provide a means for the user to validate himself as a person. Captchas provide such a means.
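The honeypot idea can be sketched roughly as follows (Python used only for illustration; `HoneypotGuard`, `HONEYPOT_PATH`, and the `"serve"`/`"challenge"` return values are hypothetical names, and the captcha itself is left out). The client key could be an IP address or a session ID, whichever the site tracks.

```python
HONEYPOT_PATH = "/do-not-follow"  # hypothetical target of the hidden link


class HoneypotGuard:
    """Flag clients that request the hidden link until they prove they are human."""

    def __init__(self):
        self._flagged = set()  # client keys (IP or session ID) that hit the honeypot

    def check(self, client: str, path: str) -> str:
        """Return 'challenge' for flagged clients, 'serve' for everyone else."""
        if path == HONEYPOT_PATH:
            # Only a robot (or a prefetching accelerator) follows this link.
            self._flagged.add(client)
        if client in self._flagged:
            return "challenge"  # send to the error page that offers a captcha
        return "serve"

    def verify_human(self, client: str) -> None:
        """Call once the client has passed the captcha on the error page."""
        self._flagged.discard(client)


guard = HoneypotGuard()
print(guard.check("203.0.113.7", "/index.html"))
print(guard.check("203.0.113.7", HONEYPOT_PATH))
print(guard.check("203.0.113.7", "/index.html"))
```

A real site would keep the flagged set somewhere shared between server processes; the in-memory set here only shows the control flow.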
Re: technical with IPs
by nimdokk (Vicar) on Oct 24, 2006 at 18:26 UTC
|
What you might look at is something that analyzes the incoming traffic: if you get 20 hits in a matter of seconds, perhaps temporarily block that particular IP address for a minute. I have no idea where you'd start with something like that. A bot will (likely) be doing something systematic and (reasonably) predictable. It might help to try to limit (but not ban) that activity so you don't penalize legitimate usage.
Just my 2 bits :-)
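One possible starting point for the "20 hits in a matter of seconds, block for a minute" idea is a sliding-window counter per IP address. The sketch below is in Python for illustration only; the class name `TemporaryBlocker`, the 10-second window, and the injectable `clock` parameter are assumptions, not anything from the thread.

```python
import time
from collections import defaultdict, deque


class TemporaryBlocker:
    """Temporarily refuse an IP that makes too many requests in a short window."""

    def __init__(self, max_hits=20, window=10.0, block_for=60.0, clock=time.monotonic):
        self.max_hits = max_hits    # hits within the window that trigger a block
        self.window = window        # window length in seconds (assumed value)
        self.block_for = block_for  # block duration in seconds
        self._clock = clock         # injectable so the logic can be tested
        self._hits = defaultdict(deque)  # ip -> timestamps of recent hits
        self._blocked_until = {}         # ip -> time at which the block ends

    def allow(self, ip: str) -> bool:
        """Record a request from ip and return True if it should be served."""
        now = self._clock()
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False  # still inside the temporary block
            del self._blocked_until[ip]  # block expired; start fresh

        hits = self._hits[ip]
        hits.append(now)
        # Forget hits that have fallen out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()

        if len(hits) >= self.max_hits:
            # Too many hits: block temporarily rather than banning outright.
            self._blocked_until[ip] = now + self.block_for
            hits.clear()
            return False
        return True
```

Because the block expires on its own, a shared proxy or NAT address that trips the limit is only slowed down for a minute instead of being locked out.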