PerlMonks  

Re: perl regex or module that identifies bots/crawlers

by sgifford (Prior)
on Mar 20, 2007 at 21:15 UTC ( [id://605750] )


in reply to perl regex or module that identifies bots/crawlers

Google and Yahoo should certainly be honoring your robots.txt file. You might want to take a closer look at your access logs to see which IP addresses these requests are coming from and which URLs they are fetching. Perhaps there is another path to your cgi-bin directory that isn't covered by your robots.txt file, or maybe an error is preventing the file from being processed correctly.
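One way to take that closer look is to tally the client IPs and requested paths for every hit whose User-Agent claims to be a big crawler. The following is only a minimal sketch: it assumes your server writes the Apache "combined" log format, and the sub name `tally_bot_hits` and the agent pattern are made up for illustration, so adjust both to your setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count client IPs and requested paths for log lines whose User-Agent
# matches $agent_re. Assumes Apache "combined" format (an assumption):
#   host ident user [time] "METHOD path proto" status bytes "referer" "agent"
sub tally_bot_hits {
    my ($lines, $agent_re) = @_;
    my (%by_ip, %by_path);
    for my $line (@$lines) {
        next unless $line =~ m{
            ^(\S+)\s+\S+\s+\S+\s+\[[^\]]*\]\s+
            "\S+\s+(\S+)[^"]*"\s+\d{3}\s+\S+\s+
            "[^"]*"\s+"([^"]*)"
        }x;
        my ($ip, $path, $agent) = ($1, $2, $3);
        next unless $agent =~ $agent_re;
        $by_ip{$ip}++;
        $by_path{$path}++;
    }
    return (\%by_ip, \%by_path);
}

# Run as: perl bot_hits.pl /path/to/access_log
if (@ARGV) {
    my $log = shift @ARGV;
    open my $fh, '<', $log or die "Cannot open $log: $!\n";
    my @lines = <$fh>;
    close $fh;

    # Hypothetical pattern for agents that claim to be Google or Yahoo.
    my ($by_ip, $by_path) =
        tally_bot_hits(\@lines, qr/Googlebot|Yahoo!?\s*Slurp/i);

    print "Hits by IP:\n";
    printf "  %6d  %s\n", $by_ip->{$_}, $_
        for sort { $by_ip->{$b} <=> $by_ip->{$a} } keys %$by_ip;
    print "Hits by path:\n";
    printf "  %6d  %s\n", $by_path->{$_}, $_
        for sort { $by_path->{$b} <=> $by_path->{$a} } keys %$by_path;
}
```

If the paths that show up are ones your robots.txt is supposed to disallow, the next step is to find out who owns the IPs at the top of the first list.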

Replies are listed 'Best First'.
Re^2: perl regex or module that identifies bots/crawlers
by Anno (Deacon) on Mar 20, 2007 at 22:22 UTC
    I agree that the real Google and Yahoo, and the other big ones, will certainly honor robots.txt. If bots using their names invade a server, that may only indicate that these are popular fake names for rogue bots. From a rogue bot's point of view, it would make sense to look like a legit bot instead of, for instance, a browser.

    That said, it is certainly a good idea to check if robots.txt is working as it should.

    Anno
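To check whether robots.txt does what you expect, you can feed it to a parser and ask about specific URLs. This is a rough sketch that assumes the CPAN module WWW::RobotRules is installed (it is not a core module). The sub name `robots_verdicts`, the host www.example.com, and the inline robots.txt text are placeholders for illustration; substitute your own file and URLs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::RobotRules;    # from CPAN, not core Perl

# Return a hash of URL => verdict for the given agent and robots.txt text.
# allowed() returns 0 when a rule blocks the URL and a negative value
# when no rules have been parsed for that host.
sub robots_verdicts {
    my ($agent, $robots_url, $robots_txt, @urls) = @_;
    my $rules = WWW::RobotRules->new($agent);
    $rules->parse($robots_url, $robots_txt);

    my %verdict;
    for my $url (@urls) {
        my $ok = $rules->allowed($url);
        $verdict{$url} = $ok == 0 ? 'blocked'
                       : $ok <  0 ? 'no rules for host'
                       :            'allowed';
    }
    return %verdict;
}

# Placeholder robots.txt; paste in the contents of your real file.
my $demo_txt = <<'END';
User-agent: *
Disallow: /cgi-bin/
END

my @demo_urls = (
    'http://www.example.com/cgi-bin/search.pl',
    'http://www.example.com/index.html',
);
my %demo = robots_verdicts(
    'Googlebot/2.1',
    'http://www.example.com/robots.txt',
    $demo_txt,
    @demo_urls,
);
printf "%-45s %s\n", $_, $demo{$_} for @demo_urls;
```

If a URL you meant to protect comes back as allowed, the file itself is the problem. If it comes back as blocked but is still being fetched, the visitor is ignoring robots.txt and is probably not the crawler it claims to be.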
