PerlMonks  

Re^4: Help update the Phalanx 100

by stvn (Monsignor)
on Dec 23, 2004 at 13:49 UTC ( [id://417083] )


in reply to Re^3: Help update the Phalanx 100
in thread Help update the Phalanx 100

    # Exclude downloads from agents matching this regex, because they seem to be
    # related to mirroring or crawling rather than genuine downloads:
    my $rx_agent_ignore = qr/
        \. google \.            |
        \. yahoo  \.            |
        \b LWP::Simple \b       |
        \b MS\ Search \b        |
        \b Webmin \b            |
        \b Wget \b              |
        \b teoma \b
    /x;

Markus, I may be wrong, but I think CPAN.pm sometimes uses LWP::Simple to download modules. If so, excluding that agent would throw away genuine downloads, so I don't think it's a good idea, even though there is a good chance some of those hits come from spiders too.
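For what it's worth, here is a minimal sketch of what dropping that one alternative might look like. The pattern is the one quoted above minus the LWP::Simple line; the user-agent strings in @agents are made up for illustration and are not taken from any real log, and @kept is just a name I picked.

```perl
use strict;
use warnings;

# Same pattern as quoted above, minus the LWP::Simple alternative,
# so that downloads made through LWP::Simple are still counted.
my $rx_agent_ignore = qr/
    \. google \.      |
    \. yahoo  \.      |
    \b MS\ Search \b  |
    \b Webmin \b      |
    \b Wget \b        |
    \b teoma \b
/x;

# Hypothetical user-agent strings, invented for this example.
my @agents = (
    'LWP::Simple/5.800',
    'Wget/1.9.1',
    'Mozilla/5.0 (compatible; crawler.google.com)',
    'Mozilla/4.0 (compatible; MSIE 6.0)',
);

# Keep only the agents that do not look like mirrors or crawlers.
my @kept = grep { $_ !~ $rx_agent_ignore } @agents;
print "$_\n" for @kept;
```

With these sample strings, the Wget and google entries are filtered out and the LWP::Simple and MSIE entries are kept. The trade-off is that any spider built on LWP::Simple gets counted as well.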

-stvn

Re^5: Help update the Phalanx 100
by MarkusLaker (Beadle) on Dec 23, 2004 at 22:21 UTC
    Thanks, stvn! I've updated the code and results accordingly.

    Markus
