
Re^4: Help update the Phalanx 100

by stvn (Monsignor)
on Dec 23, 2004 at 13:49 UTC ( #417083=note )

in reply to Re^3: Help update the Phalanx 100
in thread Help update the Phalanx 100

# Exclude downloads from agents matching this regex, because they seem to be
# related to mirroring or crawling rather than genuine downloads:
my $rx_agent_ignore = qr/
    \. google \.            |
    \. yahoo  \.            |
    \b LWP::Simple \b       |
    \b MS\ Search \b        |
    \b Webmin \b            |
    \b Wget \b              |
    \b teoma \b
/x;

Markus, I may be wrong, but I think that uses LWP::Simple sometimes to download modules, so excluding it would not be a good idea, even though there is a good chance it could also be a spider.
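
For illustration only, here is a minimal sketch of what that change might look like: the pattern quoted above with the LWP::Simple alternative dropped, applied to a few agent strings. The helper name is_counted_download and the sample User-Agent strings are assumptions made up for this example; they are not taken from Markus's actual script.

```perl
use strict;
use warnings;

# The quoted pattern minus the LWP::Simple alternative, so that
# downloads made through LWP::Simple are no longer thrown away.
my $rx_agent_ignore = qr/
    \. google \.            |
    \. yahoo  \.            |
    \b MS\ Search \b        |
    \b Webmin \b            |
    \b Wget \b              |
    \b teoma \b
/x;

# Hypothetical helper: true when a download should be counted,
# false when the agent looks like a mirror or crawler.
sub is_counted_download {
    my ($agent) = @_;
    return $agent !~ $rx_agent_ignore;
}

# Made-up User-Agent strings, for illustration only.
my @agents = (
    'Wget/1.9.1',
    'LWP::Simple/5.800',
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
);

for my $agent (@agents) {
    my $verdict = is_counted_download($agent) ? 'counted' : 'ignored';
    print "$verdict: $agent\n";
}
```

The trade-off is the one already mentioned: any spider that happens to be built on LWP::Simple would then be counted as well.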


Replies are listed 'Best First'.
Re^5: Help update the Phalanx 100
by MarkusLaker (Beadle) on Dec 23, 2004 at 22:21 UTC
    Thanks, stvn! I've updated the code and results accordingly.

