PerlMonks  

Re: Question: Fast way to validate 600K websites

by starbolin (Hermit)
on May 12, 2008 at 17:41 UTC ( [id://686118]=note )


in reply to Question: Fast way to validate 600K websites

It's not the tool; it's the structure. If you have to wait for each site to respond before issuing a GET to the next one, it's going to take forever. You need a way to issue a block of GETs, forking a child to handle each one, then process the ones that respond and issue new GETs as processes are freed up. Perhaps read perlipc. There are some good tools out there to make this kind of thing more robust (if not less painful); see POE.
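For what it's worth, here is a rough sketch of that structure using nothing but core Perl (fork, waitpid, and POSIX). It is only an illustration, not tested against 600K hosts: the sub name check_all, the worker limit, and the $check callback are all made up for this example, and each child reports back through its exit status alone (0 = site OK, anything else = failed).

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $check->($url) for every URL, keeping at most
# $max_children forked workers alive at once.
# Returns a hashref of url => 1 (check passed) or 0 (check failed or died).
sub check_all {
    my ($urls, $max_children, $check) = @_;
    my @queue = @$urls;
    my (%pid_to_url, %result);

    while (@queue or %pid_to_url) {
        # Fill every free slot with a new child.
        while (@queue and scalar(keys %pid_to_url) < $max_children) {
            my $url = shift @queue;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: do the work, report through the exit status.
                # _exit skips END blocks and destructors owned by the parent.
                my $ok = eval { $check->($url) };
                POSIX::_exit($ok ? 0 : 1);
            }
            $pid_to_url{$pid} = $url;
        }

        # Block until any child finishes, then free its slot.
        my $pid = waitpid(-1, 0);
        last if $pid <= 0;                   # no children left to reap
        my $url = delete $pid_to_url{$pid};
        next unless defined $url;            # not one of our workers
        $result{$url} = ($? == 0) ? 1 : 0;
    }
    return \%result;
}
```

The callback is where the actual GET (or HEAD) would go, for example a sub that returns LWP::UserAgent->new(timeout => 10)->head($_[0])->is_success. The timeout matters: without one, a few dead hosts will hold their slots for a long time.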

Update: I'm wrong; the module you want is LWP::Parallel, as grinder points out. The module documentation even provides the code you want.


s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}