PerlMonks  

Best module for web-spider

by bash (Scribe)
on Mar 21, 2008 at 00:42 UTC

bash has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am working on a web spider and trying to find the best solution for it. I don't want to use fork()'ed processes for fast page fetching, because inter-process communication adds extra overhead. I want to do everything inside one process. I have tried LWP::Parallel, but it doesn't seem to give me much speed (I still see free CPU and network resources). Can you give me some good ideas on how I can achieve my goals (one process, maximum network performance)?

Replies are listed 'Best First'.
Re: Best module for web-spider
by szbalint (Friar) on Mar 21, 2008 at 01:30 UTC
    You might want to try using WWW::Curl::Multi and POE.
      Very interesting. Thank you!
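
For readers wondering what "one process, many fetches" looks like in practice, here is a minimal sketch of the underlying idea: a single select() loop that reads from many open connections at once. It is not WWW::Curl::Multi or POE; it uses only core modules (IO::Select, IO::Socket::INET), and the names open_request and drain_all are hypothetical helpers invented for this illustration. It assumes plain HTTP on port 80 and ignores HTTPS, redirects, and non-blocking DNS and connect, which is roughly the work the suggested modules would take off your hands.

```perl
use strict;
use warnings;
use IO::Select;
use IO::Socket::INET;

# Hypothetical helper: open a plain-HTTP connection and send one GET.
# Note that the connect itself still blocks here; a real spider would
# want that step to be non-blocking as well.
sub open_request {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 10,
    );
    return undef unless $sock;    # explicit undef keeps hash constructors balanced
    print {$sock} "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    return $sock;
}

# Hypothetical helper: read every handle to EOF inside one process.
# Takes a hashref of label => filehandle and an optional idle timeout
# in seconds; returns a hashref of label => bytes read.
sub drain_all {
    my ($handles, $timeout) = @_;
    $timeout = 10 unless defined $timeout;

    my $sel = IO::Select->new;
    my (%label_for, %body);
    for my $label (keys %$handles) {
        my $fh = $handles->{$label};
        next unless defined $fh;              # skip failed connections
        $sel->add($fh);
        $label_for{ fileno $fh } = $label;
        $body{$label} = '';
    }

    while ($sel->count) {
        # Wait until at least one handle has data; give up if all are idle.
        my @ready = $sel->can_read($timeout) or last;
        for my $fh (@ready) {
            my $n = sysread $fh, my $buf, 8192;
            if ($n) {
                $body{ $label_for{ fileno $fh } } .= $buf;
            }
            else {                            # 0 = EOF, undef = read error
                $sel->remove($fh);
                close $fh;
            }
        }
    }
    return \%body;
}
```

Usage would be along the lines of drain_all({ 'example.com' => open_request('example.com', '/') }), with one entry per page you want in flight. The returned strings include the raw HTTP headers, since nothing here parses the response.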
