Re: What is the fastest way to download a bunch of web pages?

by lestrrat (Deacon)
on Mar 03, 2005 at 16:08 UTC ( [id://436260] )


in reply to What is the fastest way to download a bunch of web pages?

Just to make things more interesting, I'd suggest you take a look at an event-based approach, for example via POE (POE::Component::Client::HTTP) or the like.

But I'd suggest that you keep this in the back of your head and leave it for the future, because it requires that you think carefully about I/O, the order in which things happen, and so on.

It was pretty hard for me personally to write a web crawler like that.

But anyway, it *is* possible to increase fetching performance to roughly 10K-20K URLs/hour using such an approach. And this is with a single process.
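For illustration, here is a minimal sketch of that style of crawler with POE::Component::Client::HTTP. The URL list, the 'ua' alias, and the got_response handler name are my own assumptions for the example, not code from this thread.

    use strict;
    use warnings;
    use POE qw(Component::Client::HTTP);
    use HTTP::Request;

    # Hypothetical list of pages to fetch.
    my @urls = qw( http://example.com/ http://example.org/ );

    # One non-blocking HTTP client component; all requests share it.
    POE::Component::Client::HTTP->spawn(
        Alias   => 'ua',
        Timeout => 30,
    );

    POE::Session->create(
        inline_states => {
            _start => sub {
                my $kernel = $_[KERNEL];
                # Queue every request up front; the component works
                # through them concurrently without blocking.
                for my $url (@urls) {
                    $kernel->post( ua => request => 'got_response',
                                   HTTP::Request->new( GET => $url ) );
                }
            },
            got_response => sub {
                my ( $request_packet, $response_packet ) = @_[ ARG0, ARG1 ];
                my $request  = $request_packet->[0];
                my $response = $response_packet->[0];
                printf "%s -> %s\n", $request->uri, $response->code;
            },
        },
    );

    POE::Kernel->run();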


Re^2: What is the fastest way to download a bunch of web pages?
by tphyahoo (Vicar) on Mar 03, 2005 at 16:27 UTC
    Sounds promising. Any open code to do this?

      If you're looking to control how many child processes run at once, Parallel::ForkManager may be helpful. The example in the module's documentation specifically demonstrates what I think you're trying to accomplish.
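A rough sketch of that pattern, assuming a fixed URL list and LWP::Simple for the actual fetch (both are my additions for illustration, not taken from the module's example):

    use strict;
    use warnings;
    use Parallel::ForkManager;
    use LWP::Simple qw(getstore);

    my @urls      = qw( http://example.com/ http://example.org/ );  # hypothetical list
    my $max_procs = 5;                                              # cap on children

    my $pm = Parallel::ForkManager->new($max_procs);

    for my $url (@urls) {
        # start() returns the child's PID in the parent and 0 in the child.
        $pm->start and next;              # parent: move on to the next URL
        ( my $file = $url ) =~ s{\W+}{_}g;
        getstore( $url, "$file.html" );   # child: fetch one page, then exit
        $pm->finish;
    }

    $pm->wait_all_children;

Each URL gets its own short-lived child, and Parallel::ForkManager blocks start() whenever $max_procs children are already running, which is what keeps the number of simultaneous downloads bounded.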
