PerlMonks  

Re^2: Crawling with Parallel::ForkManager

by listanand (Sexton)
on Aug 07, 2009 at 22:28 UTC ( [id://786948] )


in reply to Re: Crawling with Parallel::ForkManager
in thread Crawling with Parallel::ForkManager

Thanks for writing. I try to access the web pages right after I stop (terminate, in this case) the program, not much later.

You are right: when I spawn 3 child processes (I have 4 right now), I see far fewer error messages. But even if I reduce it to 2 parallel connections, I still see error messages!

I can't think of a way out.
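For reference, reducing the number of simultaneous children is just the argument to the Parallel::ForkManager constructor; a small pause between spawns also helps. This is only a sketch: the URL list and `fetch_page` are placeholders standing in for the real crawl code.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes qw(sleep);

# Hypothetical URL list; fetch_page below stands in for the real HTTP GET.
my @urls = map { "http://example.com/page$_" } 1 .. 6;

my $pm = Parallel::ForkManager->new(2);   # at most 2 simultaneous children

my %status;
$pm->run_on_finish(sub {
    # Collect each child's result back in the parent.
    my ($pid, $exit, $ident, $signal, $core, $data) = @_;
    $status{$ident} = $$data if ref $data;
});

for my $url (@urls) {
    sleep 0.5;                  # brief pause between spawns to stay polite
    $pm->start($url) and next;  # parent gets a pid and moves on
    my $result = fetch_page($url);   # child does the work
    $pm->finish(0, \$result);        # child exits, passing data back
}
$pm->wait_all_children;

sub fetch_page {
    my ($url) = @_;
    return length $url;   # placeholder for the real fetch
}
```

Dropping `new(2)` to `new(1)` serializes the crawl entirely, which is the quickest way to test whether simultaneous connections are the trigger.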


Replies are listed 'Best First'.
Re^3: Crawling with Parallel::ForkManager
by fullermd (Priest) on Aug 07, 2009 at 22:44 UTC

    It really just depends on why the server is giving you the cold shoulder. I went with the most obvious: number of simultaneous connections. If that's the case, dropping to 1 (i.e., not parallel at all) would resolve it. But it may do rate-limiting, shoving you away after a given number of requests in a particular time period. It may be server-load dependent. It may just be flat-out random.

    Likely, the only way you can find out for sure what's up is by talking to the server admin. The best solution code-wise is to be adaptive: if you start getting errors, slow down; if you get no errors for a while, speed up. But that's a lot of work to get right.
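    The adaptive idea above can be sketched as a tiny pacing function: back off sharply on any error, then creep back toward full speed while things stay quiet. The name `next_delay` and the bounds here are made up for illustration, not part of any module.

```perl
use strict;
use warnings;

# Multiplicative backoff on error, additive speed-up on success
# (the 0.5s floor and 60s ceiling are arbitrary choices).
sub next_delay {
    my ($delay, $had_error) = @_;
    if ($had_error) {
        $delay *= 2;        # slow down hard as soon as errors appear
    } else {
        $delay -= 0.25;     # gently speed back up while error-free
    }
    $delay = 0.5 if $delay < 0.5;   # never hammer the server
    $delay = 60  if $delay > 60;    # never stall forever
    return $delay;
}
```

    The crawler would call `sleep(next_delay($delay, $got_error))` between requests; the asymmetry (fast backoff, slow recovery) is what keeps it from oscillating straight back into the error zone.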
