Parallel Downloads using Parallel::ForkManager or whatever works!!!

by jamesluc (Novice)
on Jan 08, 2002 at 02:37 UTC [id://136983]

jamesluc has asked for the wisdom of the Perl Monks concerning the following question:

Has anyone successfully used Parallel::ForkManager to download web pages consistently? I am having two primary problems: 1) it's not very fast for me (although I am only testing with 3 to 4 URLs so far, I can download many more pages faster without it), and 2) the results of my script are inconsistent; sometimes I get all the requested pages, other times the script errors out on NT :( before getting all the pages. I suspect that I am not properly using "wait_all_childs". I believe the script quits prematurely even though I have called the "wait" sub; I have inserted it in every place I can think of, and it's still inconsistent. I've included an excerpt of the code below. Please help. The target URLs are for testing purposes only.
########################

use Parallel::ForkManager;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Status;
use HTTP::Request;

%urls = (
    'drudge' => 'http://www.drudgereport.com',
    'rush'   => 'http://www.rushlimbaugh.com/home/today.guest.html',
    'yahoo'  => 'http://www.yahoo.com',
    'cds'    => 'http://www.cdsllc.com/',
);

foreach $myURL (sort(values(%urls))) {
    $count++;
    print "Count is $count\n";
    $document = DOCUMENT_RETRIEVER($myURL);
}

sub DOCUMENT_RETRIEVER {
    $myURL = $_[0];
    $mit   = $myURL;
    print "Commencing DOCUMENT_RETRIEVER number $iteration for $mit\n";
    print "Iteration is $iteration and Count is $count\n";
    for ($iteration = $count; $iteration <= $count; $iteration++) {
        $name = $iteration;
        print "NAME $name\n";
        my $pm = new Parallel::ForkManager(30);   # a new manager on every call
        $pm->start and next;                      # fork; parent skips ahead
        print "Starting Child Process $iteration for $mit\n";
        $ua  = LWP::UserAgent->new;
        $ua->agent("$0/0.1 " . $ua->agent);
        $req = new HTTP::Request 'GET' => "$mit";
        $res = $ua->request($req, "$name.html");  # save the response to a file
        print "Process $iteration Complete\n";
        $pm->finish;                              # child exits here
        $pm->wait_all_childs;
        print "Waiting on children\n";
    }
    undef $name;
}
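For comparison, the usual Parallel::ForkManager idiom is to create a single manager before the URL loop, call start once per URL, and call wait_all_children (note the spelling) exactly once after the loop, so the parent cannot exit before its children. Here is a minimal sketch of that idiom; the four-process cap and the LWP::Simple getstore fetch are illustrative choices, not taken from the code above.

use strict;
use warnings;
use Parallel::ForkManager;
use LWP::Simple qw(getstore);

my %urls = (
    'drudge' => 'http://www.drudgereport.com',
    'yahoo'  => 'http://www.yahoo.com',
);

# One manager for the whole run; 4 is an arbitrary concurrency cap.
my $pm = Parallel::ForkManager->new(4);

foreach my $name (sort keys %urls) {
    $pm->start and next;      # parent: record the child, move on to next URL
    # child: fetch the page and save it to a file
    my $status = getstore($urls{$name}, "$name.html");
    print "Child for $name finished with HTTP status $status\n";
    $pm->finish;              # child exits here
}

$pm->wait_all_children;       # parent blocks until every child is done

Because the manager outlives the loop and wait_all_children runs only once, in the parent, the script cannot quit before the downloads finish, which is the premature-exit symptom described above.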

Replies are listed 'Best First'.
Re: Parallel Downloads using Parallel::ForkManager or whatever works!!!
by merlyn (Sage) on Jan 08, 2002 at 03:10 UTC
    It might be useful for you to mention that you also just posted this on the perl-cgi-help mailing list, and that I already sent you an answer there, so that people here don't duplicate the answer already given in that forum.

    That really annoys me when people "multi-post" their questions without full disclosure.

    -- Randal L. Schwartz, Perl hacker

      1) This is the first time I've used Perl Monks or beginners@perl.org. 2) I just wanted some help with my script, not a lecture on the evils of multi-posting; by the way, I must have overlooked the "full disclosure" rule when I signed up. Sorry about that. 3) I appreciate your providing the link to your article at http://www.stonehenge.com/merlyn.LinuxMag/col16.html. However, what if one of the other Perl Monks doesn't subscribe to the perl-cgi-help mailing list? They would not have seen the question before, and they may have a different answer. 4) That really annoys me when people think that they have the only answer. :-)
        If you have never used either, then please consider this an accidental introduction to a slightly different culture.

        When you ask a question online, you are asking other people to volunteer their time and energy for your sake. They are willing to do so, else they would not do it, but they tend to think that their time has value, and prefer to have it treated that way. Among other things, this makes it rather irritating to find out that someone asked the exact same question in 10 different places without waiting to find out whether the first would have answered it. This means that 9 out of 10 sets of volunteers have just been asked to do useless work when they could have been answering someone else's question.

        While it might feel great for you to have all of these experts hastening to provide you with answers, it isn't very nice for people who don't get answers, and it isn't very nice for the experts involved. It isn't very nice for people whose questions may get passed over. And if that becomes common, then it becomes harder and harder to find experts who are willing to volunteer time and energy to produce those answers, which is really not very nice.

        As for people thinking they have the only answer, that has nothing to do with what merlyn said or why he said it. Ironically, you are actually most likely to get the best variety of answers if you ask a good question in one place, where people can see the answers that have already been given and choose whether to add one you don't already have. TIMTOWTDI, but when people think independently, they surprisingly often come up with the same answers. (Yet more evidence that asking multiple groups the same question results in useless duplication of work.)
