PerlMonks  

Script dieing after arbitrary time

by Baz (Friar)
on May 18, 2005 at 19:55 UTC (id://458415)

Baz has asked for the wisdom of the Perl Monks concerning the following question:

Hi
Programming in Perl is a hobby of mine, and I'm hoping you can help me improve a script I've written. The script works something like a search engine bot: it searches a set of web addresses (hosted on one server and presented in a particular format) for data concerning a particular word. The structure of my code is as follows:
    #!/usr/bin/perl -w
    use strict;
    $| = 1;
    use CGI::Carp "fatalsToBrowser";
    use CGI ":all";
    use warnings FATAL => 'all';

    my @names = ( "name1", "name2", "name3", );

    Start:
    foreach my $name (@names) {
        if(!name already_in_database()) {
            $res = $ua->get( $url1);
            if(expected_data_not_found_on_a_continous_bases()) {
                goto Start;
            }
            ...
            add_data_found_to_database_for_this_name();
        }
    }
Over time I've improved the reliability of this script. For example, when I get a webpage, I check that the page presents the data in the expected format. This is necessary because from time to time the server reports that it is busy, and the webpage then contains the message "busy, call back later" rather than the expected data. To handle this, I poll the page a number of times, and if I keep getting the busy message, I jump to Start and process this name again from scratch. The busy message doesn't seem to reflect genuine server load, because when I restart the name I no longer get it.
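A minimal sketch of the polling step described above, assuming the fetch is wrapped in a code ref (a stand-in for something like $ua->get($url) followed by reading the page body) and that a busy page can be recognised by its "busy, call back later" text. fetch_with_retry, the retry count and the optional delay are hypothetical. Returning undef leaves the caller free to restart the name, as the original script does with goto Start.

```perl
use strict;
use warnings;

# Text assumed to identify a "busy" page instead of real data.
my $BUSY_MARKER = qr/busy, call back later/i;

# Poll one page up to $max_tries times. $fetch is a code ref standing in
# for the real HTTP request and returning the page body (or undef).
# Returns the body on success, or undef if every attempt looked busy.
sub fetch_with_retry {
    my ($fetch, $url, $max_tries, $delay) = @_;
    $delay = 0 unless defined $delay;
    for my $attempt (1 .. $max_tries) {
        my $body = $fetch->($url);
        return $body if defined $body && $body !~ $BUSY_MARKER;
        sleep $delay if $delay && $attempt < $max_tries;   # pause before the next poll
    }
    return;    # caller decides whether to restart the whole name
}

# Usage with a fake fetcher that is busy twice, then returns data.
my @responses  = ('busy, call back later', 'busy, call back later', 'word count: 42');
my $fake_fetch = sub { shift @responses };
my $page = fetch_with_retry($fake_fetch, 'http://example.com/name1', 5);
print defined $page ? "got: $page\n" : "gave up\n";
```

Keeping the retry limit inside one function means the "how many polls before giving up" decision lives in a single place instead of being spread around the main loop.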

I also process each name twice and store the results in separate MySQL tables, so I can compare the two tables to check that they are exactly the same. At this point, the tables match with an accuracy of 99.9%. Previously I was getting different results each time I processed the same name, so I'm much happier with the current results.
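The comparison between the two passes could also be done in Perl once each pass is loaded into a hash. This sketch assumes each table reduces to key => count pairs; the loading step is omitted, and diff_passes and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Compare two passes over the same name. Each pass is assumed to be
# loaded into a hash of key => count (e.g. one hash per MySQL table).
# Returns a human-readable line for every key whose values differ.
sub diff_passes {
    my ($first, $second) = @_;
    my %all_keys;
    $all_keys{$_} = 1 for keys %$first, keys %$second;
    my @diffs;
    for my $key (sort keys %all_keys) {
        my $x = exists $first->{$key}  ? $first->{$key}  : 'missing';
        my $y = exists $second->{$key} ? $second->{$key} : 'missing';
        push @diffs, "$key: $x vs $y" if $x ne $y;
    }
    return @diffs;
}

# Made-up sample data for illustration.
my %pass1 = (page1 => 3, page2 => 5, page3 => 1);
my %pass2 = (page1 => 3, page2 => 4);
print "$_\n" for diff_passes(\%pass1, \%pass2);
```

Listing the differing keys, rather than just a match percentage, makes it easier to see whether the 0.1% of mismatches cluster around particular pages.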

I run my script with nohup perl script.cgi. My main problem now is that the script dies after an arbitrary length of time and I have no idea why. This is very frustrating because, given the available bandwidth, a name might take anywhere from 10 seconds to 10 hours to process, and I have to process each name twice. So I might start the script and come back the following day to find it died 2 hours after I started it. Or it might have run for 8 hours and died just before finishing a name with a really long processing time.

I've checked my error_log and I can't find any errors relating to this script. One fix might be to jump to Start whenever the condition that kills the script occurs, but first I need to find out why the script is dying.
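One likely reason the error_log is empty: that file belongs to the web server, but a script started from a shell with nohup never runs under the server, so whatever it prints to STDERR ends up in nohup.out instead (if STDERR was a terminal when nohup started). A process killed by a signal may print nothing at all. The sketch below, assuming a writable log file of your choosing, records dies, warnings, catchable signals and the exit status with timestamps. SIGKILL (for example from the kernel's out-of-memory killer) cannot be caught, so a log that simply stops without an exit line would point in that direction. HUP is deliberately left alone so nohup's protection stays in place.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(strftime);

my $LOG;    # set by install_diagnostics, also used by the END block

sub log_line {
    my ($msg) = @_;
    return unless $LOG;
    chomp $msg;
    print {$LOG} strftime('%Y-%m-%d %H:%M:%S', localtime), " $msg\n";
}

sub install_diagnostics {
    my ($fh) = @_;
    $LOG = $fh;
    $LOG->autoflush(1);    # unbuffered: the last line survives a sudden exit

    # Every uncaught die, including warnings made fatal by FATAL => 'all'.
    # $^S is true inside eval, where the error may be handled on purpose.
    $SIG{__DIE__} = sub { return if $^S; log_line("DIE: $_[0]") };

    # Keep warnings on STDERR as well as in the log.
    $SIG{__WARN__} = sub { print STDERR $_[0]; log_line("WARN: $_[0]") };

    # Catchable signals that would otherwise end the script silently.
    # HUP is left alone so nohup keeps ignoring it; KILL cannot be caught.
    for my $sig (qw(TERM INT PIPE)) {
        $SIG{$sig} = sub { log_line("caught SIG$sig, exiting"); exit 1 };
    }
}

# Runs on normal exit and after an uncaught die, but not after SIGKILL.
END { log_line("exiting with status $?") }

# In the crawler (hypothetical path):
#   open my $fh, '>>', '/tmp/crawler-diagnostics.log' or die "open: $!";
#   install_diagnostics($fh);
```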

Any suggestions?

Thanks

Barry.

Re: Script dieing after arbitrary time
by scmason (Monk) on May 18, 2005 at 20:17 UTC
    I suspect that this is not enough code to figure out why your script is dying. I suggest that you try a massive amount of logging (constantly flushing your buffer, or using a non-buffering logging mechanism) to find out where.
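Along the lines of the logging suggested above, a small wrapper can log the start and end of each step with timestamps, so the last "start" line without a matching "done" line shows where the script was when it died. step, the label text and fetch_page in the usage comment are hypothetical; any writable filehandle works.

```perl
use strict;
use warnings;
use IO::Handle;
use Time::HiRes ();

# Log the start and end of a step. If the script dies, the last "start"
# line without a matching "done" line shows which step it was in.
sub step {
    my ($log, $label, $code) = @_;
    $log->autoflush(1);                       # write through immediately
    my $t0 = Time::HiRes::time();
    printf {$log} "%s start %s\n", scalar(localtime), $label;
    my @result = $code->();
    printf {$log} "%s done  %s (%.2fs)\n", scalar(localtime), $label, Time::HiRes::time() - $t0;
    return wantarray ? @result : $result[0];
}

# Usage, with a made-up label; fetch_page is a hypothetical stand-in for the real GET:
#   open my $log, '>>', 'progress.log' or die "open: $!";
#   my $body = step($log, "name1: GET page 17", sub { fetch_page(17) });
```

The elapsed time on each "done" line also shows whether individual requests are getting slower before the script dies.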

    There seems to be an error in your sample code at the point below:

    foreach my $name (@names) { if(!name already_in_database()) {

    Shouldn't name be $name? Additionally, shouldn't it read more like:

    if(! already_in_database($name))

    Good luck with that!
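Putting scmason's fix together with declared variables, the loop might look like this sketch. Every helper is a hypothetical stub (name2 pretends to be busy once), and a bounded per-name retry replaces goto Start so that a permanently busy name cannot loop forever.

```perl
use strict;
use warnings;

# Hypothetical stubs standing in for the real database and crawling code.
my %in_db;
my %busy_left = (name2 => 1);    # pretend name2 is busy on its first attempt

sub already_in_database { my ($name) = @_; return exists $in_db{$name} }
sub add_to_database     { my ($name, $data) = @_; $in_db{$name} = $data }

sub process_name {
    my ($name) = @_;
    if (($busy_left{$name} || 0) > 0) {
        $busy_left{$name}--;
        return;                  # undef: the pages kept coming back busy
    }
    return "data for $name";
}

my @names        = ('name1', 'name2', 'name3');
my $max_restarts = 3;

NAME: foreach my $name (@names) {
    next NAME if already_in_database($name);
    for my $attempt (1 .. $max_restarts) {
        my $data = process_name($name);
        if (defined $data) {
            add_to_database($name, $data);
            next NAME;           # done with this name
        }
        warn "restarting $name (attempt $attempt)\n";
    }
    warn "giving up on $name after $max_restarts attempts\n";
}
```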

Re: Script dieing after arbitrary time
by bart (Canon) on May 18, 2005 at 21:38 UTC
      Thanks for your help

      Yeah, the above is pseudocode, sorry for not making that clearer.

      I write to a log, but the amount I'm writing is very limited: about 20 characters per name, just enough to give me an idea of how far through the processing I get before it dies. This output goes to nohup.out.

      My script is long and awkward, so I'd rather not post it in any detail.

      I had a look at merlyn's column; I actually came across that code before and used it to indicate the progress of a file upload. I also think the problem is some kind of timeout. Could it be a stack overflow? I don't think so: I'm basically using a hash and incrementing it with each occurrence of this name, so I'm not storing much on the stack. What makes a name take so long to process is that it sometimes has to GET thousands of pages.
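If the problem is a request that hangs rather than a crash, one way to test the timeout theory is the eval/alarm idiom described in perlipc, wrapped around each GET. LWP::UserAgent also has a timeout option, but it limits inactivity on the connection rather than the total time for a request. with_timeout and the limits below are assumptions, and perlipc also discusses caveats that come with Perl's deferred ("safe") signals.

```perl
use strict;
use warnings;

# Run $code but give up after $seconds, using the eval/alarm idiom.
# Returns (1, result) on success or (0, error message) on timeout or die.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout after ${seconds}s\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;    # clear any pending alarm if $code died first
    return $ok ? (1, $result) : (0, $@);
}

# Usage: a stand-in for a GET that hangs for 3 seconds, limited to 1.
my ($ok, $msg) = with_timeout(1, sub { sleep 3; 'finished' });
print $ok ? "ok: $msg\n" : "failed: $msg";
```

Logging each timeout (rather than dying) would show whether hung requests line up with the times the script stops.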

      I'm thinking a fix might be to write a second script which runs ps -eaf: if my script is still running it does nothing, otherwise it starts it again. I could run this second script every hour from a cron job.

      Not very elegant, but certainly better than what I have right now.
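Instead of parsing ps -eaf output, a common alternative is to have the crawler hold an exclusive lock on a file while it runs. The lock is released automatically when the process exits for any reason, so cron can simply start the script every hour and a second copy exits if the lock is already taken. Because the loop skips names already in the database, a restarted run should pick up roughly where the last one stopped. acquire_single_instance_lock, the lock path and the crontab line are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Try to take an exclusive, non-blocking lock on $path.
# Returns the open filehandle (keep it open to hold the lock) or undef
# if another process already holds it.
sub acquire_single_instance_lock {
    my ($path) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

# At the top of the crawler (hypothetical path):
#   my $lock = acquire_single_instance_lock('/tmp/crawler.lock')
#       or exit 0;    # another copy is already running
#
# Matching crontab entry (hypothetical paths), run at the top of every hour:
#   0 * * * * cd /path/to/crawler && perl script.cgi >> crawler.out 2>&1
```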
Re: Script dieing after arbitrary time
by TheStudent (Scribe) on May 18, 2005 at 20:46 UTC
    This appears to be pseudocode rather than a working example of your problem. While you show the strict pragma, you use many variables without declaring them with my.
    Maybe take a look at Before you Post... and How (Not) To Ask A Question.

Node Type: perlquestion [id://458415]
Approved by BrowserUk