Exec'd perl running wild on cpu-time

by jeroenes (Priest)
on Nov 13, 2000 at 17:56 UTC

jeroenes has asked for the wisdom of the Perl Monks concerning the following question:

As a humble novice, I would like to ask for some advice about a Perl script which I run in background mode. Actually, it is more of a How-Do-I-Run-This question than a Perl-coding question, methinks. Let me first start with a short description of the script.

The script wakes up (or should wake up) every 5 minutes, retrieves traffic-jam info from a Dutch site, logs it, and, when the time is right, reports this info in an e-mail to my wife, who uses it to determine the time of her homeward return. As you can imagine, this script really is a part of our married life ;-).

The main loop is as follows:

while (1) {
    @the_html = init_html();
    @data     = parse_data(@the_html);
    if (@data) {
        write_data(filter_data(@data));
        mail_data(filter_data(@data));
    }
    else {
        write_data("\tNo connection or data, so",
                   "\tsleeping until next fetch.");
    }
    sleep(300);
}
I use IO::Socket to retrieve the web page. The script runs just fine, but after a couple of days it starts to eat CPU time, so I have to kill the process.
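
For the curious: stripped to its essentials, the fetch looks something like this sketch (the real host and path are left out; www.example.nl and /files.html are placeholders):

use IO::Socket::INET;

# Sketch of the fetch routine; host and path are placeholders.
sub init_html {
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'www.example.nl',
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 30,            # don't hang forever on a dead server
    ) or return;                   # no connection: return an empty list

    print $sock "GET /files.html HTTP/1.0\r\n",
                "Host: www.example.nl\r\n\r\n";
    my @the_html = <$sock>;        # slurp the response line by line
    close $sock;
    return @the_html;
}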

Normally I ran it with  exec get_files.pl files &, after which I killed the terminal. But lately I discovered that if I leave the terminal open, the script keeps running fine. It has now been running for one week without a problem.

It runs, my wife is happy, and so am I. But I still don't know why the script goes mad when I close the terminal. Is there a reason? Am I doing something wrong? I'm very curious about this.

If your (our?) great society of monks could come up with an answer, a humble novice would be made very happy.

May the Perl be with you,

Jeroen

PS: If any of you happen to live in Holland and are interested in the script itself, don't hesitate to contact me.

Replies are listed 'Best First'.
Re: Exec'd perl running wild on cpu-time
by Corion (Patriarch) on Nov 13, 2000 at 18:05 UTC

    If you want a script to run periodically, that is always a job for cron, the program that starts scheduled tasks under the various flavors of Unix. Then you can do away with the while(1){} loop and let your program run just once every 5 minutes. If you are working under Win32, there is either the dreaded Scheduler service under Windows 9x (avoid it; it needs a person logged in) or the at service under NT, which has a different syntax but does what cron does under Unix.

    Much more interesting, though, is why your Perl program racks up that much CPU time at all, but I'm at a loss there.

      Actually, I run the script continuously. But, before finding the "open terminal" solution, I was thinking about making a cron job to periodically kill and restart the script. That way, I could get rid of the open terminal, day and night and on weekends...

      Indeed, I consider the CPU-black-holish thing puzzling. Maybe there is (and I hope so) an obvious/stupid flaw in my way of handling the script.

        Of course, this is drifting away from Perl, but in the spirit of the Right Tool for the Right Job: cron knows all about weekdays and time slots. I would use cron (or at, which can do that stuff under NT) with the following crontab entry to ensure that it checks every 5 minutes during the week, from 7:00h to 20:00h, Monday through Friday:

        0,5,10,15,20,25,30,35,40,45,50,55 7-20 * * 1-5 /path/to/get_files.pl files
        Note: I didn't check that line, and I only worked from the crontab(5) manpage. Usually that means some tweaking is required afterwards.
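
        Run from cron, the while(1)/sleep(300) scaffolding disappears and each invocation makes a single pass. A minimal sketch, reusing the routines from the loop posted above (and as untested as the crontab line):

        #!/usr/bin/perl -w
        # One pass per invocation; cron supplies the every-5-minutes part.
        my @the_html = init_html();
        my @data     = parse_data(@the_html);
        if (@data) {
            write_data(filter_data(@data));
            mail_data(filter_data(@data));
        }
        else {
            write_data("\tNo connection or data this run.");
        }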

Re: Exec'd perl running wild on cpu-time
by Fastolfe (Vicar) on Nov 13, 2000 at 18:33 UTC
    Your main loop offers no clues at all as to the CPU problem. Have you tried using strict and running Perl with warnings (-w) enabled?

    Note that your filter_data function is being called twice: once for write_data and once for mail_data. If this is an expensive function, you should consider running it once, storing the results, and passing those results to each of the two functions.
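
    Something like this, storing the filtered data once:

    my @filtered = filter_data(@data);   # filter once...
    write_data(@filtered);               # ...then reuse the result
    mail_data(@filtered);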

    If your main functions are too long to post, consider just reading through them and checking your assumptions and error handling. What happens if it can't get a response? What happens if the response it gets doesn't match your expectations? Are you looping and using a variable to determine when the loop should end? If so, are you sure this variable allows the loop to exit when unexpected things happen?

    In the past, I've occasionally used 'strace' (or 'truss' or 'ktrace', depending on your OS) against a process stuck like that. That usually lets me see what system calls it's trying to make (if any). Sometimes this points me to the right place in my code.
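
    Attaching to the running process looks something like this (the PID here is just an example; use whatever ps reports for your script):

    strace -p 12345 -o trace.log

    A busy-looping process usually shows the same handful of system calls repeating endlessly in the trace.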

      Quite a few remarks... thank you! 1. No warnings yet. I still haven't looked into module programming, so I don't know what to do with the 'import' stuff. It has been moved onto my 'todo' list....

      2. Indeed, the function is called twice. Normally, CPU time is no issue; it's only two minutes/day MAX. And I think most of the time is spent waiting for the HTML content.

      3. If any connection or whatever fails, the script simply terminates with a die(). No problems from that perspective (i.e., a terminated script) thus far.

      4. If the content doesn't match, the filter just returns an empty array, and only the 'templates' are logged/mailed. The filter routine and the parse routine are quite simple: just some coupled regexes that fill some arrays. If something goes wrong, an array ends up empty, that's all. If a connection fails, the script will die().

      5. see next answer

      Jeroen

      I was dreaming of guitarnotes that would irritate an executive kind of guy (FZ)

        Another thing I would recommend is setting yourself up with a debugging log. Have it log the last few (or all) of its requests and the results it gets back from the server. The next time your script misbehaves, take a look at this log to see what it's acting upon. Putting 'markers' or 'checkpoints' in various places in and around your loops, so that information is logged as your program reaches various points of execution, would also let you trace the flow of execution, though if your script is entering an infinite loop (which I suspect it is), this log file will fill up fast.
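
        A minimal sketch of such a logging routine (the filename and the log_debug name are just placeholders):

        # Append a timestamped line to a debug log; never let logging
        # itself kill the script, so failures are silently ignored.
        sub log_debug {
            open my $log, '>>', '/tmp/get_files.debug' or return;
            print $log scalar(localtime), ' ', @_, "\n";
            close $log;
        }

        log_debug('entering fetch');   # checkpoint before the request
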
Re: Exec'd perl running wild on cpu-time
by ChOas (Curate) on Nov 13, 2000 at 18:44 UTC
    Eeehm, dunno if this helps, but seeing you use 'kill' kinda makes me guess you're on some form of *NIX. Though this is not a real answer, it might help: inserting 'nohup' in front of your program will let it ignore the HUP signal (which gets sent to your program when you kill the terminal)...
    e.g.: nohup exec get_files.pl files &

    Bye!!
      Aha! Ambient light fills my spinning head.... I would say this should be THE answer. I killed the script and restarted it with nohup. Now I'm heading for the monastery's library....

      Cheers,
      and thanx 2u all!

      Jeroen

      I was dreaming of guitarnotes that would irritate an executive kind of guy (FZ)

        Hmmm... reply to my own reply... spells disaster, doesn't it?

        Well, according to the library, what nohup does could also be accomplished by putting $SIG{HUP} = 'IGNORE'; in the script itself. That's at least a more Perlish solution ;-).
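
        That is, right at the top of the script:

        # Ignore SIGHUP so closing the terminal doesn't take the script down.
        $SIG{HUP} = 'IGNORE';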

        May the Pearl be at your convenience, and thanks for the fishes. Or whatever.

        Jeroen

        Post Posting: I'll keep you _posted_ about the stability of the script. Thanks.

        Post Post Posting: Oh yeah, 'file' is Dutch for 'traffic jam'. In case you were wondering about the awkward use of 'file' above.

      Doing something like this seems to have the same effect (it's what I use):
      (exec get_files.pl files &)
      I wish I knew what the difference was.
Re: Exec'd perl running wild on cpu-time
by elwarren (Priest) on Nov 13, 2000 at 21:09 UTC
    If the nohup helped your problem, then I would assume it's some sort of issue with your OS's handling of STDOUT and STDERR after detaching from the terminal. If you are happy with the nohup solution (a good solution), you could leave it at that, but if you're still interested in tracking down your bug, I would redirect STDOUT and STDERR to a log, as Fastolfe mentioned a couple of posts above. This may also solve your problem. You could also take a look at the Net::Daemon or Proc::Daemon modules.
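
    For what it's worth, daemonizing with Proc::Daemon amounts to one call at startup (a sketch; the module does the fork/new-session/close-filehandles dance for you):

    use Proc::Daemon;

    # Fork into the background, detach from the controlling terminal,
    # chdir to / and close inherited file handles.
    Proc::Daemon::Init();
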
      Wow! More useful tips! Yes, I have actually already shifted to redirecting with >>log.out 2>&1. This may have solved the problem as well, as I don't recall the day I switched to the redirection. For the curious, the logfile is still empty, but I don't dump any of the data, because that would clutter the log with too many bits. I'll certainly look into the daemon modules.

      It feels like 42...

      Update: I changed the script at several points.

      1. I read the docs on packages and stuff, and I now use strict.

      2. The nohup exec command resulted in an immediate exit; I think it should be plain nohup, without the exec. Anyway, I now use $SIG{HUP} = 'IGNORE', and I didn't try the nohup/() variants.

      3. There is more to starting daemons than I thought. I read the Net::Daemon and Proc::Daemon PODs, and there is a complete sequence needed for detaching from a terminal. Because I don't need sockets for my script, I will use Proc::Daemon.

      4. However, I'm postponing coding this for a while. The script has now been running for more than a day without problems.

      Thanks to you all!
