Re: cron script best practices

by 5mi11er (Deacon)
on Aug 10, 2005 at 20:29 UTC


in reply to cron script best practices

Starting with the first thing that popped into my head: why are you getting so many processes? Do you start with one or a few, and then they start replicating? If so, I'd guess you've got a case of the *'s syndrome. A schedule of * * * * * at the beginning of a crontab line tells cron to kick off your script every minute.
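For reference, the five schedule fields are minute, hour, day of month, month, and day of week, followed by the command. The paths below are placeholders, not taken from anyone's crontab:

    * * * * * /path/to/script.pl    # every minute
    0 6 * * * /path/to/script.pl    # once a day at 06:00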

If you're not doing that, then you really need to figure out why you get so many processes, because something is pretty obviously broken.

Other potential traps to avoid include:

  • Jobs run from cron get a very limited environment compared to most user environments. If the scripts need a richer environment, you need to ensure they get it.
  • Paths are part of that environment: either fully specify them, or make sure the scripts compensate appropriately (see the sketch after this list).
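
One way to compensate is to pin the environment down at the top of the script itself. A minimal sketch, assuming typical values (the specific paths are illustrations, not anyone's actual setup):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # cron's default PATH is usually just /usr/bin:/bin, so set
    # whatever the script's external commands actually require
    $ENV{PATH} = '/usr/bin:/usr/local/bin';

    # some tools also expect HOME and a sane working directory
    $ENV{HOME} = '/home/reports';          # assumed home directory
    chdir $ENV{HOME} or die "chdir: $!";   # don't rely on cron's cwd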

That's all I can think of for now; hope this helps your situation.

-Scott

Re^2: cron script best practices
by jimbus (Friar) on Aug 10, 2005 at 21:35 UTC
    Here is my crontab:
    reports@clarkkent/home/reports(7): crontab -l
    00 06 * * * /home/reports/ftp/SMSC0/loadData.pl
    00 06 * * * /home/reports/ftp/MMSC1/loadData.pl
    20,35,50,05 * * * * /home/reports/ftp/YTSMSC50/loadData.pl
    20,35,50,05 * * * * /home/reports/ftp/FDSMSC/loadData.pl
    00 06 * * * /home/reports/ftp/proptima/ftp.pl
    

    loadData.pl is the script I'm checking on ("ps -ef|grep loadData|wc -l"). The first two should run once a day at 6am and the second two every fifteen minutes, which is 96 times each per 24-hour period. I'm assuming the issue is with those two, which are identical but for different boxes.

    These scripts digest a log file that is a series of reports from about 12 nodes; each report has between 50 and 225 key/value pairs, one per line. I loop through the nodes, building a hash of the keys and values, then build a huge INSERT from them... with up to 225 columns, the INSERT is built dynamically.
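
    For illustration, that pattern might look roughly like this with DBI placeholders (a hedged sketch: the table name, connection details, and the parsing step are invented here, and placeholders keep a dynamically built statement safe from quoting problems):

        use strict;
        use warnings;
        use DBI;

        # hypothetical parser returning the key/value pairs for one node
        my %row  = parse_node_report('/path/to/node.log');
        my @cols = sort keys %row;

        # build the column and placeholder lists from whatever keys showed up
        my $sql = sprintf 'INSERT INTO node_stats (%s) VALUES (%s)',
                          join(',', @cols),
                          join(',', ('?') x @cols);

        my $dbh = DBI->connect('dbi:mysql:reports', 'user', 'pass',
                               { RaiseError => 1 });
        $dbh->do($sql, undef, @row{@cols});   # values bound in column order
        $dbh->disconnect;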

    I have filled /usr a couple of times, once recently. I thought things would recover, but I end up with all these processes and mysqld running at 60-70% of CPU.

    I guess the real thing is that I'm resource-strapped, inexperienced with Perl, and getting a bit overwhelmed by the amount of data being chucked at me, and I was hoping to find someone who had documented what it takes to write mature cron/logging scripts :) With Perl, and with JDBC for JSP, I find all kinds of simplest-case stuff on the web, but not a lot on what I would think would be typical usage patterns.

    Thanks,

    Jimbus

    Never moon a werewolf!
      20,35,50,05 * * * * /home/reports/ftp/YTSMSC50/loadData.pl
      20,35,50,05 * * * * /home/reports/ftp/FDSMSC/loadData.pl

      Do these scripts need to run simultaneously? You could immediately reduce the number of connections to your DB if one script ran to completion and exited before the other was fired off.

      i.e.,

      20,35,50,05 * * * * /home/reports/ftp/YTSMSC50/loadData.pl; /home/reports/ftp/FDSMSC/loadData.pl
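
      A variant, in case you only want the second script to run when the first succeeds: && instead of ; makes the second command conditional on the first's exit status:

      20,35,50,05 * * * * /home/reports/ftp/YTSMSC50/loadData.pl && /home/reports/ftp/FDSMSC/loadData.pl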

      I've got to ask - do any of your inserts work at all? As mentioned in another response, if your script works fine by hand but not from cron, it may be an environment issue. I'd modify my cron to something like

      20,35,50,05 * * * * /bin/env > /tmp/env.output; /home/reports/ftp/YTSMSC50/loadData.pl; /home/reports/ftp/FDSMSC/loadData.pl

      and then check the contents of /tmp/env.output and compare them to the output of env when you run it at a command line, looking for any important differences. You could then set those environment variables in your Perl script.
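
      For the comparison itself, something along these lines works at the shell (the interactive file name is arbitrary):

      env > /tmp/env.interactive
      diff /tmp/env.output /tmp/env.interactive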

      Some other obvious things: make sure you're disconnecting from the db. And if it's not running properly from cron on a regular basis, run it only once from cron, debug that, and ensure it does run fine from cron before filling up your crontab with multiple runs each hour.

      Finally, why is your /usr filling up on a regular basis as a result of this script?

      Some thoughts:
      • Do you really need 225 columns in one table? Maybe you could split the data over different tables, keeping columns together based on what they represent; these "classes" shouldn't be difficult to spot among the 225 possible keys of an SMSC log.
      • It seems that you basically replicate the script inside many directories; I hope this is done via (hard|sym)linking rather than plain copies. You could probably keep a single copy of the script in one known place and add an input parameter; that would improve maintainability.
      • As others have said, you should avoid having them run concurrently. This could mean avoiding cron entirely: I was once bitten by a similar problem (collecting and processing data from RNCs and from provisioning nodes) and I eventually resorted to a single scheduling script that runs the jobs *sequentially* instead of in parallel (see the sketch after this list). OTOH, if you need to stick to cron, measure the execution times of the different processes, and strive to separate their start times by at least those execution times (for the repetitive tasks it would probably be wise to use 05,20,35,50 for one and 12,27,42,57 for the other).
      • If you fill your disks... you need bigger ones. A monitoring script with some alarm capability would probably help too.
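
      A minimal sketch of such a sequential runner (the job list comes from the crontab above; the error handling is an assumption about what you'd want):

          #!/usr/bin/perl
          use strict;
          use warnings;

          # cron starts this single wrapper; each job runs to completion
          # before the next one starts, so they never overlap
          my @jobs = (
              '/home/reports/ftp/YTSMSC50/loadData.pl',
              '/home/reports/ftp/FDSMSC/loadData.pl',
          );

          for my $job (@jobs) {
              system($job) == 0
                  or warn "$job exited with status ", $? >> 8, "\n";
          }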

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
      Questions, remarks:
      • How come the script is filling up /usr? Where is it writing, with whose permissions, and why? Ideally, the size of /usr should only change when you install patches or upgrade your Operating Environment.
      • Do your scripts actually do what they are supposed to do? Do they connect to the database, or are they just hanging there, trying to log on?
      • How fast do your scripts run by hand? If a run takes 20 minutes by hand and you start one every 15 minutes, you will run into problems.
      • To avoid having too many instances running, when I write cron jobs that make database connections and fire every 15 minutes, I use a lock file to keep multiple instances from running at once. Policies can vary: the one failing to get the lock exits; the one failing to get the lock kills the one holding it; or a combination of the two (exit if the lock is held by a process that started less than $X minutes ago, else kill the holder). Waiting for the lock usually isn't a good idea. A sketch of the first policy follows this list.
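
      A minimal sketch of that "fail to get the lock, exit" policy, using flock with a non-blocking request (the lock file path is an assumption):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Fcntl qw(:flock);

          # one well-known lock file per job family (path is made up)
          open my $lock, '>', '/var/tmp/loadData.lock' or die "open: $!";

          unless (flock $lock, LOCK_EX | LOCK_NB) {
              warn "another instance holds the lock, exiting\n";
              exit 0;   # the "failing to get the lock exits" policy
          }

          # ... the real work goes here; the lock is released automatically
          # when the process exits and the filehandle is closed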
