PerlMonks  

What's a good process for archiving cron processed files?

by jimbus (Friar)
on Aug 19, 2005 at 12:21 UTC

jimbus has asked for the wisdom of the Perl Monks concerning the following question:

Monksters,

I've been trying to write something to archive the files I'm processing and came up with this:

foreach $year (@years) {
    foreach $month (@months) {
        foreach $day (@days) {
            @filelist  = glob("/home/reports/ftp/WSB/*$year$month$day*");
            $filecount = @filelist;
            print "$month-$day $hour:$filecount\n";
            $gzname = $month . $day . ".gz";
            foreach $filename (@filelist) {
                `cat $filename | gzip -9 >> $gzname`;
                `rm $filename`;
            }
        }
        @days = @alldays;
    }
}
I don't like this because it's an after-the-fact cleanup, and I'd like to archive each file as I process it. It also creates one monolithic file, and I've had some issues with gzip later claiming that these files are corrupt.

I tend to use gzip because that's what the people who taught me Unix used. Are any of the others more reliable? I'd like the result to behave as if the files were tarred and then zipped. I looked at some of the modules on CPAN, and it didn't look like any of them worked without going through the whole process of unzipping the whole archive, adding the new file to the tar, and then re-zipping, which seemed very expensive to me. I suppose tarring isn't important if the files remain separate when I uncompress the archive.
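Something along these lines is what I have in mind, if IO::Compress::Gzip is available (a rough, untested sketch; $filename stands for whatever file I've just finished processing):

use IO::Compress::Gzip qw(gzip $GzipError);

# compress the file I just processed, then remove the original
gzip $filename => "$filename.gz", Level => 9
    or die "gzip $filename: $GzipError";
unlink $filename or warn "unlink $filename: $!";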

Anyhow, sorry to be asking so many newbie questions...

A hopeful Jimbus asks, "Never moon a werewolf?"

Replies are listed 'Best First'.
Re: What's a good process for archiving cron processed files?
by Roger (Parson) on Aug 19, 2005 at 12:34 UTC
    Why don't you use the stock-standard tar utility? OK, there are multiple versions of the tar program, but I am talking about the latest version of GNU tar.

    You can do something along the lines of:
    `tar -zcvf $archive.$year$month$day.gz /home/reports/ftp/WSB/*$year$month$day*`

      I'd really like them compressed in some format or another
      Never moon a werewolf!
        It does. -z compresses.
        -z, --gzip, --gunzip    Filter the archive through gzip(1).
Re: What's a good process for archiving cron processed files?
by davidrw (Prior) on Aug 19, 2005 at 12:42 UTC
    (If possible) instead of generating YYYYMMDD-named files, if you just generate files like /home/reports/ftp/WSB/someoutput.txt then you can simply have logrotate deal with them -- it automagically takes care of renaming them daily (or whenever you want) including compression and deleting ones that are too old (which you define). Your logrotate.conf would just be something like (see manpage for details, especially about the wildcard):
    /home/reports/ftp/WSB/* {
        rotate 9
        compress
        daily
        olddir /home/reports/ftp/WSB/old
    }
      I have a requirement to keep a year's worth in case we have to go back and prove something. Using a date-formatted name keeps me sane :).
      Never moon a werewolf!
Re: What's a good process for archiving cron processed files?
by pboin (Deacon) on Aug 19, 2005 at 12:30 UTC

    If it were me, I'd consider using a simple bash script instead. Something along the lines of this:

    find /home/reports/ftp/WSB/ -iname '??????' -exec zip -m9 '{}.zip' '{}' \;

    Definitely not tested (for syntax or otherwise), but maybe it gives you a simpler angle of attack, and 'Simple Is Good.'

      While the code example was from an after-the-fact cleanup script, I'd actually like to archive the individual files as I process them in my already-existing Perl script, so I was looking for a Perl solution first :)
      Never moon a werewolf!
Re: What's a good process for archiving cron processed files?
by tlm (Prior) on Aug 19, 2005 at 13:54 UTC

    Maybe PerlIO::gzip is what you're looking for. In the example below, the script adds compressed copies of itself to a single file.
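    Something like this (a minimal sketch; it leans on gzip's support for concatenated streams, so each run appends another compressed member to the same file):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PerlIO::gzip;

    # Append a gzip-compressed copy of this script to self.gz.
    # Concatenated gzip streams are still valid gzip, so gunzip
    # recovers every appended copy in turn.
    open my $in, '<', $0 or die "can't read $0: $!";
    open my $out, '>>', 'self.gz' or die "can't append to self.gz: $!";
    binmode $out, ':gzip' or die "can't push :gzip layer: $!";
    print {$out} $_ while <$in>;
    close $out;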

    the lowliest monk

Re: What's a good process for archiving cron processed files?
by 5mi11er (Deacon) on Aug 19, 2005 at 18:07 UTC
    I'm not entirely sure what your current process is, but it seems like you instinctively know there's a better way than what you're currently doing. So, allow me to walk you through the way I've processed and archived various syslog files.

    For syslog files, or any files that are constantly being written to, it's not difficult under Unix to "rotate" them; you simply need to know the accepted standard practice:

    1. Move the file currently being written to a new name.
    2. Signal syslog that it needs to reopen its log files.
    3. Process the renamed file and archive it as you wish.

    Under Unix, processes will continue to write to the original file even after it has been moved to a new name, until they are told to close the current file and reopen the original file name(s). So the procedure above guarantees that you won't lose any messages while you're moving the files around.
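    In Perl, those three steps might look something like this (a rough sketch; the log path and pid-file location are illustrative, and a classic syslogd reopens its files on SIGHUP):

    use strict;
    use warnings;
    use POSIX qw(strftime);

    my $live    = '/var/log/local7';
    my $rotated = $live . '.' . strftime('%Y%m%d', localtime);

    # 1. move the live file aside
    rename $live, $rotated or die "rename: $!";

    # 2. tell syslogd to reopen its logs
    chomp(my $pid = do {
        open my $fh, '<', '/var/run/syslogd.pid' or die "pid file: $!";
        <$fh>;
    });
    kill 'HUP', $pid or die "kill: $!";

    # 3. the rotated file is now static; process it, then archive it
    system('gzip', '-9', $rotated) == 0 or die "gzip failed: $?";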

    So now we have a static (non-changing over time) file to process and archive. In my scripts that do the processing, I include the step to gzip that particular file. I also include a command-line argument saying whether to actually archive the file or not, so I can process the current syslog files if I need to without archiving them. And I save it with a dated name, as you've indicated you do.

    At the end of the month, I have 30-ish files with names like local7.20050701.gz local7.20050702.gz ... local7.20050731.gz. On the first of every month another process called "archive-month" runs; since I like the whole month's worth of information in one file, I gzcat them all into a local7.200507 file and then gzip that, but you could unzip them all, then tar and gzip the tar file. For text files especially, you want to tar up the uncompressed files and then compress the tar file; you don't want a tar file with gzipped files inside it, because the compression won't be nearly as good.

    I'll include my archive-monthly script in a little bit, I need to clean it up :-)

    -Scott

    Update: gzipped files do have the .gz extension; I originally forgot to add those to the names above.

    Update2: Code follows:
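    A stripped-down sketch of the idea (it assumes the daily local7.YYYYMMDD.gz files sit in the current directory and gzcat is on the PATH; untested as shown):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # archive-month: roll one month of daily gzip files into a single
    # compressed monthly file.  Usage: archive-month 200507
    my $month = shift or die "usage: $0 YYYYMM\n";

    my @daily = sort glob("local7.$month??.gz");
    die "no daily files found for $month\n" unless @daily;

    # decompress and concatenate the days into one monthly file,
    # then compress the month and drop the daily archives
    system("gzcat @daily > local7.$month") == 0 or die "gzcat failed: $?";
    system('gzip', '-9', "local7.$month") == 0  or die "gzip failed: $?";
    unlink @daily;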
