PerlMonks  

processing logic help

by agoth (Chaplain)
on Sep 13, 2000 at 16:45 UTC ( [id://32269]=perlquestion )

agoth has asked for the wisdom of the Perl Monks concerning the following question:

I have logfiles named elmo{x}.txt, where x runs from 0000 to 9999. They are delivered by rsync, and when the count reaches 9999 it wraps around to 0000.
I have to process them in order, and each one only once. I save the last number processed to a file and try to proceed from there.
The files arrive in sequence at random intervals, up to about 15 at a time.

The code I have come up with is below, but it is rather kludgy. Can anyone suggest improvements to the logic, please?
cheers

my %temp  = ();
my %files = ();
my ($zeroes, $nines) = (0, 0);
my $lastkey = 0;

# lastfile contains number
open (QS, $lastfile) or die "cant open $lastfile $!";
while (<QS>) {
    $lastkey = $_;
}
close QS;

# <> contains the list of files from a .sh
while (<>) {
    chop;
    next if $_ !~ /\.txt$/;
    my $file = $indir.$_;
    $_ =~ s/elmo(\d*)\.txt$/$1/;
    $temp{$_} = [ $file, $_ ];
    $nines  = 1 if $_ =~ /^999/;
    $zeroes = 1 if $_ =~ /^000/;
}

for (keys %temp) {
    my $k = $_;
    if (($nines && $zeroes) || ($zeroes && $lastkey =~ /^99/)) {
        $k += 10000 if ($k =~ /^000/);
        $lastkey += 10000 if ($lastkey =~ /^00/ && $lastkey < 9999);
    }
    $files{$k} = [ @{ $temp{$_} }];
}

for (sort {$a <=> $b} keys %files) {
    my $file = $files{$_}->[0];
    my $numkey = $_;
    if ($numkey > $lastkey) {
        if (-e $file){
            # &io_process_file($lda, $file, 'incstock', \&load_incstock, 1);
            $lastkey = $numkey;
            $lastkey += 10000 if (($nines && $zeroes) && $lastkey < 9000);
        }
    }
}

open (QS, ">$lastfile") or die "cant open $lastfile $!";
print QS $files{$lastkey}->[1];
close QS;

Replies are listed 'Best First'.
Re: processing logic help
by chromatic (Archbishop) on Sep 13, 2000 at 18:54 UTC
    If you just need to process them in chronological order, you could sort them based on their modification times:
    my @files = <*.txt>;
    @files = map  { $_->[0] }
             sort { $b->[1] <=> $a->[1] }
             map  { [ $_, (-M $_) ] } @files;
    Gotta love that Schwartzian Transform.

    You'll have to be in the appropriate directory for this to work, and you'll end up with @files containing the filenames, sorted ascending by last modification time (oldest first). You could do a simple addition and splice operation to pare down the list to the files you haven't used before.

    It's a different approach, anyway.

Re: processing logic help
by merlyn (Sage) on Sep 13, 2000 at 19:02 UTC
    use File::CounterFile;
    my $counter = File::CounterFile->new("/path/to/sentinel.txt", "0000");
    chdir "/path/to/where/the/files/are" or die "chdir: $!";
    ## see if a new file has arrived
    while (-e (my $file = "elmo$counter.txt")) {
        process_file($file);
        $counter = "0000" if ++$counter == "10000";
    }
    You might need to initialize sentinel.txt to sync it up to the first file. Just put "1234" (no newline) into the file, where you expect elmo1234.txt to be the next rsync'ed file.

    update: (to add some explanation...)

    The File::CounterFile module creates a persistent file. The object returned from the module is overloaded so that it can be incremented and interpolated, but every change to the object gets reflected as a change to the original file. It uses Perl's "magical autoincrement" to generate items here from 0000 to 9999. Of course, when that wraps, I need to force it back to 0000. I'm also presuming this program gets invoked every 15 minutes or whatever, and that process_file unlinks the file as it does the job.
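
    The wrap-around step can be seen in isolation with a plain scalar instead of a File::CounterFile object. This is only a sketch: next_id is a hypothetical helper, not part of the module, and it relies on the value never being used in numeric context, so that Perl's magical string increment keeps the leading zeros.

```perl
use strict;
use warnings;

# Hypothetical helper: return the ID that follows $id, wrapping 9999 -> 0000.
# The value is only ever used as a string, so ++ keeps the zero padding.
sub next_id {
    my ($id) = @_;
    $id++;                                  # "0009" becomes "0010"
    return $id eq '10000' ? '0000' : $id;   # string comparison on purpose
}

print next_id('0009'), "\n";   # 0010
print next_id('9999'), "\n";   # 0000
```

    Comparing with eq rather than == is deliberate here: once a plain scalar has been used as a number, ++ increments it numerically and the padding is lost.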

    And now the explanation is longer than the program, so I'll stop. {grin}

    -- Randal L. Schwartz, Perl hacker

RE (tilly) 1: processing logic help
by tilly (Archbishop) on Sep 13, 2000 at 18:59 UTC
    I will just comment on the ID's. Keep a number that you can index with:
    $next_id = ($id + 1) % 10000;
    and then produce the ID you search for with
    $named_id = sprintf('%04d', $id);
    I will let you fill in the rest of the code around this idea.
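
    One way the rest might be filled in, as a rough sketch only: pending_files is a hypothetical name, and the caller-supplied existence check stands in for -e so the loop can be tried without real files. It assumes the files arrive with no gaps in the numbering.

```perl
use strict;
use warnings;

# Hypothetical helper: starting after $last_id, collect consecutive
# file names for as long as $exists->($name) says the file is there.
sub pending_files {
    my ($last_id, $exists) = @_;
    my @names;
    my $id = ($last_id + 1) % 10000;          # numeric ID, wraps 9999 -> 0
    for (1 .. 10000) {                        # never go round more than one full cycle
        my $name = sprintf('elmo%04d.txt', $id);
        last unless $exists->($name);
        push @names, $name;
        $id = ($id + 1) % 10000;
    }
    return @names;
}

# Usage with a fake directory listing; real code could pass sub { -e $_[0] }.
my %present = map { $_ => 1 } qw(elmo9999.txt elmo0000.txt elmo0001.txt);
print "$_\n" for pending_files(9998, sub { $present{ $_[0] } });
```

    Keeping the ID numeric and formatting it only when building the file name avoids any string-versus-number confusion at the 9999 to 0000 boundary.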
Re: processing logic help
by araqnid (Beadle) on Sep 13, 2000 at 17:39 UTC
    OK, this is my shot. On review, it perhaps isn't too good an approach - it generates an array of the files' serial numbers in the order they need to be processed, which I thought would be a good idea but now I'm not too sure. I've moved the routines to read/write $lastfile into read_lastkey() and write_lastkey() respectively, which is almost definitely a good idea imho. I haven't tested this code at all. It may not even compile, although I rather hope it would :)
    # Pass list of filenames as arguments
    sub listnumbers {
        my %files;
        foreach (@_) {
            next unless (/^elmo(\d+)\.txt$/);
            my $n = int($1);    # Force string to integer
            $files{$n} = $_;
        }
        # Did we wrap-around? This happens iff we have a 0 and a
        # 9999
        if (exists($files{0}) && exists($files{9999})) {
            # OK, so this is the tricky case
            # Count down from 9999 to find the first file
            my $lwm = 9999;
            while (exists($files{$lwm})) {
                --$lwm;
                die "I think something broke" if ($lwm < 5000);
            }
            # And count up from 0 to get the last file
            my $hwm = 0;
            while (exists($files{$hwm})) {
                ++$hwm;
                die "I think something broke" if ($hwm > 5000);
            }
            # And return the intervals
            ($lwm+1..9999, 0..$hwm-1);
        } else {
            # Easy case: sort numerically, since the keys are integers
            sort { $a <=> $b } keys %files;
        }
    }

    # Main code
    my @numbers = listnumbers(<>);
    my $lastkey = read_lastkey();
    my $foundlast = 0;
    foreach (@numbers) {
        if ($_ == $lastkey) {
            $foundlast = 1;
            next;
        }
        # The direct successor of $lastkey also counts as a starting point
        if ((($lastkey + 1) % 10000) == $_) {
            $foundlast = 1;
        }
        if ($foundlast) {
            process_file(sprintf("elmo%04d.txt", $_));
            $lastkey = $_;
        }
    }
    if ($foundlast) {
        write_lastkey($lastkey);
    } else {
        die "Didn't find a plausible sequel to $lastkey in [@numbers]";
    }
processing logic help (2)
by agoth (Chaplain) on Sep 13, 2000 at 19:50 UTC
    To complicate things further, I had tried mtime as a differentiator, but:
    • I cannot rely on the mtime from the remote server, nor on the files actually arriving in a sequence that corresponds to the mtime.
    • I cannot rely on the comms link to the remote server being up either.
    • I cannot trust the file numbers to be consecutive: they will be in order, but the size of any gap is unpredictable.

    But thanks for the above, to all, I have something now that 'will do'
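
    With gaps allowed, counting up from the last number no longer works, but ordering by circular distance from the last processed number still does. A minimal sketch, assuming far fewer than 5000 files are ever pending at once (the original post mentions about 15) and that the names are passed without a directory prefix; files_after and the $window cutoff are hypothetical choices, not anything from the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: given the last processed number and a list of file
# names, return the unprocessed names in processing order.
sub files_after {
    my ($last, @names) = @_;
    my $window = 5000;   # assumed cutoff: anything further "ahead" is treated as old
    my @todo;
    for my $name (@names) {
        next unless $name =~ /^elmo(\d{4})\.txt$/;
        my $dist = ($1 - $last) % 10000;      # steps forward from $last, wrapping at 10000
        push @todo, [ $name, $dist ] if $dist > 0 && $dist < $window;
    }
    return map { $_->[0] } sort { $a->[1] <=> $b->[1] } @todo;
}

print "$_\n" for files_after(9996, qw(elmo0003.txt elmo9990.txt elmo9998.txt elmo0000.txt));
```

    Perl's % returns a non-negative result when the right operand is positive, so a file numbered below $last still gets a sensible forward distance. The number of the last name returned is what would be saved as the new last key.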

Re: processing logic help
by turnstep (Parson) on Sep 15, 2000 at 00:44 UTC
    ...cannot trust the files to be in sequence, but
    they will be in order, the gap will be unspecific

    Woah! That really throws a monkey wrench in things. So the file 9999 might not even exist, but 9998 and 0000 will? Hrmm....this calls for another solution. Let's have the program make a best guess as to where the selection of files actually begins. First, some code to generate "random elmos" (but not infinite elmos!):

    my ($rand,%seen,@files);
    my $number=shift || 15;
    {
        $rand = substr(rand,-7,4);
        unless ($seen{$rand}++) { ## Files must be unique!
            push @files, "elmo$rand.txt";
            --$number or last;
        }
        redo;
    }

    And here's the actual code. Globbing is a better way for actual use, of course.

    my @files = <elmo*.txt>;

    ## Convert the list to numbers and warn about any unusual elmos:
    @files = map {
        m/^elmo(\d{4})\.txt$/ and $1 or die "What is $_ doing here?!\n";
    } @files;

    ## Now figure out where the largest gap is:
    my ($diff,$old,$high);
    for (sort @files) {
        $diff=$_-$old and $high=$_ if $_-$old>$diff;
        $old=$_;
    }

    ## Now go in order, starting at the first value after the gap:
    for ( map  {$_->[0]}
          sort {$a->[1] <=> $b->[1] }
          map  {[$_, $_ >= $high ? $_-10000 : $_]} @files) {
        print "Parsing elmo$_.txt\n";
    }
