PerlMonks  

hash array

by roadtest (Sexton)
on Nov 03, 2010 at 17:15 UTC ( [id://869279]=perlquestion )

roadtest has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use a hash to calculate the running time of each cron job. The log file looks like the following:
==
< root 26144 c Tue Nov 2 03:10:02 2010
< oracle 26161 c Tue Nov 2 03:10:25 2010
< oracle 26193 c Tue Nov 2 03:10:30 2010
< sybase 26163 c Tue Nov 2 03:10:32 2010
> oracle 26161 c Tue Nov 2 03:10:33 2010
< sybase 26188 c Tue Nov 2 03:10:38 2010
> sybase 26163 c Tue Nov 2 03:10:58 2010
==
The leading character "<" means the job started; ">" means the job finished. The second field is the username. The third field is the jobID, which might be reused after the job is done. Since all of these are daily jobs, I only care about field 8 (the time of day) to calculate the run time. Here is what I am trying to do:
==start code==
#!/usr/bin/perl
use strict;
use warnings;
use Date::Manip;

my @owner= qw/oracle sybase/;

open (FILE,"cron.log");
while(<FILE>) {
    chomp;
    next if (split /[ ]+/)[1] !~ /oracle|sybase/ ;
    my ($mode,$owner,$job_id,undef,undef,undef,undef,$timestamp,undef)=split /[ ]+/;
    #get same jobID finish timestamp and calculate difference,
    #then save back
    ${$owner}{$job_id}=DateCalc(${$owner}{$job_id},$timestamp) if ($mode=~ /^>/);
}
close(FILE);

foreach $owner(@owner){
    foreach (keys %{$owner}) {
        print "$owner - JobID:$_ - RunTime:${$owner}{$_}\n";
    };
}
==end code here==
There are three problems with my code:
1. Under "strict", Perl doesn't allow me to use a number (the jobID in my case) as an ARRAY ref.
2. It doesn't consider that the same jobID may appear again later.
3. It seems "${$owner}{$job_id}" is treated as an array reference instead of a hash. When I debug the code above, this variable is always null.

I am looking for a better approach to achieve this. Thanks in advance!

Replies are listed 'Best First'.
Re: hash array
by jethro (Monsignor) on Nov 03, 2010 at 17:38 UTC

    Perl has multidimensional hashes. Just do the following

    my %dates;
    while ...
    ...
    $dates{$owner}{$job_id}= DateCalc( $dates{$owner}{$job_id}, ...
    ...
    foreach (keys %{$dates{$owner}}) ...

    This takes care of problems 1 and 3. To solve problem 2 you might store the results of DateCalc in a different hash and use arrays to accumulate them, i.e. push @{$results{$owner}}, DateCalc...
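
    As a rough illustration of the suggestion above, here is a minimal sketch that keeps start times in a nested hash and pushes each finished run onto a per-owner array, so a reused jobID does not overwrite an earlier result. The name collect_runs is made up for this example, the last two log lines are invented to show a reused jobID, and the DateCalc step is left out so that only the data structure is shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pair up "<" and ">" lines
# per owner and job ID, keeping every finished run.
sub collect_runs {
    my @lines = @_;
    my (%start, %results);

    for my $line (@lines) {
        my ($mode, $owner, $job_id, $time) = (split ' ', $line)[0, 1, 2, 7];
        next unless defined $time;    # skip short or malformed lines

        if ($mode eq '<') {
            # Remember when this owner's job started.
            $start{$owner}{$job_id} = $time;
        }
        elsif ($mode eq '>' and exists $start{$owner}{$job_id}) {
            # delete returns the stored start time and frees the job ID for reuse.
            my $began = delete $start{$owner}{$job_id};
            push @{ $results{$owner} }, [ $job_id, $began, $time ];
        }
    }
    return \%results;
}

my $runs = collect_runs(
    '< oracle 26161 c Tue Nov 2 03:10:25 2010',
    '> oracle 26161 c Tue Nov 2 03:10:33 2010',
    '< oracle 26161 c Tue Nov 2 04:00:00 2010',    # invented: same job ID reused
    '> oracle 26161 c Tue Nov 2 04:00:05 2010',    # invented
);

for my $owner (sort keys %$runs) {
    printf "%s - JobID:%s - %s to %s\n", $owner, @$_ for @{ $runs->{$owner} };
}
```

    Each array element holds the job ID with its start and finish times, so the run-time calculation can be applied afterwards to every pair.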

Re: hash array
by kcott (Archbishop) on Nov 03, 2010 at 17:54 UTC

    Here's an almost complete solution:

    #!perl
    use strict;
    use warnings;

    my (%log, @results);

    while (<DATA>) {
        my ($mode, $user, $jobid, $timestamp) = (split)[0, 1, 2, 7];
        if ($mode eq '<') {
            $log{$user.$jobid} = $timestamp;
        }
        else {
            push @results, [$user, $jobid, calc_runtime($log{$user.$jobid}, $timestamp)];
            delete $log{$user.$jobid};
        }
    }

    for (@results) {
        printf "%s - JobID: %s - Runtime: %s\n", @$_;
    }

    sub calc_runtime {
        my ($start, $end) = @_;
        return qq{$end - $start};
    }

    __DATA__
    < root 26144 c Tue Nov 2 03:10:02 2010
    < oracle 26161 c Tue Nov 2 03:10:25 2010
    < oracle 26193 c Tue Nov 2 03:10:30 2010
    < sybase 26163 c Tue Nov 2 03:10:32 2010
    > oracle 26161 c Tue Nov 2 03:10:33 2010
    < sybase 26188 c Tue Nov 2 03:10:38 2010
    > sybase 26163 c Tue Nov 2 03:10:58 2010

    Which outputs:

    $ cron_log_prob.pl
    oracle - JobID: 26161 - Runtime: 03:10:33 - 03:10:25
    sybase - JobID: 26163 - Runtime: 03:10:58 - 03:10:32

    I'll leave you to pick apart the timestamps and do the calculation in calc_runtime(). If all your jobs are guaranteed to run for less than 24 hours, you can just check if the start time is later than the end time (indicating it ran over midnight) and do the appropriate arithmetic. If they're going to run for longer than 24 hours, you'll need to capture more time info but the process remains the same.
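
    For readers who want a starting point, here is one possible sketch of the arithmetic described above. It assumes HH:MM:SS timestamps and jobs that always run for less than 24 hours. The helper to_seconds is made up for this example, and the second sample call uses invented times to show the over-midnight case.

```perl
use strict;
use warnings;

# Convert "HH:MM:SS" to seconds since midnight (hypothetical helper).
sub to_seconds {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# Assumption: every job runs for less than 24 hours.
sub calc_runtime {
    my ($start, $end) = @_;
    my $elapsed = to_seconds($end) - to_seconds($start);
    $elapsed += 24 * 3600 if $elapsed < 0;    # start later than end: ran over midnight
    return sprintf '%02d:%02d:%02d',
        int($elapsed / 3600), int($elapsed % 3600 / 60), $elapsed % 60;
}

print calc_runtime('03:10:25', '03:10:33'), "\n";    # 00:00:08
print calc_runtime('23:59:50', '00:00:10'), "\n";    # 00:00:20 (invented times)
```

    Jobs that may run longer than 24 hours would need the date fields as well, as noted above.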

    Update: I forgot you only wanted oracle and sybase jobs. Rather than using a regex with alternation (which may become rather unwieldy - and slow - if you need several users) try the following with a hash:

    ...
    my %wanted_user = map { $_ => 1 } qw{oracle sybase};

    while (<DATA>) {
        my ($mode, $user, $jobid, $timestamp) = (split)[0, 1, 2, 7];
        next if not $wanted_user{$user};
    ...

    -- Ken

      Thanks Ken, I learned how to use anonymous arrays from your code.
Re: hash array
by liverpole (Monsignor) on Nov 03, 2010 at 18:01 UTC
    Hi roadtest,

    You have a number of issues, all of which are very minor.

    For one thing, note that @owner and %owner are separate things (the first is an array, the second a hash), and you don't ever declare the hash:

    my %owner = ( );

    You should also (though you didn't ask about this) check the return value from open, otherwise (as I did) you could get a confusing error message like "readline() on closed filehandle FILE at C:\test\x.pl line 10".  Here's a simple fix for that:
    open (FILE,"cron.log") or die "Can't open 'cron.log' -- here's why: $!\n";
    but even better than that is to use the 3-arg open statement instead of the 2-arg version, which (along with lexical filehandles) is considered better programming practice:
    use IO::File;
    my $fh = new IO::File("cron.log", "r") or die "Can't open 'cron.log' ($!)\n";
    # Now use $fh in place of FILE
    ...
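
    For comparison, the built-in three-argument open with a lexical filehandle might look like the sketch below. The sub name read_lines is made up for this example, and the file to read is taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a file using three-argument open
# and a lexical filehandle.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Can't open '$path': $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Only reads a file when one is named on the command line, e.g. cron.log.
if (@ARGV) {
    print "$_\n" for read_lines($ARGV[0]);
}
```

    Keeping the mode ('<') separate from the file name means an odd file name cannot be misread as a mode or a pipe.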

    It might be better to break next if (split /[ ]+/)[1] !~ /oracle|sybase/ ; into multiple lines:

    my @split = split /[ ]+/;
    my $split1 = $split[1] || "";
    next if ($split1 !~ /oracle|sybase/);

    so that when your input file doesn't contain what you expect, or something else goes wrong, you can debug the values in @split, or the value of $split1 (which might be undefined; hence my converting it to "").

    You don't need to do (keys %{$owner}) (which you've done in two or three places); it suffices to do (keys %owner) to get the keys of the hash.

    Here's a version that runs without warnings:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Date::Manip;
    use IO::File;

    my @owner = qw/oracle sybase/;
    my %owner = ( );

    my $fh = new IO::File("cron.log") or die "Can't open 'cron.log' ($!)\n";
    while(<$fh>) {
        chomp;
        my @split = split /[ ]+/;
        my $split1 = $split[1] || "";
        next if ($split1 !~ /oracle|sybase/);
        my ($mode,$owner,$job_id,undef,undef,undef,undef,$timestamp,undef) = split /[ ]+/;
        #get same jobID finish timestamp and calculate difference,
        #then save back
        ${owner}{$job_id} = DateCalc (${$owner}{$job_id},$timestamp) if ($mode=~ /^>/);
    }
    close $fh;

    foreach my $owner(@owner) {
        foreach (keys %owner) {
            print "$owner - JobID:$_ - RunTime:${$owner}{$_}\n";
        };
    }

    Now you can focus on whether the program is working as you wish...


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      Thanks for pointing out my bad habits. Points taken. :-)
Re: hash array
by toolic (Bishop) on Nov 03, 2010 at 18:02 UTC
    You didn't show the output you desire, so I can only guess, but I would try an approach like this:
    use strict;
    use warnings;
    use Date::Manip;

    my %owners;
    while (<DATA>) {
        my ($mode, $owner, $job_id, $timestamp) = (split)[0..2,7];
        next unless $owner =~ /oracle|sybase/;
        $owners{$owner}{$job_id}{$mode} = $timestamp;
    }

    for my $owner (keys %owners) {
        for my $job (keys %{ $owners{$owner} }) {
            if (exists $owners{$owner}{$job}{'>'}) {
                my $t1 = $owners{$owner}{$job}{'<'};
                my $t2 = $owners{$owner}{$job}{'>'};
                my $d = DateCalc($t2, $t1);
                print "$owner - JobID:$job - RunTime:$d\n";
            }
        }
    }

    __DATA__
    < root 26144 c Tue Nov 2 03:10:02 2010
    < oracle 26161 c Tue Nov 2 03:10:25 2010
    < oracle 26193 c Tue Nov 2 03:10:30 2010
    < sybase 26163 c Tue Nov 2 03:10:32 2010
    > oracle 26161 c Tue Nov 2 03:10:33 2010
    < sybase 26188 c Tue Nov 2 03:10:38 2010
    > sybase 26163 c Tue Nov 2 03:10:58 2010

    Prints out:

    oracle - JobID:26161 - RunTime:-0:0:0:0:0:0:8
    sybase - JobID:26163 - RunTime:-0:0:0:0:0:0:26
      Thanks, your code is very readable. I will incorporate it into my program.
Re: hash array
by aquarium (Curate) on Nov 03, 2010 at 22:12 UTC
    Most modern cron jobs are wrapped in a script that does some basic housekeeping, e.g. making sure the job is not attached to a terminal, creating a .pid file, and writing a proper timestamp entry to syslog for the process start and end, like "PID process_name started YYYYMMDD HH:MM:SS" and a corresponding process termination stamp. In any case, you could set this up if desired, as I think it's a bit more robust than processing process-polling output or logs.
    the hardest line to type correctly is: stty erase ^H
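
    A wrapper along the lines described above could be sketched as follows. This is only an illustration: the names stamp_line and run_logged are made up, the stamps are printed to standard output instead of being sent to syslog, the PID shown is the wrapper's own ($$), and the terminal and .pid file housekeeping is left out.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build a line shaped like
# "PID process_name started YYYYMMDD HH:MM:SS".
sub stamp_line {
    my ($pid, $name, $event, $epoch) = @_;
    return sprintf '%d %s %s %s', $pid, $name, $event,
        strftime('%Y%m%d %H:%M:%S', localtime $epoch);
}

# Hypothetical wrapper: run a command with a start and an end stamp around it.
# Returns the raw status from system().
sub run_logged {
    my ($name, @command) = @_;
    print stamp_line($$, $name, 'started', time), "\n";
    my $status = system @command;
    print stamp_line($$, $name, 'finished', time), "\n";
    return $status;
}

# Demo: wrap a no-op Perl one-liner, using the running interpreter ($^X).
run_logged('demo_job', $^X, '-e', '1');
```

    With both stamps carrying a full date and time, the run time can be computed without guessing about midnight rollovers or reused job IDs.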
Re: hash array
by roadtest (Sexton) on Nov 05, 2010 at 02:20 UTC
    Thanks everyone for your excellent suggestions! Your time is much appreciated!

    I ended up using a multi-dimensional hash to keep and calculate the timestamps. It works great and the code is much more readable!

    Cheers,

Node Type: perlquestion [id://869279]
Approved by toolic