
Re^2: Counting concurrent event jobs

by vagnerr (Prior)
on Apr 24, 2006 at 18:11 UTC

in reply to Re: Counting concurrent event jobs
in thread Counting concurrent event jobs

Unfortunately that is not the case. The processing of these log files goes on for hours and involves hundreds of logs, each taking between a few minutes and an hour or two to run. We need to be able to graph the data (hence the CSV output) and see that, for example, we do a lot of split jobs at one time of day and a lot of filter jobs at another. We need to know because some of the jobs use a lot of CPU, others may use a lot of network bandwidth, and we want to be able to tune things to share the resources we have.

Remember that amateurs built Noah's Ark. Professionals built the Titanic.

Re^3: Counting concurrent event jobs
by mantadin (Beadle) on Apr 24, 2006 at 18:51 UTC

      For a more Perlish way, consider using the RRDs module from the RRDtool website, or RRD::Simple or RRDTool::OO from CPAN.

Re^3: Counting concurrent event jobs
by sfink (Deacon) on Apr 25, 2006 at 02:25 UTC
    That still doesn't seem to contradict gaal's approach, and only differs from mantadin's in choosing when and how to report. If I'm missing something, then tell me what is wrong with
    my $REPORT_INTERVAL = 300; # seconds
    my %active = ( 'split' => 0, 'filter' => 0);
    my $next_report = date_to_timestamp("...start of day...");
    my $last_report = date_to_timestamp("...end of day...");

    while(<>) {
        # Parse out the fields
        my ($date, $action, $jobtype, $logfile) = /.../;

        # Update current active job counts
        if ($action eq 'start') {
            ++$active{$jobtype};
        } elsif ($action eq 'finish') {
            --$active{$jobtype};
        } else {
            die "Huh? $_";
        }

        # Output counts for all report lines between
        # the last printed report and the time of this
        # log line. Most of the time, this will be empty
        # because we won't have reached the next report
        # time yet.
        my $stamp = date_to_timestamp($date);
        while ($stamp > $next_report) {
            report_counts($next_report, \%active);
            $next_report += $REPORT_INTERVAL;
        }
    }

    # Finish off the report for the report periods
    # at the end of the reporting range.
    while ($next_report < $last_report) {
        report_counts($next_report, \%active);
        $next_report += $REPORT_INTERVAL;
    }
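    For anyone who wants to try the idea end to end, here is a minimal self-contained sketch of the same sampling loop. Everything the original leaves open is an assumption made up for illustration: the log line format (`YYYY-MM-DD HH:MM:SS action jobtype logfile`), the body of date_to_timestamp, and the sample_counts function, which returns CSV rows instead of calling report_counts. One deliberate difference: it emits the samples that fall before an event and only then applies the event, so each row reflects the counts as they stood at the sample time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical log line format (an assumption, not taken from the thread):
#   2006-04-24 00:00:10 start split app1.log
sub date_to_timestamp {
    my ($date) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $date =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "Bad date: $date";
    return timegm($s, $mi, $h, $d, $mo - 1, $y);    # treats the date as UTC
}

# Returns one "timestamp,filter,split" CSV row per sample point
# from $first to $last inclusive, $interval seconds apart.
sub sample_counts {
    my ($lines, $first, $last, $interval) = @_;
    my %active = (split => 0, filter => 0);
    my @types  = sort keys %active;                 # fixed column order
    my $next_report = $first;
    my @rows;

    for my $line (@$lines) {
        my ($date, $action, $jobtype) =
            $line =~ /^(\S+ \S+) (start|finish) (\S+) \S+$/
            or die "Huh? $line";
        die "Unknown job type: $jobtype" unless exists $active{$jobtype};
        my $stamp = date_to_timestamp($date);

        # Emit every sample that is due before this event happened.
        while ($next_report < $stamp && $next_report <= $last) {
            push @rows, join ',', $next_report, @active{@types};
            $next_report += $interval;
        }

        # Only now apply the event to the running counts.
        $active{$jobtype} += $action eq 'start' ? 1 : -1;
    }

    # Fill in the remaining samples after the last log line.
    while ($next_report <= $last) {
        push @rows, join ',', $next_report, @active{@types};
        $next_report += $interval;
    }
    return @rows;
}
```

    Calling sample_counts(\@lines, $t0, $t0 + 900, 300) gives one row per five-minute sample, ready to print under a "time,filter,split" header.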

    Based on your proposed solution, it seems like you think that you have to correlate a finish event with the start event for that job -- but if all you want is the counts, then as gaal said, the correlation is unnecessary.

    If for some reason you do need to correlate them, then you can always keep all active jobs' state in the %active hash:

    ...
    if ($action eq 'start') {
        $active{$jobtype}{$logfile} = 1;
    } elsif ($action eq 'finish') {
        delete $active{$jobtype}{$logfile};
    }
    ...
    my $split_count = keys %{ $active{'split'} };
    ...
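    As a rough illustration of what the per-job hash buys you, the sketch below also notices a finish that has no matching start. The track_jobs function and the [action, jobtype, logfile] event triples are hypothetical, chosen only to keep the example self-contained; parsing the log lines is left out.

```perl
use strict;
use warnings;

# Each event is an array ref: [ $action, $jobtype, $logfile ].
# That shape is an assumption made for this example.
sub track_jobs {
    my (@events) = @_;
    my %active = (split => {}, filter => {});
    my @orphans;    # finish events that had no matching start

    for my $event (@events) {
        my ($action, $jobtype, $logfile) = @$event;
        if ($action eq 'start') {
            $active{$jobtype}{$logfile} = 1;
        }
        elsif ($action eq 'finish') {
            # delete returns the removed value, or undef if the key was absent
            my $was_active = delete $active{$jobtype}{$logfile};
            push @orphans, "$jobtype:$logfile" unless $was_active;
        }
        else {
            die "Huh? $action";
        }
    }

    # keys in scalar context gives the number of jobs still running
    my %count;
    $count{$_} = keys %{ $active{$_} } for keys %active;
    return (\%count, \@orphans);
}
```

    The counts come out the same as with the plain counters; the extra bookkeeping only pays off if you need to know which jobs are running or want to catch mismatched log entries.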