http://qs321.pair.com?node_id=545325

vagnerr has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow monks! ...

I am doing some performance analysis on some batch processing that we run (log processing). I need to determine how many concurrent jobs are running, and of what type, at set intervals over a day or a month, based on a log of the start and finish times of each job. For argument's sake, let's say the log files are in the following format:
[<date>] <start|finish>: <command> <file>
for example:
[Mon Apr 24 11:56:23 2006] start: split www1.log
[Mon Apr 24 11:57:23 2006] start: filter www2.log
[Mon Apr 24 12:50:23 2006] finish: split www1.log
[Mon Apr 24 13:59:23 2006] finish: filter www2.log
I need to be able to convert that into a report along the lines of:
time,splits,filters,total
11:55,0,0,0
11:56,1,0,1
11:57,1,1,2
...
12:50,1,1,2
12:51,0,1,1
...
13:59,0,1,1
14:00,0,0,0
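(As a minimal parsing sketch, assuming exactly the format above and ignoring time zones, each line can be split with a regex and the timestamp turned into an epoch value with the core Time::Piece module; the sample line below is just the first entry from the example log.)

use strict;
use warnings;
use Time::Piece;

# Sketch only: split one log line into timestamp, event, command and file,
# then convert the timestamp to an epoch value for later bucketing.
my $line = '[Mon Apr 24 11:56:23 2006] start: split www1.log';
if ( $line =~ /^\[([^\]]+)\]\s+(start|finish):\s+(\S+)\s+(\S+)/ ) {
    my ( $stamp, $event, $cmd, $file ) = ( $1, $2, $3, $4 );
    my $epoch = Time::Piece->strptime( $stamp, '%a %b %d %H:%M:%S %Y' )->epoch;
    print "$event $cmd $file at $epoch\n";
}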
The simplest solution would appear to be to create one big array or hash. Each node would represent a 1- to 5-minute window (depending on the required granularity) and hold a set of per-type counters. The logs would then be processed line by line: as we find a matching pair of "start" and "finish" lines, we update the counters for that type in every time segment between the start and the finish times (see the sketch below).
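To make that concrete, here is a minimal, untested sketch of the approach, assuming one-minute buckets, that a job is uniquely identified by its command plus file name, and that every start eventually has a matching finish. Buckets with zero running jobs are simply absent from the output, so the all-zero boundary rows in the example report are not emitted.

use strict;
use warnings;
use Time::Piece;

my $bucket = 60;    # one-minute windows
my %pending;        # jobs seen starting but not yet finished, keyed by "command file"
my %count;          # $count{bucket_epoch}{command} = number of concurrent jobs

while ( my $line = <> ) {
    next unless $line =~ /^\[([^\]]+)\]\s+(start|finish):\s+(\S+)\s+(\S+)/;
    my ( $stamp, $event, $cmd, $file ) = ( $1, $2, $3, $4 );
    my $epoch = Time::Piece->strptime( $stamp, '%a %b %d %H:%M:%S %Y' )->epoch;

    if ( $event eq 'start' ) {
        $pending{"$cmd $file"} = $epoch;
    }
    elsif ( defined( my $start = delete $pending{"$cmd $file"} ) ) {
        # Count this job in every bucket it overlaps, from start through finish.
        my $from = int( $start / $bucket ) * $bucket;
        for ( my $t = $from; $t <= $epoch; $t += $bucket ) {
            $count{$t}{$cmd}++;
        }
    }
}

# CSV report: time,splits,filters,total (only buckets with at least one job)
print "time,splits,filters,total\n";
for my $t ( sort { $a <=> $b } keys %count ) {
    my $splits  = $count{$t}{split}  || 0;
    my $filters = $count{$t}{filter} || 0;
    printf "%s,%d,%d,%d\n", gmtime($t)->strftime('%H:%M'),
        $splits, $filters, $splits + $filters;
}

Fed the four example lines above, this should print the 11:56 through 13:59 rows of the sample report (minus the all-zero rows), and it illustrates the memory cost: one hash entry per minute per job type over the whole period.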

The problem, as I see it, with this solution is that whilst it does work, it's a little messy and can create quite a large data structure. Does anyone have any suggestions for a better approach to the problem?

Thanks
Vagnerr.


_____________
Remember that amateurs built Noah's Ark. Professionals built the Titanic.