http://qs321.pair.com?node_id=545325

vagnerr has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow monks! ...

I am doing some performance analysis on some batch processing that we run (log processing). I need to determine how many concurrent jobs are running, and of what type, at set intervals over a day or a month, based on a log of the start and finish times of each job. For argument's sake, let's say the log files are in the following format:
[<date>] <start|finish>: <command> <file>
for example:
[Mon Apr 24 11:56:23 2006] start: split www1.log
[Mon Apr 24 11:57:23 2006] start: filter www2.log
[Mon Apr 24 12:50:23 2006] finish: split www1.log
[Mon Apr 24 13:59:23 2006] finish: filter www2.log
I need to be able to convert that into a report along the lines of:
time,splits,filters,total
11:55,0,0,0
11:56,1,0,1
11:57,1,1,2
...
12:50,1,1,2
12:51,0,1,1
...
13:59,0,1,1
14:00,0,0,0
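(As a minimal parsing sketch, assuming exactly the format above and ignoring time zones, each line can be split with a regex and the timestamp turned into an epoch value with the core Time::Piece module; the sample line below is just the first entry from the example log.)

use strict;
use warnings;
use Time::Piece;

# Sketch only: split one log line into timestamp, event, command and file,
# then convert the timestamp to an epoch value for later bucketing.
my $line = '[Mon Apr 24 11:56:23 2006] start: split www1.log';
if ( $line =~ /^\[([^\]]+)\]\s+(start|finish):\s+(\S+)\s+(\S+)/ ) {
    my ( $stamp, $event, $cmd, $file ) = ( $1, $2, $3, $4 );
    my $epoch = Time::Piece->strptime( $stamp, '%a %b %d %H:%M:%S %Y' )->epoch;
    print "$event $cmd $file at $epoch\n";
}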
The simplest solution would appear to be to create one big array or hash. Each node would represent a 1- to 5-minute window (depending on the required granularity) and hold a set of per-type counters. The logs would then be processed line by line: as we find a matching pair of "start" and "finish" lines, we update the counters for that type in every time segment between the start and the finish times (see the sketch below).
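To make that concrete, here is a minimal, untested sketch of the approach, assuming one-minute buckets, that a job is uniquely identified by its command plus file name, and that every start eventually has a matching finish. Buckets with zero running jobs are simply absent from the output, so the all-zero boundary rows in the example report are not emitted.

use strict;
use warnings;
use Time::Piece;

my $bucket = 60;    # one-minute windows
my %pending;        # jobs seen starting but not yet finished, keyed by "command file"
my %count;          # $count{bucket_epoch}{command} = number of concurrent jobs

while ( my $line = <> ) {
    next unless $line =~ /^\[([^\]]+)\]\s+(start|finish):\s+(\S+)\s+(\S+)/;
    my ( $stamp, $event, $cmd, $file ) = ( $1, $2, $3, $4 );
    my $epoch = Time::Piece->strptime( $stamp, '%a %b %d %H:%M:%S %Y' )->epoch;

    if ( $event eq 'start' ) {
        $pending{"$cmd $file"} = $epoch;
    }
    elsif ( defined( my $start = delete $pending{"$cmd $file"} ) ) {
        # Count this job in every bucket it overlaps, from start through finish.
        my $from = int( $start / $bucket ) * $bucket;
        for ( my $t = $from; $t <= $epoch; $t += $bucket ) {
            $count{$t}{$cmd}++;
        }
    }
}

# CSV report: time,splits,filters,total (only buckets with at least one job)
print "time,splits,filters,total\n";
for my $t ( sort { $a <=> $b } keys %count ) {
    my $splits  = $count{$t}{split}  || 0;
    my $filters = $count{$t}{filter} || 0;
    printf "%s,%d,%d,%d\n", gmtime($t)->strftime('%H:%M'),
        $splits, $filters, $splits + $filters;
}

Fed the four example lines above, this should print the 11:56 through 13:59 rows of the sample report (minus the all-zero rows), and it illustrates the memory cost: one hash entry per minute per job type over the whole period.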

The problem, as I see it, with this solution is that whilst it does work, it's a little messy and can create quite a large data structure. Does anyone have any suggestions for a better approach to the problem?

Thanks
Vagnerr.


_____________
Remember that amateurs built Noah's Ark. Professionals built the Titanic.