Parsing a csv file with multiple data records in it (generated from Window's IOMeter)

TASdvlper has asked for the wisdom of the Perl Monks concerning the following question:

All,

This might be a little hard to explain, but I'll try anyway. I have a xls/csv file, that has multiple data records in it. The problem is the records are not grouped together. For example (where '-' is being used to signify a new column), (the numbers are obviously made up)

# - size - reads   - writes   - read IOPs - write IOPs  
1 - 512  - 1000    - 0        - 2000      - 0
2 - 1024 - 0       - 1000     - 0         - 2000
3 - 2048 - 500     - 500      - 1000      - 1000

# - Time Stamp
1 - 2005-05-02 18:16:36:589
2 - 2005-05-03 18:16:36:589
3 - 2005-05-04 18:16:36:589

# - Comments
1 - This test failed
2 - This test passed
3 - This test needs more data.
[download]

So, in this case, there are 3 records that I want to import into my db (all the lines that start with 1, 2 and 3). But, as you can see, they are on different lines. Each column name (size, reads, writes, etc) are attributes in my database table.

So, my thought was to make a array of hash references where the array size would be how many records are in the file (because it may vary), and the hash ref would contain all the data for that record. So, my array would look something like:

$myarray[0] = {
              'size'       => '512',
              'reads'      => '1000',
              'writes'     => '0',
              'read IOPs'  => '2000',
              'write IOPS' => '0',
              'Time Stamp' => '2005-05-02 18:16:36:589',
              'Comments'   => 'Thie tst failed'
              }
[download]

I'm just not sure how to parse that file where the data is all on different lines. There is one thing, each line that starts with a "#" is the column name, so that might help in the parsing.

As always - thanks all.

Comment on Parsing a csv file with multiple data records in it (generated from Window's IOMeter) Select or Download Code

Replies are listed 'Best First'.
Re: Parsing a csv file with multiple data records in it (generated from Window's IOMeter) by holli (Abbot) on May 25, 2005 at 19:55 UTC
use strict; use warnings; use Data::Dumper; my @fields; my @data; while (<DATA>) { chomp; if ( /^#/ ) { @fields = split / +- +/; shift @fields; next; } next if /^$/; my @values = split / +- +/; my $index = shift @values; for my $i (0..$#fields) { $data[$index-1]->{$fields[$i]} = $values[$i]; } } print Dumper (\@data); __DATA__ # - size - reads - writes - read IOPs - write IOPs 1 - 512 - 1000 - 0 - 2000 - 0 2 - 1024 - 0 - 1000 - 0 - 2000 3 - 2048 - 500 - 500 - 1000 - 1000 # - Time Stamp 1 - 2005-05-02 18:16:36:589 2 - 2005-05-03 18:16:36:589 3 - 2005-05-04 18:16:36:589 # - Comments 1 - This test failed 2 - This test passed 3 - This test needs more data. [download] Output: $VAR1 = [ { 'Comments' => 'This test failed', 'read IOPs' => '2000', 'reads' => '1000', 'write IOPs' => '0', 'Time Stamp' => '2005-05-02 18:16:36:589', 'writes' => '0', 'size' => '512' }, { 'Comments' => 'This test passed', 'read IOPs' => '0', 'reads' => '0', 'write IOPs' => '2000', 'Time Stamp' => '2005-05-03 18:16:36:589', 'writes' => '1000', 'size' => '1024' }, { 'Comments' => 'This test needs more data.', 'read IOPs' => '1000', 'reads' => '500', 'write IOPs' => '1000', 'Time Stamp' => '2005-05-04 18:16:36:589', 'writes' => '500', 'size' => '2048' } ]; [download] holli, /regexed monk/	[reply] [d/l] [select]
Re^2: Parsing a csv file with multiple data records in it (generated from Window's IOMeter) by tlm (Prior) on May 26, 2005 at 05:02 UTC
`for my $i (0..$#fields) { $data[$index-1]->{$fields[$i]} = $values[$i]; }` [download] I have slices in the brain: `@{ $data[ $index-1 ] }{ @fields } = @values;` [download] the lowliest monk	[reply] [d/l] [select]
Re: Parsing a csv file with multiple data records in it (generated from Window's IOMeter) by shemp (Deacon) on May 25, 2005 at 20:02 UTC
You've already got it 99% percent figured out! when you get to a line that starts with a '#', its a list of fields. Split it on the '-' and store the results, those are your field names. Then when you parse the lines that dont begin with a '#', also spilt them on the '-'. The data and the columns will match up by position. Then enter the results into your structure, i.e.: `$myarray[$row_number]->{$column_name} = $data;` [download]	[reply] [d/l]
Re^2: Parsing a csv file with multiple data records in it (generated from Window's IOMeter) by holli (Abbot) on May 25, 2005 at 20:12 UTC
The missing 1% is to skip empty lines, as shown in my code above. holli, /regexed monk/	[reply]


Pathologically Eclectic Rubbish Lister
	PerlMonks