shoness has asked for the wisdom of the Perl Monks concerning the following question:
I have a file containing ~20k lines of ASCII data. Each line has about 170 fields, of which I care about ~30, scattered throughout the line. The data is not delimited; each field is of fixed width. The first field on each line is unique (it is destined to become my hash key).
Speed is not a factor, but it seems "unpack" may be the best choice. I've never coded anything as ugly as my initial implementation. The line with the format specifier is almost 400 characters long. Yuk!

    my $hash;
    while (<$fh>) {
        next unless m/^(\d{14})/;
        my $code = $1;
        ($hash->{$code}->{'CODE'},
         $hash->{$code}->{'FIELD2'},
         $hash->{$code}->{'FIELD3'},
         $hash->{$code}->{'FIELD4'},
         $hash->{$code}->{'FIELD5'},
         $hash->{$code}->{'FIELD6'},
         ...
         $hash->{$code}->{'FIELD170'}) = unpack( "A14A1A1A1A5A5A30A50A20A1A5A5A5A5....");
    }

My second implementation used the document that specifies the widths to automatically build up the format string and grab the names of the fields:

    ($hash->{$code}->{$name[0]},
     $hash->{$code}->{$name[1]},
     $hash->{$code}->{$name[169]}) = unpack($fmt);

It was not significantly better, because I still need to know the number of fields in order to build the left-hand side, and now I've got another file to parse, etc.
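For illustration, here is a minimal sketch of one way the left-hand side could stop depending on the field count: assign the result of unpack to a hash slice, and let the template skip unwanted fields with "x". The @spec layout, its field names, and the parse_lines helper are all hypothetical; the real list would come from the width document.

```perl
use strict;
use warnings;

# Hypothetical layout: [ name, width, keep? ] in line order.
# The real spec would have ~170 entries taken from the width document.
my @spec = (
    [ CODE   => 14, 1 ],
    [ FLAG_A =>  1, 0 ],
    [ FLAG_B =>  1, 0 ],
    [ AMOUNT =>  5, 1 ],
    [ TITLE  => 10, 1 ],
);

# Build the template and the list of wanted names in one pass:
# "A<width>" extracts a field, "x<width>" skips over one.
my ( @names, $fmt );
for my $field (@spec) {
    my ( $name, $width, $keep ) = @$field;
    $fmt .= ( $keep ? 'A' : 'x' ) . $width;
    push @names, $name if $keep;
}

# Hypothetical helper: takes a reference to an array of lines and
# returns a hash of records keyed on the leading 14-digit code.
sub parse_lines {
    my ($lines) = @_;
    my %hash;
    for my $line (@$lines) {
        next unless $line =~ /^(\d{14})/;
        # Hash slice: every wanted field is assigned in one statement.
        @{ $hash{$1} }{@names} = unpack $fmt, $line;
    }
    return \%hash;
}

my $data = parse_lines( [ '12345678901234XY00042Widget    ' ] );
print "$data->{'12345678901234'}{TITLE}\n";
```

With this shape the number of fields never appears in the code, only in the spec. One caveat: unpack dies if an "x" skip runs past the end of a short line, so real data may need a length check first.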
I'm really missing something. If I used the regexp engine with /g, I could programmatically walk down each line pulling out the fields I want.
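To make the /g idea concrete, here is a rough sketch of walking a line with \G so that each match resumes where the previous one ended. The @spec layout and the walk_line helper are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Hypothetical layout: [ name, width, keep? ] in line order.
my @spec = (
    [ CODE   => 14, 1 ],
    [ FLAG_A =>  1, 0 ],
    [ FLAG_B =>  1, 0 ],
    [ AMOUNT =>  5, 1 ],
    [ TITLE  => 10, 1 ],
);

# Hypothetical helper: walks one (chomped) line field by field.
sub walk_line {
    my ($line) = @_;
    my %rec;
    pos($line) = 0;
    for my $field (@spec) {
        my ( $name, $width, $keep ) = @$field;
        # \G anchors at pos(); a scalar-context /g match advances pos()
        # past the field. Stop if the line is shorter than the spec.
        $line =~ /\G(.{$width})/gc or last;
        next unless $keep;
        # Store the field with trailing whitespace trimmed.
        ( $rec{$name} = $1 ) =~ s/\s+\z//;
    }
    return \%rec;
}

my $rec = walk_line('12345678901234XY00042Widget    ');
print "$rec->{CODE} $rec->{AMOUNT} $rec->{TITLE}\n";
```

This does the same job as an unpack template, one field at a time, so it is mainly useful when some per-field logic is needed along the way.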
I'm just not sure here...other than that I must be missing something. Your advice is GREATLY appreciated!
Thanks!
Cheers!
Replies are listed 'Best First'.
Re: Unpack Many Fields
by BrowserUk (Patriarch) on Feb 15, 2010 at 23:53 UTC
by ikegami (Patriarch) on Feb 16, 2010 at 01:53 UTC
by shoness (Friar) on Feb 16, 2010 at 02:23 UTC
by ikegami (Patriarch) on Feb 16, 2010 at 02:47 UTC
by shoness (Friar) on Feb 16, 2010 at 02:54 UTC
by BrowserUk (Patriarch) on Feb 16, 2010 at 03:01 UTC
by ikegami (Patriarch) on Feb 16, 2010 at 05:09 UTC
by BrowserUk (Patriarch) on Feb 16, 2010 at 05:48 UTC