PerlMonks  

Unpack Many Fields

by shoness (Friar)
on Feb 15, 2010 at 23:38 UTC (#823371)

shoness has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've a file containing ~20k lines of ASCII data. Each line has about 170 fields. I care about ~30 fields, scattered throughout each line. The data on each line is not delimited; each field is of fixed width. The first field on each line is unique (it is destined to become my hash key).

Speed is not a factor, but it seems "unpack" may be the best choice. In my initial implementation, I've never coded anything so ugly as this. The line with the format specifier is almost 400 characters long. Yuk!

my $hash;
while (<$fh>) {
    next unless m/^(\d{14})/;
    my $code = $1;
    ($hash->{$code}->{'CODE'},
     $hash->{$code}->{'FIELD2'},
     $hash->{$code}->{'FIELD3'},
     $hash->{$code}->{'FIELD4'},
     $hash->{$code}->{'FIELD5'},
     $hash->{$code}->{'FIELD6'},
     ...
     $hash->{$code}->{'FIELD170'}) = unpack(
        "A14A1A1A1A5A5A30A50A20A1A5A5A5A5....");
}

My second implementation used the document that specifies the field widths to automatically build up the format string and grab the names of the fields:
($hash->{$code}->{$name[0]}, $hash->{$code}->{$name[1]}, $hash->{$code}->{$name[169]}) = unpack($fmt);
It was not significantly better, because I still need to know the number of fields in order to build the left-hand side, and now I've got another file to parse, etc.
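
One way around writing out the left-hand side is a hash slice, which assigns every value returned by unpack to its name in a single statement, however many fields there are. The following is only a sketch: the three-field spec, the field names, and the sample lines are made up for illustration and stand in for whatever the width document provides.

```perl
use strict;
use warnings;

# Hypothetical spec: [ field name => width ], in file order.
my @spec = ( [ CODE => 14 ], [ FLAG => 1 ], [ NAME => 5 ] );

my @names = map { $_->[0] } @spec;
my $fmt   = join ' ', map { "A$_->[1]" } @spec;    # "A14 A1 A5"

my $hash = {};
for my $line ( "12345678901234YALICE", "12345678901235NBOB  " ) {
    my %rec;
    # Hash slice: one assignment covers all fields, no matter how many.
    @rec{@names} = unpack $fmt, $line;
    $hash->{ $rec{CODE} } = \%rec;
}

print "$_: $hash->{$_}{NAME}\n" for sort keys %$hash;
```

Because the names and the template are built from the same list, they cannot drift out of step, and the field count never appears in the code.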

I'm really missing something. If I used the regexp engine with /g, I could programmatically walk down each line pulling out the fields I want.

I'm just not sure here...other than that I must be missing something. Your advice is GREATLY appreciated!

Thanks!
Cheers!

Replies are listed 'Best First'.
Re: Unpack Many Fields
by BrowserUk (Pope) on Feb 15, 2010 at 23:53 UTC

    If you only need 34 of 170 fields, you can use x[nnn] to skip the bytes between the fields you want. That should reduce the unwieldy length of your template considerably.
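
    As a small illustration of the skipping idea, here is a sketch against a made-up 20-byte record. The layout, the variable names, and the sample data are all assumptions, not the poster's real format.

    ```perl
    use strict;
    use warnings;

    # Hypothetical record: 4-byte id, 6 bytes we ignore,
    # 5-byte name, 3 bytes we ignore, 2-byte flag.
    my $line = "0042" . "JUNK--" . "alice" . "???" . "OK";

    # x[6] and x[3] step over the unwanted bytes,
    # so unpack returns only the three wanted values.
    my ( $id, $name, $flag ) = unpack 'A4 x[6] A5 x[3] A2', $line;

    print "$id $name $flag\n";
    ```

    Each run of unwanted fields collapses into a single x[N], where N is the sum of their widths.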

    And if the template is still uncomfortably long:

    my $fmt = join '', qw[ x[10] a5 x[22] a10 x[4] a3 a2 ... ];

    You can even turn that into an opportunity to document:

    my $fmt = join '', grep !m[^#], qw[
        x[10] a5    #code
        x[22] a10   #thingummy
        x[4] a3 a2  #doodah&whatsit
        ...
    ];

    As for the naming of the fields, rather than using strings of "FIELDnn" in a hash, why not use an array?

    Eg.

    my $fmt = "x[10]a5x[22]a10x[4]a3a2...";
    my %hash;
    while( <$fh> ) {
        # The line just read is in $_; %hash is a plain hash, not a reference.
        my( $code, @fields ) = unpack $fmt, $_;
        $hash{ $code } = \@fields;
    }
    ...
    for my $code ( keys %hash ) {
        print $hash{ $code }[ $_ ] for 0 .. 33;
    }

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Using # in qw causes a warning. The following removes warnings and allows spaces to be used in the comment:
      my $fmt = q[
          x[10] a5    # code
          x[22] a10   # thingummy
          x[4] a3 a2  # doodah & whatsit
      ];
      $fmt =~ s/#.*//g;

      But pack allows for comments in patterns, so one doesn't even need that last line.
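
      A quick way to check this is to hand unpack a commented, multi-line template as-is. This is a sketch with an invented layout and invented sample data; it relies on the documented behaviour that a # in a template starts a comment running to the end of the line and that whitespace between codes is ignored.

      ```perl
      use strict;
      use warnings;

      # Hypothetical layout, for illustration only.
      my $fmt = q[
          A4      # id
          x[6]    # filler we skip
          A5      # name
          x[3]    # more filler
          A2      # flag
      ];

      # No substitution needed: unpack ignores the comments and whitespace.
      my @fields = unpack $fmt, "0042JUNK--alice???OK";
      print join( '|', @fields ), "\n";
      ```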

      Update: Originally, the /g was missing, which led me to discover that one doesn't need to remove the comments.

        I think it has to be something like
        $fmt =~ s/(?:#.*)|\n|\s//g;
        Thanks to both of you!
        My code is already looking much better.
        Cheers!

        Whaddayaknow! That definitely deserves a mench in perlfunc.


