http://qs321.pair.com?node_id=1212579

mxb has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks.

I'm currently improving my Perl by practising Perl idioms and trying to write more 'Perlish' code. At the moment I'm practising by parsing a binary blob with unpack.

My input is a scalar, and my desired output is an array of arrays.

As I'm from a C background, my initial approach is to use a C-style for loop, which leaves me with code like the following:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.016;
use Data::Dumper;

my $data = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX";
my $entry_size = 5;
my @out;

for (my $off = 0; $off < length $data; $off += $entry_size) {
    my $item = substr ($data, $off, $entry_size);
    push @out, [unpack "CA*", $item];
}

print Dumper \@out;

While it works, it seems unnecessarily verbose for Perl and I'm aware I'm trying to write C in Perl.

Therefore, I've been attempting to rewrite the code in a more Perlish manner. I've ended up with the following:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.016;
use Data::Dumper;

my $data = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX";
my $entry_size = 5;

my @items = unpack "(a$entry_size)*", $data;
my @out = map { [unpack "CA*"] } @items;

print Dumper \@out;

I'm happy with the map { unpack ... } ... construct as this is clear and concise, but I'm a little less sure about the first unpack to split the $data scalar into the list of items.

Therefore, I'm deferring to the wisdom of the Monks: is there a better way to achieve what I am doing? Maybe the approach should be a single unpack "(CA4)*", $data, followed by rebuilding the child lists?
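A detail worth keeping in mind when reading the outputs in this thread: the sample string uses ASCII digits as stand-ins for the leading count byte, and the `C` template letter returns the numeric value of a byte, not the digit it spells. A minimal sketch (both record strings are made up for illustration):

```perl
use strict;
use warnings;

# 'C' unpacks one byte as an unsigned char, i.e. its numeric value.
my @ascii  = unpack 'CA*', "1ABCD";      # first byte is the character '1' (0x31)
my @binary = unpack 'CA*', "\x01ABCD";   # first byte is a real 0x01

print "ascii:  @ascii\n";    # ascii:  49 ABCD
print "binary: @binary\n";   # binary: 1 ABCD
```

So with the test string, the Dumper output shows 49 through 54 in the first slot; with genuinely binary input it shows the byte values you would expect.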

Re: Perlish approach to parsing a binary blob
by BrowserUk (Patriarch) on Apr 09, 2018 at 12:37 UTC

    Combine both statements:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dump qw[ pp ];

    my $data = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX";
    my $entry_size = 5;

    my @out = map[ unpack 'CA*', $_ ], unpack "(A$entry_size)*", $data;

    pp \@out;

Re: Perlish approach to parsing a binary blob
by ikegami (Patriarch) on Apr 09, 2018 at 12:31 UTC

    You could chain the two statements.

    my @out = map [ unpack 'Ca*' ], unpack '(a5)*', $data;
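ikegami's version uses `a` where BrowserUk's reply and the original post use `A`. For genuinely binary data the difference matters: when unpacking, `A` strips trailing whitespace and null bytes from a field, while `a` returns the bytes untouched. A small sketch with a hypothetical record whose payload happens to end in a space and a NUL:

```perl
use strict;
use warnings;

# Hypothetical record: one count byte (0x07) followed by 4 payload bytes "AB \0".
my $record = "\x07AB \0";

my ($n_a, $payload_a) = unpack 'Ca4', $record;   # 'a' keeps every byte
my ($n_A, $payload_A) = unpack 'CA4', $record;   # 'A' strips trailing whitespace/NULs

printf "a: %d bytes\n", length $payload_a;   # a: 4 bytes
printf "A: %d bytes\n", length $payload_A;   # A: 2 bytes
```

The same applies to the outer split: `(A5)*` could shorten a chunk whose last byte is a space or NUL, whereas `(a5)*` always yields exactly five bytes per chunk.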
Re: Perlish approach to parsing a binary blob
by choroba (Cardinal) on Apr 09, 2018 at 12:48 UTC
    If you don't mind the array holding a flat list to begin with, you can use a single unpack and then restructure the array in place. It seems almost 10% faster on my machine:
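    my $data = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX";
    my $entry_size = 5;
    my $a_size = $entry_size - 1;

    # One unpack gives a flat list: count, payload, count, payload, ...
    my @out = unpack "(CA$a_size)*", $data;

    # Move the first two elements into an array ref pushed onto the end.
    # @out / 2 is evaluated once, before the loop, so this runs once per record.
    push @out, [ splice @out, 0, 2 ] for 1 .. @out / 2;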

    Update: BrowserUk's solution seems faster still, by almost another 10%.
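For comparison, the single-unpack idea can also be written without reusing the output array, by consuming a temporary flat list two elements at a time. This is a sketch of an alternative, not choroba's code; it trades the in-place trick for a second array, and it has not been benchmarked against the versions in this thread. It also uses `a` rather than `A` so payload bytes come back unstripped.

```perl
use strict;
use warnings;

my $data       = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX";
my $entry_size = 5;
my $a_size     = $entry_size - 1;

# One unpack produces a flat list: count, payload, count, payload, ...
my @flat = unpack "(Ca$a_size)*", $data;

# Peel off two elements at a time until the flat list is empty.
my @out;
push @out, [ splice @flat, 0, 2 ] while @flat;

print scalar(@out), " records\n";   # 6 records
```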

Re: Perlish approach to parsing a binary blob
by AnomalousMonk (Archbishop) on Apr 09, 2018 at 14:17 UTC
    ... I'm a little less sure about the first unpack to split the $data scalar into the list of items.

    Can you say more about why you are unsure about this statement? Is it because of the interpolated $entry_size component in the unpack template? If so, be assured that this is a perfectly kosher maneuver: a template specification string is just a string, no matter how it is assembled. (Sometimes a string is just a string...)
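To illustrate that unpack only ever sees the finished template string, here is a small sketch that assembles the same `(a5)*` template three different ways and checks that each gives the same result as the literal:

```perl
use strict;
use warnings;

my $data       = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX";
my $entry_size = 5;

# Three ways of arriving at the same template string "(a5)*".
my $interpolated = "(a$entry_size)*";
my $formatted    = sprintf '(a%d)*', $entry_size;
my $joined       = join '', '(a', $entry_size, ')*';

# unpack receives an ordinary string in every case, so the results match.
my @literal     = unpack '(a5)*', $data;
my @via_interp  = unpack $interpolated, $data;
my @via_sprintf = unpack $formatted, $data;
my @via_join    = unpack $joined, $data;

print "@via_interp\n";   # 1ABCD 2EFGH 3IJKL 4MNOP 5QRST 6UVWX
```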

    Update 1: Changed a word, added a link.

    Update 2: Hey, this is very late, but I just noticed that the unpack documentation includes an example of this exact technique in a limited implementation of substr:

    sub substr {
        my($what,$where,$howmuch) = @_;
        unpack("x$where a$howmuch", $what);
    }
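To see the documentation's example in action, here is a sketch that uses the same body under the hypothetical name `unpack_substr`, chosen so it can't be confused with the built-in, and compares the two:

```perl
use strict;
use warnings;

# Same body as the documentation example, renamed (hypothetical name).
sub unpack_substr {
    my ($what, $where, $howmuch) = @_;
    # 'x$where' skips $where bytes; 'a$howmuch' then takes the next $howmuch bytes.
    return unpack("x$where a$howmuch", $what);
}

my $s = "1ABCD2EFGH3IJKL";
print unpack_substr($s, 5, 5), "\n";   # 2EFGH
print substr($s, 5, 5), "\n";          # 2EFGH
```

It is "limited" in that it lacks the negative offsets and lvalue use that the built-in `substr` supports.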



      Hi,

      Yes, I think that's pretty much hit it straight on.

      The "(a$entry_size)*" just looked a bit odd when I first wrote it. I'm glad to see that it is a reasonably common approach and doesn't immediately scream "bad code" or "obfuscation".

      Thanks

Re: Perlish approach to parsing a binary blob
by vr (Curate) on Apr 09, 2018 at 17:38 UTC

    If the "L" in "blob" stands for "large", and the script is anything more than a throwaway one-liner, I'd be unhappy about implicitly building large, useless intermediate lists.

    use strict;
    use warnings;
    use feature 'say';
    use Data::Dump 'dd';
    use Benchmark 'cmpthese';

    my $data = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX" x 1000;

    cmpthese -3, {
        1 => sub {
            my @out;
            for ( my $x = 0; $x < length $data; $x += 5 ) {
                push @out, [ unpack "CA4", substr $data, $x, 5 ]
            }
            return \@out
        },
        2 => sub {
            [ map [ unpack 'CA4', $_ ], unpack "(A5)*", $data ]
        }
    };

    __END__
        Rate     2     1
    2 34.2/s    --  -21%
    1 43.1/s   26%    --
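Taking the point about intermediate lists one step further: if the blob lives in a file, one option is to read fixed-size records one at a time with `read`, so neither the whole blob nor a list of chunks has to be held in memory. This is an alternative sketch, not something benchmarked in the thread. To keep it self-contained it opens an in-memory filehandle on the sample string; with real data you would open the file itself in `:raw` mode (the path would be your own).

```perl
use strict;
use warnings;

my $data       = "1ABCD2EFGH3IJKL4MNOP5QRST6UVWX";
my $entry_size = 5;

# An in-memory filehandle stands in for a real binary file here.
open my $fh, '<:raw', \$data or die "Cannot open in-memory handle: $!";

my @out;
# read() returns the number of bytes actually read; the loop stops at EOF,
# and a trailing partial record (a short read) is silently ignored.
while (read($fh, my $record, $entry_size) == $entry_size) {
    push @out, [ unpack 'Ca*', $record ];
}
close $fh;

print scalar(@out), " records\n";   # 6 records
```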