http://qs321.pair.com?node_id=11127252

GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

I have a binary file (think ELF or Windows PE) that includes fields that may be long (32 bit) or quad (64 bit) and little or big endian depending on the specific file. The script may be running on a big or little endian machine with a Perl built for 32 or 64 bits. I want to unpack the fields for later use, which will include display using printf and file operations using read and seek, so I need to convert from the file representation to the running Perl's native representation for those fields. The following sample code does that, but the pack 'L2' ...; unpack 'Q' feels a bit clunky. Can it be tidied up?

use warnings;
use strict;
use Config;
use Fcntl;

printf "Perl $^V %s ivsize %d byteorder %d\n",
    $Config{archname}, $Config{ivsize}, $Config{byteorder};

# Generate a "binary file" with a string of bytes likely to show up issues in
# decoding
my $binary = "\x91\x34\x33\x90\x81\x32\x31\x80";

for my $fileLE (0, 1) {
    my $fromFile = $fileLE ? 'V4' : 'N4';

    open my $inFH, '<:raw', \$binary;
    read $inFH, my $raw1, 4;
    read $inFH, my $raw2, 4;

    my $long1 = unpack $fromFile, $raw1;
    my $long2 = unpack $fromFile, $raw2;
    my $packed = pack "L2", $fileLE ? ($long1, $long2) : ($long2, $long1);
    my $longlong = unpack 'Q', $packed;

    printf "%s: %016x\n", ($fileLE ? "LE" : "BE"), $longlong;
}

A 32 bit Windows build prints:

Perl v5.32.0 MSWin32-x86-multi-thread-64int ivsize 8 byteorder 12345678
BE: 9134339081323180
LE: 8031328190333491

A nice sanity check would be to run the code on a big endian system and see that the numbers generated are the same. Of course the long long processing will come unstuck where ivsize < 8.
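For the ivsize < 8 case, one possible workaround is to decode the two 32-bit halves and combine them with the core Math::BigInt module. This is only a sketch under that assumption: the helper name unpack_quad_big is hypothetical, and the result is a Math::BigInt object rather than a native integer, so it suits display better than passing to seek.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: decode a 64-bit unsigned field without using 'Q',
# so it also works on a Perl built without 64-bit integers.
sub unpack_quad_big {
    my ($raw, $little_endian) = @_;

    # 'V' and 'N' are always 32 bits, little and big endian respectively.
    my ($first, $second) = unpack(($little_endian ? 'V2' : 'N2'), $raw);

    # In a little-endian file the low word comes first.
    my ($hi, $lo) = $little_endian ? ($second, $first) : ($first, $second);

    return Math::BigInt->new($hi)->blsft(32)->badd($lo);
}

my $binary = "\x91\x34\x33\x90\x81\x32\x31\x80";
print "BE: ", unpack_quad_big($binary, 0)->as_hex, "\n";
print "LE: ", unpack_quad_big($binary, 1)->as_hex, "\n";
```

Because only 'V' and 'N' are used, the result does not depend on the byte order of the machine running the script.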

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

Re: Unpacking byte stream long/quad little/big endian fields
by Tux (Canon) on Jan 22, 2021 at 07:44 UTC
    #!/pro/bin/perl

    use 5.18.3;
    use warnings;
    use Config;
    use Fcntl;

    printf "Perl $^V %s ivsize %d byteorder %d\n",
        @Config{qw( archname ivsize byteorder )};

    # Generate a "binary file" with a string of bytes likely to show up issues in
    # decoding
    my $binary = "\x91\x34\x33\x90\x81\x32\x31\x80";

    for ([ "BE32" => "I>I>" ],
         [ "LE32" => "I<I<" ],
         [ "BE64" => "Q>"   ],
         [ "LE64" => "Q<"   ],
        ) {
        my ($type, $format) = @$_;

        my $data = pack $format => 0, 0;
        open my $fh, "<", \$binary;
        read $fh, $data, length $data;

        my @x = unpack $format => $data;
        if (@x == 2) {
            printf "%s\: %08x%08x\n", $type, @x;
        }
        else {
            printf "%s\: %016x\n", $type, @x;
        }
    }

    =>

    BE32: 9134339081323180
    LE32: 9033349180313281
    BE64: 9134339081323180
    LE64: 8031328190333491

    Enjoy, Have FUN! H.Merijn

      Ahh, I hadn't noticed < and > in the pack/unpack docs. They are there in my Perl 5.14 Pocket Reference, but somewhat buried over the page in the middle of a large paragraph. They seem not to be mentioned in the 5.10 version of the reference, but poking around with perldoc indicates the endian modifiers are new in 5.10. Maybe that's why I wasn't aware of them?

      Anyway, much cleaner, thank you.
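A minimal sketch of how the < and > modifiers might be wrapped for the fields described in the question. The helper name unpack_field and its arguments are hypothetical, and it assumes Perl 5.10 or later. It uses L rather than I because L is always exactly 32 bits, and it simply dies for quad fields when this Perl has no 64-bit integer support.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: decode one unsigned field from raw file bytes.
# $bits is 32 or 64; $little_endian is true for little-endian files.
sub unpack_field {
    my ($raw, $bits, $little_endian) = @_;
    my $order = $little_endian ? '<' : '>';

    return unpack "L$order", $raw if $bits == 32;

    # 'Q' is only usable when this Perl was built with 64-bit integers.
    die "This Perl cannot unpack 64-bit fields (ivsize $Config{ivsize})\n"
        if $Config{ivsize} < 8;

    return unpack "Q$order", $raw;
}

my $binary = "\x91\x34\x33\x90\x81\x32\x31\x80";
printf "BE64: %016x\n", unpack_field($binary, 64, 0);
printf "LE64: %016x\n", unpack_field($binary, 64, 1);
printf "BE32: %08x\n",  unpack_field(substr($binary, 0, 4), 32, 0);
printf "LE32: %08x\n",  unpack_field(substr($binary, 0, 4), 32, 1);
```

The modifier fixes the byte order in the template itself, so the same call should give the same value on big and little endian machines.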

Re: Unpacking byte stream long/quad little/big endian fields
by salva (Canon) on Jan 22, 2021 at 08:22 UTC
    I faced a similar problem while writing Net::SFTP::Foreign.

    My solution was to create a set of functions which pop the data from the buffer, allowing me to write code like the following:

    my $buffer = read ...;
    my $len = get_int32($buffer);
    my $cmd = get_int8($buffer);
    my $txt = get_string($buffer);
    This is the module implementing those functions: Net::SFTP::Foreign::Buffer.
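To illustrate the idea of popping fields off the front of a buffer, here is a small self-contained sketch. It is not the Net::SFTP::Foreign::Buffer implementation: these functions are hypothetical stand-ins that assume big-endian integers and strings prefixed with a 32-bit length, and they do no checking for a short buffer.

```perl
use strict;
use warnings;

# Hypothetical helpers. Each one removes the bytes it decodes from the
# front of the caller's buffer, relying on $_[0] being an alias to it.
sub get_int8  { return unpack 'C', substr($_[0], 0, 1, '') }
sub get_int32 { return unpack 'N', substr($_[0], 0, 4, '') }

sub get_string {
    my $len = get_int32($_[0]);         # 32-bit length prefix
    return substr($_[0], 0, $len, '');  # then that many bytes
}

# Build a sample buffer: int32, int8, then a length-prefixed string.
my $buffer = pack 'N C N/a*', 7, 101, 'hello';

my $len = get_int32($buffer);
my $cmd = get_int8($buffer);
my $txt = get_string($buffer);

printf "len=%d cmd=%d txt=%s left=%d\n", $len, $cmd, $txt, length $buffer;
```

For files whose byte order varies, the same pattern could take the unpack template from a per-file setting instead of hard-coding 'N'.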

      Sure, that's essentially what I'm doing too. But sometimes people peek under the hood, and I'd like it sparking under there too.

      Tux's answer provides a nice solution for Perl >= 5.10 and that's probably what I'll run with. 5.10 has been around long enough now that I'm not much worried about supporting anything earlier.

Re: Unpacking byte stream long/quad little/big endian fields
by xiaoyafeng (Deacon) on Jan 25, 2021 at 08:00 UTC

    If you deal with this every day, a better way is to create a template:

    use strict;
    use warnings;
    use Convert::Binary::C;
    use Data::Dumper;
    use Data::Hexdumper;

    my $c = Convert::Binary::C->new(ByteOrder => 'BigEndian', LongSize => 8)->parse(<<'ENDC');
    struct one8 {
        unsigned long a;
    };
    struct two4 {
        unsigned int a;
        unsigned int b;
    };
    ENDC

    my $data = "\x91\x34\x33\x90\x81\x32\x31\x80";
    print hexdump($data);

    print "BigEndian\n";
    my $u1 = $c->unpack('one8', $data);
    print Dumper $u1;
    my $u2 = $c->unpack('two4', $data);
    print Dumper $u2;

    print "LittleEndian\n";
    $c->tag('one8', ByteOrder => 'LittleEndian');
    $c->tag('two4', ByteOrder => 'LittleEndian');
    my $u3 = $c->unpack('one8', $data);
    print Dumper $u3;
    my $u4 = $c->unpack('two4', $data);
    print Dumper $u4;




    I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction