PerlMonks  

Mysteries of unpack("a", ...)

by pspinler (Novice)
on Jan 03, 2009 at 05:31 UTC (#733866=perlquestion)

pspinler has asked for the wisdom of the Perl Monks concerning the following question:

Hi:

Quick summary: what does unpack ("a4",...) do in comparison to unpack ("L", ...) ?

I have data from a foreign system (IBM z/VM performance history log data) that comes in 1468 byte records, with a mix of EBCDIC encoded characters and numbers in binary format (mostly IBM 390 E format 4 byte floats).

I'm using the following to read records,

    binmode (STDIN);
    local $/ = undef;
    while (read (STDIN, $record, 1468)) {
        my $parsed = &decode_record ($record);
        print_record ($parsed);
    }

My sub decode_record has lots of tidbits like this (the multiple calls to unpack() are only for my own clarity, until I get this reliably working):

    sub decode_record ($) {
        my $record = shift;
        my %rec;
        $rec{"date"} = unpack ("a8", $record);
        $rec{"time"} = unpack ("x8a8", $record);
        (snip)
        $rec{"el_time"} = unpack ("x48a4", $record);
        $rec{"samples"} = unpack ("x52a4", $record);
        (snip)
        return \%rec;
    }

My problem is with those a4 fields I'm unpack()ing, each of them being one of those IBM E format 4 byte floats I mentioned. I'm calling this routine to attempt to parse 'em:

    sub parse_E ($) {
        my $data = shift;
        my ($sign, $characteristic, $fraction);
        $sign = ($data & 0x80000000) ? -1 : 1;
        $characteristic = (($data >> 24) & 0x7f) - 64;
        $fraction = (($data & 0x00ffffff) / 0xffffff) * 16;
        my $num = $sign * $fraction ** $characteristic;
        printf("DEBUG: parse_E(%32s)\n\tsign: %d charisteristic: %d ".
               "fraction: %f = %f\n",
               unpack ("B32", $data), $sign, $characteristic, $fraction, $num);
        printf("DEBUG: unpacked characteristic %s\n",
               unpack ("B7", ($data >> 24) & 0x7f));
        printf("DEBUG: unpacked fraction %7s%s\n", " ",
               unpack ("B24", $data & 0x00ffffff));
        return $num;
    }

The thing is, I'm getting results like this, which indicate that I don't know what the floop unpack("a4") does. In particular, notice the error message "isn't numeric" and also the debugging bitstring prints from my bit twiddling, which should result in 7 bits of data (bits 30-24) and 24 bits of data (bits 23-0). Instead I appear to be getting 7 bits and 8 bits, and they don't appear to match the passed-in bitstring in any way.

    Argument "B<\0\0" isn't numeric in bitwise and (&) at ./testparse.pl line 47.
    DEBUG: parse_E(01000010001111000000000000000000)
            sign: 1 charisteristic: -64 fraction: 0.000000 = -inf
    DEBUG: unpacked characteristic 0011000
    DEBUG: unpacked fraction        00110000

The 'line 47' in that error message happens to be the first binary operation on the data, '$data & 0x80000000'.

But, if I change unpack ("a4", ...) to unpack ("L"), then I get this, instead:

    DEBUG: parse_E(00110001001100010011000100110001)
            sign: 1 charisteristic: 2 fraction: 3.750000 = 14.062502
    DEBUG: unpacked characteristic 0011011
    DEBUG: unpacked fraction        001100110011100100110011

I'm still doing something wrong here, since my 7bit and 24 bit bitstrings still don't match bits 30-24 and bits 23-0 in the raw data, but suddenly I stop getting the "not numeric" error message and actually see the proper length of bitstrings if not the proper data.

Would some kind soul please enlighten my stumblings?

Thanks!

-- Pat

Replies are listed 'Best First'.
Re: Mysteries of unpack("a", ...)
by BrowserUk (Patriarch) on Jan 03, 2009 at 06:15 UTC

    Does this look right for the value in your example?

        $n = unpack 'N', pack 'B32', '01000010001111000000000000000000';;
        ( $s, $c, $e ) = ( $n & 0x8000_0000, (( $n >> 24 ) & 0x7f) - 64, (($n & 0x00ffffff) / 0xffffff) * 16 );;
        print $s, $c, $e;;
        0 2 3.75000022351743

        $num = ($s? -1 : 1 ) * $e * 10**$c;;
        print $num;;
        375.000022351743

    If so, then the following modification of your subroutine may be what you need:

        sub parse_E ($) {
            my $data = shift;

            ## The data comes in as a 4-byte string so...
            $data = unpack 'N', $data; ## We need to treat it as an unsigned integer
                                       ## in order to do bitwise math on it

            my ($sign, $characteristic, $fraction);
            $sign = ($data & 0x80000000) ? -1 : 1;
            $characteristic = (($data >> 24) & 0x7f) - 64;
            $fraction = (($data & 0x00ffffff) / 0xffffff) * 16;

            ## Not the fraction raised to the power of the characteristic
            ## my $num = $sign * $fraction ** $characteristic;
            ## But rather, the fraction * 10 to the power of the characteristic
            my $num = $sign * $fraction * 10 ** $characteristic;

            printf("DEBUG: parse_E(%32s)\n\tsign: %d charisteristic: %d ".
                   "fraction: %f = %f\n",
                   unpack ("B32", $data), $sign, $characteristic, $fraction, $num);
            printf("DEBUG: unpacked characteristic %s\n",
                   unpack ("B7", ($data >> 24) & 0x7f));
            printf("DEBUG: unpacked fraction %7s%s\n", " ",
                   unpack ("B24", $data & 0x00ffffff));
            return $num;
        }

    Some inferences drawn from here, though it could definitely be more clearly stated.
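    To make the difference between the two templates concrete, here is a minimal sketch (not code from the original post) using the four bytes of the example word. It assumes the data is big-endian, hence 'N'; the variable names are only illustrative. It also formats the debug bitstring from the integer with sprintf, since unpack 'B32' applied to a number would show the bits of its decimal text rather than of the value.

```perl
use strict;
use warnings;

# The example word from the question, as the four raw bytes a record would hold.
my $raw = "\x42\x3C\x00\x00";

# 'a4' copies four bytes out as a string: same bytes, still not a number.
my $as_string = unpack 'a4', $raw;

# 'N' decodes the same four bytes as a big-endian unsigned 32-bit integer.
my $as_int = unpack 'N', $raw;

# Bit operations are only meaningful on the integer form.
my $characteristic = ( ( $as_int >> 24 ) & 0x7f ) - 64;

# For debugging, format the integer itself as 32 binary digits.
my $bits = sprintf '%032b', $as_int;

printf "a4 gives %d bytes, N gives 0x%08X, characteristic %d\n",
    length $as_string, $as_int, $characteristic;
print "$bits\n";
```

    The string from 'a4' is byte-for-byte the input, which is why using it with & triggers the "isn't numeric" warning.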


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Yah, that looks right. It didn't occur to me to unpack something I've already unpacked. I ended up grabbing substr's of an unpacked bitstring of my original data, in a rather ugly form.

      And yes, I also had to correct my math. I'll change it back to (presumably more efficient) bit fiddling tomorrow. For now, here's what I've ended up with:

          sub parse_E ($) {
              my $data = shift;
              my $databs = unpack ("B32", $data);
              my $sign = oct ("0b" . substr ($databs, 0, 1)) ? -1 : 1;
              my $characteristic = oct ("0b" . substr ($databs, 1, 7)) - 64;
              my $exponent = 16 ** $characteristic;
              my $fraction = oct ("0b" . substr ($databs, 8, 24)) / 0xffffff;
              my $num = $sign * $fraction * $exponent;
              return $num;
          }

      Thanks much!

      -- Pat

        I don't know why you're dividing the fraction part by 0xFF_FFFF. If you want to do it that way, you'd surely want to divide by 0x100_0000.

        The following:

            my $f = unpack('N', pack('B32', "01000010001111000000000000000000")) ;
            printf "0x%08X = %12.9f\n", $f, conv_HFP($f) ;

            sub conv_HFP {
              my ($f) = @_ ;
              my $s = ($f & 0x8000_0000) ? -1 : +1 ;
              my $e = (((($f >> 24) & 0x7F) - 0x40) * 4) - 24 ;
              return $s * ($f & 0xFF_FFFF) * (2 ** $e) ;
            } ;
        will convert 4 byte 360/370 style radix 16 floats (where b31 is sign, b30..b24 is exponent biased by 0x40, and b23..0 is the fraction with binary point to the left of b23). Using the one example I can see the result is:
          0x423C0000 = 60.000000000
        
        From a quick poke around, it appears that floats are stored big-endian.
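        For comparison, a sketch of the same conversion written in the fraction-times-power-of-16 form used earlier in the thread, taking the raw 4-byte field directly. The name hfp_from_bytes is made up for illustration, and big-endian storage is assumed as noted above. The fraction is divided by 0x100_0000 so that it is a proper 24-bit fraction.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one 4-byte big-endian IBM hex float (HFP).
sub hfp_from_bytes {
    my ($bytes) = @_;
    my $word     = unpack 'N', $bytes;                      # big-endian 32-bit integer
    my $sign     = ( $word & 0x8000_0000 ) ? -1 : 1;        # bit 31
    my $exponent = ( ( $word >> 24 ) & 0x7f ) - 64;         # bits 30..24, biased by 64
    my $fraction = ( $word & 0x00ff_ffff ) / 0x0100_0000;   # bits 23..0 as a fraction < 1
    return $sign * $fraction * 16**$exponent;
}

printf "%g\n", hfp_from_bytes("\x42\x3C\x00\x00");   # the example from the question
printf "%g\n", hfp_from_bytes("\xC1\x10\x00\x00");   # same layout with the sign bit set
```

        This prints 60 and -1, agreeing with conv_HFP for the one example available.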

        I note that you say these are "IBM 390 E format 4 byte floats". The ESA/390 supports both the old 360/370 HFP (hex floating point) and the newer BFP (binary floating point), which conforms to IEEE 754-1985. Converting BFP can be done like so:

            my $f = unpack('N', pack('B32', "01000010001111000000000000000000")) ;
            printf "0x%08X = %12.9f\n", $f, conv_BFP($f) ;

            sub conv_BFP {
              my ($f) = @_ ;
              my $s = ($f & 0x8000_0000) ? -1 : +1 ;
              my $e = ((($f >> 23) & 0xFF) - 0x7F) - 23 ;
              return $s * (($f & 0x7F_FFFF) | 0x80_0000) * (2 ** $e) ;
            } ;
        in the unlikely event your native floating point is not IEEE 754-1985 !! (Otherwise unpack('f', ...), with suitable care over the byte ordering !)
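        A sketch of the unpack('f', ...) route, assuming the native float is IEEE 754 single precision and Perl is 5.10 or later (needed for the '>' byte-order modifier). The bit pattern is the question's example, read here as BFP rather than HFP:

```perl
use strict;
use warnings;

# 'f>' reads a single-precision float in big-endian byte order,
# regardless of the host's own byte order.
my $bytes = "\x42\x3C\x00\x00";
my $value = unpack 'f>', $bytes;

printf "%g\n", $value;
```

        This prints 47, the value conv_BFP gives for the same word, while the HFP reading of those bits is 60. Comparing the two against a known field is one way to tell which format the records actually use.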

Re: Mysteries of unpack("a", ...)
by ikegami (Patriarch) on Jan 03, 2009 at 06:27 UTC

    which indicates that I don't know what the floop unpack("a4") does.

    substr($_, 0, 4)
    or rather
    substr(do { _utf8_off(my $internal = $_); $internal }, 0, 4)

      Or rather

      substr substr(do { _utf8_off(my $internal = $_); $internal }, 0, 4) . chr(0) x 4, 0, 4;

      How's that! (for out-pedanting the pedant! (Always assuming I got it right :)

      From perlfunc:pack (I'm too tired to fix up the link):

      a A string with arbitrary binary data, will be null padded.


        That's wrong.

        While pack (whose documentation you quoted) will add NULs, unpack doesn't.

            >perl -le"print length unpack 'a4', ''"
            0

        If anything, unpack should do the opposite (remove trailing NULs), but it doesn't do that either.

            >perl -le"print length unpack 'a4', qq{\0\0\0\0}"
            4
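        The same points as a small script, for anyone who wants them side by side. This is only a sketch; the 'A' line relies on the documented behaviour that unpack with 'A' strips trailing whitespace and NULs, while 'a' strips nothing.

```perl
use strict;
use warnings;

my $packed   = pack 'a4', 'ab';            # pack pads with NULs up to 4 bytes
my $short    = unpack 'a4', '';            # unpack returns what is there, no padding
my $nuls     = unpack 'a4', "\0\0\0\0";    # 'a' keeps trailing NULs
my $stripped = unpack 'A4', "ab\0\0";      # 'A' strips trailing NULs and whitespace

printf "%d %d %d %d\n",
    length($packed), length($short), length($nuls), length($stripped);
```

        This prints 4 0 4 2.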
Re: Mysteries of unpack("a", ...)
by Marshall (Canon) on Jan 03, 2009 at 08:10 UTC
    1) I am not at all sure that STDIN can even be opened in binary mode at all! I mean CTL-C, CTL-Z mean things to STDIN although these are certainly valid binary values.

    2) If you want to read binary data, open a file in binmode... local $/ = undef; is not needed.

    3) Once you are dealing with binary data, study page 758 of "Programming Perl, 3rd edition" very carefully. One thing to be aware of is "big-endian" vs "little-endian" order: some machines put the most significant 16 bits of a 32-bit value first and some put them last. IBM has many machines and some do it one way and some another.

    In your situation, look at nN and vV and the other options.
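    A quick sketch of what the byte-order choice does to the question's example word. 'N' is big-endian and 'V' is little-endian; a native-order template such as 'L' behaves like one or the other depending on the machine running the script, which may be part of why 'L' gave unexpected values in the question.

```perl
use strict;
use warnings;

my $bytes  = "\x42\x3C\x00\x00";
my $big    = unpack 'N', $bytes;   # most significant byte first
my $little = unpack 'V', $bytes;   # least significant byte first

printf "N: 0x%08X  V: 0x%08X\n", $big, $little;
```

    This prints N: 0x423C0000  V: 0x00003C42.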

    Perl can do binary editing quite well.

    I offer one of the very first Perl subs that I wrote. This is from more than a decade ago. I would do some things differently now. But this does slam multiple Windows .wav files of the same type together into a new .wav file. This is just a simple example of binary editing.

      I am not at all sure that STDIN can even be opened in binary mode at all!

      Yes it works the same on STDIN as other handles. Specifically, it disables crlf→lf conversion on Windows machines, it stops treating chr(26) as the end of file on non-PerlIO Windows builds, and does nothing elsewhere.

      I mean CTL-C, CTL-Z mean things to STDIN although these are certainly valid binary values.

      No they don't. They may mean something to the tty/console, but STDIN doesn't even know about the Ctrl key. It doesn't treat character 3 or 26 specially.

          >perl -e"print qq{\x03\x1A}" | perl -le"print uc unpack 'H*', <STDIN>"
          031A

          $ perl -e'print qq{\x03\x1A}' | perl -le'print uc unpack "H*", <STDIN>'
          031A

      If you want to read binary data, open a file in binmode... local $/ = undef; is not needed.

      Not true at all. $/ is quite useful on binary files.

          my @records =
              map parse_rec($_),
              map /(.{$RECSIZE})/sg,
              do { local $/; <$fh> };

      and

          my @records;
          local $/ = \$RECSIZE;
          local *_;
          while (<$fh>) {
              push @records, parse_rec($_);
          }

      are equivalent to

          my @records;
          local *_;
          while (read($fh, $_, $RECSIZE)) {
              push @records, parse_rec($_);
          }

      Mind you, read is unaffected by $/, but that has nothing to do with whether the file is binary or not.
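      A self-contained sketch of the two record-reading styles, using an in-memory filehandle and a 4-byte record size as stand-ins for the real input and the question's 1468-byte records. All names here are illustrative.

```perl
use strict;
use warnings;

my $RECSIZE = 4;                 # the question's records are 1468 bytes
my $data    = "AAAABBBBCCCC";    # stand-in for the binary input

# Record-sized reads through the readline operator.
my @via_rs;
{
    open my $fh, '<', \$data or die "open: $!";
    binmode $fh;
    local $/ = \$RECSIZE;        # a reference to a number means fixed-size records
    while ( my $rec = <$fh> ) {
        push @via_rs, $rec;
    }
    close $fh;
}

# The same loop with read(), which ignores $/ entirely.
my @via_read;
{
    open my $fh, '<', \$data or die "open: $!";
    binmode $fh;
    while ( read $fh, my $rec, $RECSIZE ) {
        push @via_read, $rec;
    }
    close $fh;
}

print join( ',', @via_rs ),   "\n";
print join( ',', @via_read ), "\n";
```

      Both loops print AAAA,BBBB,CCCC.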

        I use STDIN for command line filters of "catable files" (text), eg. cat or "type" in the Windows world can display those files. I stand corrected about use of a binary file for such a purpose.

        I am curious as to what "*_" means? I couldn't find that in my reference books.

        The kind of binary files I usually deal with might have an odd number of bytes, and I have to fix it up in the final result with either 16 bit aligned or 32 bit aligned values. Sometimes that means shifting things over a byte or more, so something like:
        my $n_bytes = read(INBIN, $buff, $BUFSIZE); is the ticket. Your mileage may vary as they say! I haven't written any really hairy binary stuff in Perl.

Re: Mysteries of unpack("a", ...)
by djp (Hermit) on Jan 05, 2009 at 05:40 UTC
    You might try Convert::IBM390 which has probably done all the heavy lifting for you.
        Oops my bad!
