pspinler has asked for the wisdom of the Perl Monks concerning the following question:
Hi:
Quick summary: what does unpack ("a4",...) do in comparison to unpack ("L", ...) ?
I have data from a foreign system (IBM z/VM performance history log data) that comes in 1468 byte records, with a mix of EBCDIC encoded characters and numbers in binary format (mostly IBM 390 E format 4 byte floats).
I'm using the following to read records,
binmode (STDIN);
local $/ = undef;
while (read (STDIN, $record, 1468)) {
my $parsed = &decode_record ($record);
print_record ($parsed);
}
my sub decode_record has lots of tidbits like this (multiple calls to unpack() only for my own clarity, until I get this reliably working)
sub decode_record ($) {
my $record = shift;
my %rec;
$rec{"date"} = unpack ("a8", $record);
$rec{"time"} = unpack ("x8a8", $record);
(snip)
$rec{"el_time"} = unpack ("x48a4", $record);
$rec{"samples"} = unpack ("x52a4", $record);
(snip)
return \%rec;
}
My problem is with those a4 fields I'm unpack()ing, each of which is one of the IBM E format 4 byte floats I mentioned. I'm calling this routine to attempt to parse 'em:
sub parse_E ($) {
my $data = shift;
my ($sign, $characteristic, $fraction);
$sign = ($data & 0x80000000) ? -1 : 1;
$characteristic = (($data >> 24) & 0x7f) - 64;
$fraction = (($data & 0x00ffffff) / 0xffffff) * 16;
my $num = $sign * $fraction ** $characteristic;
printf("DEBUG: parse_E(%32s)\n\tsign: %d charisteristic: %d ".
"fraction: %f = %f\n",
unpack ("B32", $data), $sign, $characteristic,
$fraction, $num);
printf("DEBUG: unpacked characteristic %s\n",
unpack ("B7", ($data >> 24) & 0x7f));
printf("DEBUG: unpacked fraction %7s%s\n", " ",
unpack ("B24", $data & 0x00ffffff));
return $num;
}
The thing is, I'm getting results like this, which indicates that I don't know what the floop unpack("a4") does. In particular, notice the error message "isn't numeric" and also the debugging bitstring prints from my bit twiddling, which should result in 7 bits of data (bits 30-24) and 24 bits of data (bits 23-0). Instead I appear to be getting 7 bits and 8 bits, and they don't appear to match the passed-in bitstring in any way.
Argument "B<\0\0" isn't numeric in bitwise and (&) at ./testparse.pl line 47.
DEBUG: parse_E(01000010001111000000000000000000)
sign: 1 charisteristic: -64 fraction: 0.000000 = -inf
DEBUG: unpacked characteristic 0011000
DEBUG: unpacked fraction 00110000
The 'line 47' in that error message happens to be the first binary operation on the data, '$data & 0x80000000'.
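For reference, a minimal sketch of what unpack("a4") hands back, assuming the four bytes shown in the DEBUG output above (0x42 0x3C 0x00 0x00); the variable names are illustrative only. The "a" format returns the bytes as a string, so a numeric operator sees a non-numeric string and treats it as 0, whereas "N" converts the same bytes to a big-endian unsigned integer.

```perl
use strict;
use warnings;

# Assumed sample: the four bytes from the DEBUG output (0x42 0x3C 0x00 0x00).
my $raw = pack 'B32', '01000010001111000000000000000000';

# "a4" copies four bytes out as a string; nothing is converted to a number.
my $field = unpack 'a4', $raw;
printf "a4 gives a string of length %d, hex %s\n",
    length $field, unpack 'H*', $field;

# In numeric context that string is worth 0 (hence the "isn't numeric" warning).
my $numified = do { no warnings 'numeric'; $field + 0 };
print "numified: $numified\n";

# "N" interprets the same four bytes as a big-endian unsigned 32-bit integer.
my $as_number = unpack 'N', $field;
printf "N gives the integer 0x%08X\n", $as_number;
```

With the string counted as 0, (0 >> 24) - 64 gives the characteristic of -64 seen in the output above.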
But, if I change unpack ("a4", ...) to unpack ("L"), then I get this, instead:
DEBUG: parse_E(00110001001100010011000100110001)
sign: 1 charisteristic: 2 fraction: 3.750000 = 14.062502
DEBUG: unpacked characteristic 0011011
DEBUG: unpacked fraction 001100110011100100110011
I'm still doing something wrong here, since my 7bit and 24 bit bitstrings still don't match bits 30-24 and bits 23-0 in the raw data, but suddenly I stop getting the "not numeric" error message and actually see the proper length of bitstrings if not the proper data.
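A likely explanation for the mismatched bitstrings, sketched under the assumption that the unpacked value is the integer 0x423C0000: unpack("B...") works on a string, so when it is handed a number it sees the characters of the decimal representation ("1111228416"), not the bits of the integer. sprintf with %b prints the bits of the number itself.

```perl
use strict;
use warnings;

my $n = 0x423C0000;    # assumed: the integer value of the sample field

# unpack "B32" stringifies the number first and dumps the bits of the
# characters "1", "1", "1", "1" (0x31 each).
my $bits_of_text = unpack 'B32', $n;

# sprintf "%b" formats the bits of the integer itself.
my $bits_of_int    = sprintf '%032b', $n;
my $characteristic = sprintf '%07b',  ($n >> 24) & 0x7f;
my $fraction       = sprintf '%024b', $n & 0x00ffffff;

print "unpack B32 : $bits_of_text\n";
print "sprintf %b : $bits_of_int\n";
print "bits 30-24 : $characteristic\n";
print "bits 23-0  : $fraction\n";
```

The first line reproduces the 00110001... pattern from the debug output, which is the text "1111" rather than the data.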
Would some kind soul please enlighten my stumblings?
Thanks!
-- Pat
Re: Mysteries of unpack("a", ...)
by BrowserUk (Patriarch) on Jan 03, 2009 at 06:15 UTC
|
$n = unpack 'N', pack 'B32', '01000010001111000000000000000000';;
( $s, $c, $e ) = (
$n & 0x8000_0000,
(( $n >> 24 ) & 0x7f) - 64,
(($n & 0x00ffffff) / 0xffffff) * 16
);;
print $s, $c, $e;;
0 2 3.75000022351743
$num = ($s? -1 : 1 ) * $e * 10**$c;;
print $num;;
375.000022351743
If so, then the following modification of your subroutine may be what you need:
sub parse_E ($) {
my $data = shift; ## The data comes in as a 4-byte string so...
$data = unpack 'N', $data; ## We need to treat it as an unsigned integer
## in order to do bitwise math on it
my ($sign, $characteristic, $fraction);
$sign = ($data & 0x80000000) ? -1 : 1;
$characteristic = (($data >> 24) & 0x7f) - 64;
$fraction = (($data & 0x00ffffff) / 0xffffff) * 16;
## Not the fraction raised to the power of the characteristic
## my $num = $sign * $fraction ** $characteristic;
## But rather, the fraction * 10 to the power of the characteristic
my $num = $sign * $fraction * 10 ** $characteristic;
printf("DEBUG: parse_E(%32s)\n\tsign: %d charisteristic: %d ".
"fraction: %f = %f\n",
unpack ("B32", $data), $sign, $characteristic,
$fraction, $num);
printf("DEBUG: unpacked characteristic %s\n",
unpack ("B7", ($data >> 24) & 0x7f));
printf("DEBUG: unpacked fraction %7s%s\n", " ",
unpack ("B24", $data & 0x00ffffff));
return $num;
}
Some inferences drawn from here, though it could definitely be more clearly stated.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
Yah, that looks right. It didn't occur to me to unpack something I've already unpacked. I ended up grabbing substr's of an unpacked bitstring of my original data, in a rather ugly form.
And yes, I also had to correct my math. I'll change it back to (presumably more efficient) bit fiddling tomorrow. For now, here's what I've ended up with:
sub parse_E ($) {
my $data = shift;
my $databs = unpack ("B32", $data);
my $sign = oct ("0b" . substr ($databs, 0, 1)) ? -1 : 1;
my $characteristic = oct ("0b" . substr ($databs, 1, 7)) - 64;
my $exponent = 16 ** $characteristic;
my $fraction = oct ("0b" . substr ($databs, 8, 24)) / 0xffffff;
my $num = $sign * $fraction * $exponent;
return $num;
}
Thanks much!
-- Pat
| [reply] [d/l] |
|
my $f = unpack('N', pack('B32', "01000010001111000000000000000000"));
printf "0x%08X = %12.9f\n", $f, conv_HFP($f) ;
sub conv_HFP {
my ($f) = @_ ;
my $s = ($f & 0x8000_0000) ? -1 : +1 ;
my $e = (((($f >> 24) & 0x7F) - 0x40) * 4) - 24 ;
return $s * ($f & 0xFF_FFFF) * (2 ** $e) ;
} ;
will convert 4 byte 360/370 style radix 16 floats (where b31 is sign, b30..b24 is exponent biased by 0x40, and b23..0 is the fraction with binary point to the left of b23). Using the one example I can see the result is:
0x423C0000 = 60.000000000
From a quick poke around, it appears that floats are stored big-endian.
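As a cross-check of the layout just described, here is a small sketch that decodes straight from the raw 4-byte string. The helper name hfp_from_bytes and the two extra test patterns (0x41100000 and 0xC276A000) are my own illustrative choices, not taken from the log data. It divides the fraction by 2**24 rather than 0xffffff, which is what avoids small errors such as the 3.75000022351743 seen earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a big-endian 4-byte HFP single from raw bytes.
sub hfp_from_bytes {
    my ($bytes) = @_;
    my $n        = unpack 'N', $bytes;             # string -> unsigned 32-bit integer
    my $sign     = ($n & 0x8000_0000) ? -1 : 1;    # bit 31
    my $exponent = (($n >> 24) & 0x7F) - 64;       # bits 30..24, biased by 0x40
    my $fraction = ($n & 0x00FF_FFFF) / 2**24;     # bits 23..0, point left of b23
    return $sign * $fraction * 16**$exponent;      # radix 16, not 10
}

for my $hex (qw(423C0000 41100000 C276A000)) {
    printf "0x%s = %g\n", $hex, hfp_from_bytes(pack 'H8', $hex);
}
```

For 0x423C0000 the steps are: exponent 0x42 - 0x40 = 2, fraction 0x3C0000 / 2**24 = 0.234375, and 0.234375 * 16**2 = 60.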
I note that you say these are "IBM 390 E format 4 byte floats". The ESA/390 supports both the old 360/370 HFP (hex floating point) and the newer BFP (binary floating point), which conforms to IEEE 754-1985. Converting BFP can be done so:
my $f = unpack('N', pack('B32', "01000010001111000000000000000000"));
printf "0x%08X = %12.9f\n", $f, conv_BFP($f) ;
sub conv_BFP {
my ($f) = @_ ;
my $s = ($f & 0x8000_0000) ? -1 : +1 ;
my $e = ((($f >> 23) & 0xFF) - 0x7F) - 23 ;
return $s * (($f & 0x7F_FFFF) | 0x80_0000) * (2 ** $e) ;
} ;
in the unlikely event your native floating point is not IEEE 754-1985 !! (Otherwise unpack('f', ...), with suitable care over the byte ordering !)
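A sketch of the unpack('f', ...) route, assuming Perl 5.10 or later (for the ">" byte-order modifier) and a platform whose native float is IEEE 754 single precision. The sample bytes are the same 0x423C0000 pattern used above, read here as BFP purely for comparison.

```perl
use strict;
use warnings;

# Same four bytes as the HFP example, big-endian on the wire.
my $bytes = pack 'H8', '423C0000';

# "f>" reads a native single-precision float, forcing big-endian byte order.
# Assumption: the native float format is IEEE 754 binary32.
my $as_bfp = unpack 'f>', $bytes;

printf "0x423C0000 read as big-endian IEEE single: %g\n", $as_bfp;
```

The same bit pattern decodes to 47 as BFP but 60 as HFP, so checking a field with a known value is a quick way to tell which format the log uses.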
| [reply] [d/l] [select] |
Re: Mysteries of unpack("a", ...)
by ikegami (Patriarch) on Jan 03, 2009 at 06:27 UTC
|
substr($_, 0, 4)
or rather
substr(do { _utf8_off(my $internal = $_); $internal }, 0, 4)
| [reply] [d/l] [select] |
|
substr
substr(do { _utf8_off(my $internal = $_); $internal }, 0, 4)
. chr(0) x 4, 0, 4;
How's that! (for out-pedanting the pedant! (Always assuming I got it right :)
From perlfunc:pack (I'm too tired to fix up the link):
a A string with arbitrary binary data, will be null padded.
| [reply] [d/l] |
|
>perl -le"print length unpack 'a4', ''"
0
If anything, unpack should do the opposite (remove trailing NULs), but it doesn't do that either.
>perl -le"print length unpack 'a4', qq{\0\0\0\0}"
4
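To put the a/A/Z behaviour side by side, here is a short sketch with made-up sample strings. As far as I understand perlfunc, the null padding in the quoted text applies to pack; on unpack, "a" returns the bytes untouched, "A" strips trailing spaces and NULs, and "Z" stops at the first NUL.

```perl
use strict;
use warnings;

# pack pads a short value out to the requested width with NULs.
my $packed = pack 'a4', 'ab';
printf "pack a4   : length %d\n", length $packed;

# Illustrative input: "a", NUL, "b", space.
my $sample = "a\0b ";

my $raw     = unpack 'a4', $sample;   # untouched
my $trimmed = unpack 'A4', $sample;   # trailing space removed
my $cstring = unpack 'Z4', $sample;   # cut at the first NUL

printf "unpack a4 : length %d\n", length $raw;
printf "unpack A4 : length %d\n", length $trimmed;
printf "unpack Z4 : length %d\n", length $cstring;
```

For binary fields such as the 4-byte floats in this thread, only "a" (or a numeric format such as "N") is safe, since "A" and "Z" can discard data bytes that happen to be 0x00 or 0x20.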
| [reply] [d/l] [select] |
Re: Mysteries of unpack("a", ...)
by Marshall (Canon) on Jan 03, 2009 at 08:10 UTC
|
1) I am not at all sure that STDIN can even be opened in binary mode at all! I mean
CTL-C, CTL-Z mean things to STDIN although these are certainly valid binary values.
2) If you want to read binary data, open a file in binmode... local $/ = undef;
is not needed.
3) Once you are dealing with binary data, study page 758 of "Programming Perl, 3rd edition" very
carefully. One thing to be aware of is for example: "big-endian" vs "little-endian" order.
Some machines put the most significant 16 bits first of 32 bits and some put it vice-versa.
IBM has many machines and some do it one way and some another.
In your situation, look at nN and vV and the other options.
Perl can do binary editing quite well.
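The byte-order point in item 3 can be illustrated with a short sketch; the four sample bytes are the ones from the original question, and the comparison is only meant to show how the n/N and v/V formats differ.

```perl
use strict;
use warnings;

my $bytes = "\x42\x3C\x00\x00";

# N and n read big-endian ("network" order); V and v read little-endian.
my $be32 = unpack 'N', $bytes;
my $le32 = unpack 'V', $bytes;
my $be16 = unpack 'n', $bytes;    # first two bytes only
my $le16 = unpack 'v', $bytes;    # first two bytes only

printf "N (big-endian 32-bit)    : 0x%08X\n", $be32;
printf "V (little-endian 32-bit) : 0x%08X\n", $le32;
printf "n (big-endian 16-bit)    : 0x%04X\n", $be16;
printf "v (little-endian 16-bit) : 0x%04X\n", $le16;
```

Formats such as "L" use the byte order of the machine running the script, so for data from another system the explicit N or V forms are the more portable choice.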
I offer one of the very first Perl subs that I wrote. This is from more than
a decade ago. I would do some things differently now. But this does slam multiple
Windows .wav files of the same type together into a new .wav file. This is just
a simple example of binary editing.
| [reply] [d/l] [select] |
|
I am not at all sure that STDIN can even be opened in binary mode at all!
Yes it works the same on STDIN as other handles. Specifically, it disables crlf→lf conversion on Windows machines, it stops treating chr(26) as the end of file on non-PerlIO Windows builds, and does nothing elsewhere.
I mean CTL-C, CTL-Z mean things to STDIN although these are certainly valid binary values.
No they don't. They may mean something to the tty/console, but STDIN doesn't even know about the Ctrl key. It doesn't treat character 3 or 26 specially.
>perl -e"print qq{\x03\x1A}" | perl -le"print uc unpack 'H*', <STDIN>"
031A
$perl -e'print qq{\x03\x1A}' | perl -le'print uc unpack "H*", <STDIN>'
031A
If you want to read binary data, open a file in binmode... local $/ = undef; is not needed.
Not true at all. $/ is quite useful on binary files.
my @records = map parse_rec($_),
map /(.{$RECSIZE})/sg,
do { local $/; <$fh> };
and
my @records;
local $/ = \$RECSIZE;
local *_;
while (<$fh>) {
push @records, parse_rec($_);
}
are equivalent to
my @records;
local *_;
while (read($fh, $_, $RECSIZE)) {
push @records, parse_rec($_);
}
Mind you, read is unaffected by $/, but that has nothing to do with whether the file is binary or not.
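A self-contained sketch of that last point, using an in-memory filehandle and a 4-byte record size as stand-ins for the real 1468-byte log records (both are assumptions made for the demonstration):

```perl
use strict;
use warnings;

my $RECSIZE = 4;                  # stand-in for the real record size
my $data    = "AAAABBBBCCCC";     # three fake records

open my $fh, '<', \$data or die "open: $!";
binmode $fh;

# readline honours $/: a reference to an integer means fixed-size records.
my @via_readline;
{
    local $/ = \$RECSIZE;
    while (my $rec = <$fh>) {
        push @via_readline, $rec;
    }
}

# read() ignores $/ entirely, whatever it is set to.
seek $fh, 0, 0;
my @via_read;
{
    local $/ = undef;
    while (read($fh, my $rec, $RECSIZE)) {
        push @via_read, $rec;
    }
}

print "readline: @via_readline\n";
print "read    : @via_read\n";
```

Both loops yield the same three records, so the choice between them is mostly a matter of style.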
| [reply] [d/l] [select] |
|
I use STDIN for command line filters of "catable files" (text), eg. cat or "type" in the
Windows world can display those files. I stand corrected about use of a binary
file for such a purpose.
I am curious as to what "*_" means? I couldn't find that in my reference books.
The kind of binary files I usually deal with might have
an odd number of bytes and I have to fix it up in the final
result with either 16 bit aligned or 32 bit aligned values. Sometimes that means shifting things over a byte or more, so something like:
my $n_bytes = read(INBIN, $buff, $BUFSIZE); is the ticket. Your mileage
may vary as they say! I haven't written any really hairy binary stuff in Perl.
| [reply] [d/l] |
|
Re: Mysteries of unpack("a", ...)
by djp (Hermit) on Jan 05, 2009 at 05:40 UTC
|
You might try Convert::IBM390, which has probably done all the heavy lifting for you.