Missing byte using unpack, pack, read(in terms of bytes)

joemaniaci has asked for the wisdom of the Perl Monks concerning the following question:

So I have been working on reverse engineering a file in hex that some genius thought it would be a great idea to write in both little endian and big endian. Anyway I came across what I believe to be a bug in perl but I wanted to make sure and so far I have my coworkers in league with me.

So I have these files that contain three sorts of records. There is the initial header that occurs only once, contains a total of 4096 bytes, and is comprised of four-byte floats and four-byte int32s.

The second record is comprised of 36 bytes, also made up of four-byte floats and four-byte int32s. It appears to be precursor data to the third record type. It also has a variable that states how many instances of the third record should follow. After those, another instance of this second record can follow, along with another set of third-type records.

The third record is comprised of a total of 28 bytes and contains(in order) five four-byte floats, one four-byte int32 and two two-byte shorts. The number of instances for this record is stored in record 2.

Now I have all of my unpacking/packing, swapping big to little endian, templates etc etc taken care of. My reading of a float for example is stored as a subroutine...

sub grabFloat
{
     my $FourBytes = 4;
     my $floatTemp = 'f';
     read(IN, my $record, $FourBytes);
     $record = reverse $record;
     my $value = unpack($floatTemp, $record);
     return $value;
}

My code for grabbing my short is ...

sub grabInt16
{
     my $TwoBytes = 2;
     my $Int16 = 's';
     read(IN, my $record, $TwoBytes);
     $record = reverse $record;
     my $value = unpack($Int16, $record);
     return $value;
}

Int32s are pretty much the same as float except they are different endian and don't need to be reversed.
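For comparison, here is a minimal sketch of the same per-field readers written with the endianness modifiers that `pack`/`unpack` accept since Perl 5.10, instead of reversing the bytes by hand. It assumes, based on the subroutines above running on a little-endian machine, that floats and int16s are big-endian in the file and int32s are little-endian. The names `read_exact`, `grab_float_be`, `grab_int16_be` and `grab_int32_le` are hypothetical, and the in-memory file is only there to make the sketch runnable.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $len bytes from $fh, or die.
# Checking the return value of read() makes a short read visible
# instead of silently unpacking a truncated buffer.
sub read_exact {
    my ($fh, $len) = @_;
    my $got = read($fh, my $buf, $len);
    die "read failed: $!" unless defined $got;
    die "short read: wanted $len bytes, got $got" unless $got == $len;
    return $buf;
}

# Assumed byte orders: float and int16 big-endian, int32 little-endian.
sub grab_float_be { my ($fh) = @_; return unpack 'f>', read_exact($fh, 4) }
sub grab_int16_be { my ($fh) = @_; return unpack 's>', read_exact($fh, 2) }
sub grab_int32_le { my ($fh) = @_; return unpack 'l<', read_exact($fh, 4) }

# Demonstration on an in-memory "file" built with the same templates.
my $data = pack 'f> l< s>', 1.5, 25, -2;
open my $fh, '<:raw', \$data or die "Cannot open in-memory file: $!";
printf "%s %d %d\n", grab_float_be($fh), grab_int32_le($fh), grab_int16_be($fh);
```

Passing the filehandle as an argument rather than using the global `IN` keeps each reader testable on its own.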

So here is the potential bug. I am reading thousands upon thousands of these records without error. Then for some reason perl decides to skip a single byte. So let's look at record three.

Let's say I am reading the 305th record of the third type, so I should have five floats, one int32 and two int16s. Let's also say that we are starting with offset 0x1000

So what should we expect?

@ 0x1000 we should read 4 bytes for the first float

@ 0x1004 we should read 4 bytes for the second float

@ 0x1008 we should read 4 bytes for the third float

@ 0x100C we should read 4 bytes for the fourth float

@ 0x1010 we should read 4 bytes for the fifth float

@ 0x1014 we should read 4 bytes for the only int32

@ 0x1018 we should read 2 bytes for the first int16

@ 0x101A we should read 2 bytes for the second int16

However, this is not what happens, and I am losing my mind. Here is what goes down...

@ 0x1000 we read 4 bytes for the first float

@ 0x1004 we read 4 bytes for the second float

@ 0x1008 we read 4 bytes for the third float

@ 0x100C we read 4 bytes for the fourth float

Now my next (final) float should be stored in bytes 0x1010 through 0x1013. However, what happens is that byte 0x1010 is discarded/skipped, so the float is read from 0x1011 through 0x1014! Everything from this point forward is shifted by one byte. As you can see from the code above, I only ever read an even number of bytes, 2 or 4. If I were off by two I would believe that I had made a mistake and read an extra int16 somewhere, but it is only a single byte! I have tried this script on multiple versions of these files with the exact same behavior every time. It occurs at different places for each file, but at a consistent location for each individual file.
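Counting in hexadecimal, a 28-byte record that starts at 0x1000 ends just before 0x101C. The following sketch computes the expected offset of each field from the sizes given in the record description; the starting offset is the hypothetical one used above, and the field names are only labels.

```perl
use strict;
use warnings;

# Field sizes for record type 3: five floats, one int32, two int16s.
my @fields = (
    [ 'float 1' => 4 ],
    [ 'float 2' => 4 ],
    [ 'float 3' => 4 ],
    [ 'float 4' => 4 ],
    [ 'float 5' => 4 ],
    [ 'int32'   => 4 ],
    [ 'int16 1' => 2 ],
    [ 'int16 2' => 2 ],
);

my $offset = 0x1000;    # assumed start of the record
my @layout;
for my $field (@fields) {
    my ($name, $size) = @$field;
    push @layout, sprintf '0x%04X  %d bytes  %s', $offset, $size, $name;
    $offset += $size;
}

print "$_\n" for @layout;
printf "next record starts at 0x%04X\n", $offset;
```

Comparing these expected offsets with `tell` on the filehandle after each read is one way to pin down the exact byte at which the stream goes out of step.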

All the files I am working with are classified military files so I can't share. So I hope I was descriptive enough to point someone in the right direction.

I have verified this behavior many times and in many ways and running up to this bug I can print out what it looks like and this is essentially what I get...

print: 1.0 2.0 3.0 4.0 5.0 25 1 0

print: 1.1 2.2 3.3 4.15 5.35 26 2 0

print: 1.2 2.4 3.6 4.25 5.53 25 2 0

print: 1.3 2.6 3.9 4.0 2.58e-044 -7923652397.....


Replies are listed 'Best First'.
Re: Missing byte using unpack, pack, read(in terms of bytes) by BrowserUk (Patriarch) on Jul 23, 2012 at 23:12 UTC
On the basis of the code you have posted, you are making really hard work of parsing those files. Commensurate with refining the templates for records 1 & 2, which you haven't fully described, something like this would read the entire thing:

#! perl -slw
use strict;

use constant {
    RECORD_1 => 'f512 l512',   ## placeholder: 4096 bytes, real layout not described
    RECORD_2 => 'f3 l3',       ## placeholder: must be refined to cover all 36 bytes
    RECORD_3 => 'f5 l s2',     ## 5 floats, 1 int32, 2 int16s = 28 bytes
    ## index of field in rec2 that contains count of type 3 records that follow it
    COUNT    => 3,
};

open I, '<:raw', $ARGV[0] or die "$ARGV[0] : $!";

my @rec1 = unpack RECORD_1, do{ local $/ = \4096; <I> };

until( eof( I ) ) {
    my @rec2 = unpack RECORD_2, do{ local $/ = \36; <I> };
    for ( 1 .. $rec2[ COUNT ] ) {
        my @rec3 = unpack RECORD_3, do{ local $/ = \28; <I> };
    }
}
close I;

Note also that on recent versions of Perl (since 5.10), unpack can deal with little/big-endian issues for you. Say your type 2 records contain 2 big-endian floats, followed by 2 little-endian int32s and then 2 big-endian int32s. Use a template of: `'f>2 l<2 l>2'`

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?
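Applied to the type 3 record from the question, the endianness modifiers might look like the sketch below. The byte orders (big-endian floats and int16s, little-endian int32) are an assumption inferred from the original poster's byte-reversing subroutines, and `parse_record_3` is a hypothetical name. The sample record is built with the same template so the example runs without any data file.

```perl
use strict;
use warnings;

# Assumed layout of record type 3: 5 big-endian floats,
# 1 little-endian int32, 2 big-endian int16s (28 bytes in total).
use constant RECORD_3      => 'f>5 l< s>2';
use constant RECORD_3_SIZE => 28;

# Hypothetical helper: unpack one record, refusing buffers of the wrong size.
sub parse_record_3 {
    my ($bytes) = @_;
    die sprintf("expected %d bytes, got %d", RECORD_3_SIZE, length $bytes)
        unless length($bytes) == RECORD_3_SIZE;
    return unpack RECORD_3, $bytes;
}

# Build a sample record and read it back.
my $sample = pack RECORD_3, 1.0, 2.0, 3.0, 4.0, 5.0, 25, 1, 0;
my @fields = parse_record_3($sample);
print "@fields\n";
```

The length check is there so that a truncated or misaligned read fails loudly rather than yielding plausible-looking garbage.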
Re: Missing byte using unpack, pack, read(in terms of bytes) by Corion (Patriarch) on Jul 23, 2012 at 22:10 UTC
You don't show how you open the file, and you don't tell us the platform you're using. If you are on Windows, my guess is that you forgot to use binmode on the file, and newline translation is hitting you. If that guess is wrong, then please show a short, self-contained program that reproduces the problem, and also tell us the version of Perl you're using.
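To make the newline-translation effect concrete, here is a small sketch that writes six bytes containing the pair 0x0D 0x0A and reads them back twice. It applies the `:crlf` layer explicitly so the effect shows on any platform; on Windows that layer is in effect by default for files opened without binmode. The helper name `slurp_with` and the sample bytes are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Six bytes that happen to contain 0x0D 0x0A, as binary data easily can.
my $bytes = "\x41\x0D\x0A\x42\x43\x44";

# Write them untranslated to a temporary file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $bytes;
close $out or die "close failed: $!";

# Hypothetical helper: read the whole file through the given PerlIO layer.
sub slurp_with {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "Cannot open $path: $!";
    local $/;               # slurp mode
    my $content = <$in>;
    close $in;
    return $content;
}

printf "raw:  %d bytes\n", length(slurp_with(':raw'));
printf "crlf: %d bytes\n", length(slurp_with(':crlf'));
```

The translated read comes back one byte shorter because the 0x0D is dropped, which would explain a stream that shifts by exactly one byte at a file-specific but repeatable position.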
Re^2: Missing byte using unpack, pack, read(in terms of bytes) by joemaniaci (Sexton) on Jul 23, 2012 at 22:27 UTC
Windows 7 64-bit, perl v5.12.3. I open the file using...

`open(IN, $nameoffile) or die "Can't open $!\n";`

I have not seen binmode so I am definitely not using that. I have my .pl file on a classified network not connected to the internet so I would have to retype the entire thing. I am almost positive it is not my code though, since I can repeat the exact same thing thousands of times before this issue pops up at some random point.

EDIT: Looking at binmode, before I open the file, should it be...

`binmode STDIN, ":bytes";`
`open(IN, $nameoffile) or die "Can't open $!\n";`

Or should it be...

`binmode IN, ":bytes";`
`open(IN, $nameoffile) or die "Can't open $!\n";`

EDITEDIT: I just tried including binmode and the behavior didn't change whatsoever, unless both of my examples above are wrong. As far as I can tell these files contain a single massive block of hex data, so I don't think I have to worry about any newlines.
Re^3: Missing byte using unpack, pack, read(in terms of bytes) by RichardK (Parson) on Jul 23, 2012 at 22:46 UTC
You need to call binmode after you open the file:

`open(IN,'<','filename') or die "$!";`
`binmode IN;`
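A runnable sketch of that ordering, using a lexical filehandle and a temporary file in place of the real data. The helper name `open_binary` and the sample records are hypothetical; the int32 value 2573 is chosen because its little-endian bytes are 0x0D 0x0A, the pair that text-mode reading on Windows would translate.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open a file for reading and switch it to binary mode.
sub open_binary {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);    # must be called after open, on the opened handle
    return $fh;
}

# Write three 28-byte sample records; each contains the bytes 0x0D 0x0A.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} pack('f>5 l< s>2', 1, 2, 3, 4, 5, 2573, 1, 0) x 3;
close $out or die "close failed: $!";

# Read the records back in fixed-size chunks.
my $fh    = open_binary($path);
my $count = 0;
while ((read($fh, my $rec, 28) // 0) == 28) {
    $count++;
}
close $fh;
print "read $count records\n";
```

Calling binmode before open has no useful effect, because the handle it is applied to is replaced when the file is opened.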
Re^4: Missing byte using unpack, pack, read(in terms of bytes) by joemaniaci (Sexton) on Jul 23, 2012 at 22:50 UTC

