PerlMonks
Looking for ideas: Converting a binary 'flags' field into something human readable
by Monk::Thomas (Friar) on Jul 07, 2015 at 21:17 UTC ( [id://1133633]=perlquestion )
Monk::Thomas has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks,

I wrote a parser library for a specific class of binary files (resource files for a video game). It converts the file into a human-readable data structure (hashes of hashes of arrays of hashes, that kind of thing; data fields that are only relevant for parsing the binary data streams are stripped from the result).

One of the data types it must be able to handle is 'flags': a variable-length sequence of bytes where the actual value is uninteresting. The interesting part is whether a certain bit (flag) is set or not, e.g. whether a record is deleted, is compressed, or has a certain property. Most of these fields seem to be exactly 1, 2, 4 or 8 bytes long, so I could easily use an unsigned integer value. However, there are 2 things that bug me:
My ideas: One could emulate a '6 byte flags' field by reading a uint32 plus a uint16 and then manually calculating the combined integer value. Did anyone say kludge/wart? Yeah. Looks like one. Other representations I can think of are a bit string such as 1110111100001 (which could get _extremely_ long) or a hash like:
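As a minimal sketch of the hash idea (not the library's actual code), the snippet below decodes a flags field of any byte length with Perl's built-in vec(), so a 6-byte field needs no uint32 + uint16 arithmetic. The bit positions and names in %FLAG_NAME, the helper decode_flags, the sample bytes, and the little-endian bit numbering are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical bit positions; the real names depend on the record type.
my %FLAG_NAME = (
    5  => 'is_deleted',
    18 => 'is_compressed',
);

# Decode a flags field of any byte length into a hash.
# Assumes little-endian order: bit N of the field is bit (N % 8) of
# byte int(N / 8), which is how vec() numbers 1-bit elements.
sub decode_flags {
    my ($bytes, $names) = @_;
    my %flags = map { $_ => 0 } values %$names;   # known flags are always listed
    for my $bit (0 .. 8 * length($bytes) - 1) {
        next unless vec($bytes, $bit, 1);         # skip bits that are not set
        my $name = exists $names->{$bit} ? $names->{$bit} : "unknown_$bit";
        $flags{$name} = 1;                        # unknown bits appear only when set
    }
    return \%flags;
}

# A made-up 6-byte field: bits 5, 18 and 47 are set.
my $raw   = pack 'H*', '200004000080';
my $flags = decode_flags($raw, \%FLAG_NAME);

# The bit-string form of the same field, bit 0 first.
my $bits = unpack 'b*', $raw;
```

Here $flags ends up holding is_deleted, is_compressed and unknown_47, and $bits is the long 0/1 form of the same bytes, so both representations can come from one raw byte string without ever building an integer.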
(unknown bits with value 0 are not listed in order to conserve space)

Your ideas?

Thanks for all your input! I'm having a bit of trouble deciding whether I should go with or with

    flags => 0b00100010000...
    #            | is_deleted
    #                | is_compressed

because both are quite nice. I'm going to try both and see what works best. =)

Regarding the parser grammar, it has become obvious that I need a custom data type for flag fields. Maybe something like:
context

The parser must be able to parse about 120 different 'records'. Since I don't want to hardcode all the different formats, the parser is configurable by a YAML file. A full record description is probably kinda boring, so here is the hex dump for a value, the parser grammar and the actual parsed data:

hex dump:

    4B 53 49 5A 04 00 03 00 00 00 4B 57 44 41 0C 00 98 37 01 00 95 37 01 00 6C 2A 09 00

annotated hex dump:

    4B 53 49 5A                            Type (KSIZ)
    04 00                                  Size (always 4)
    03 00 00 00                            KwrdCount
    4B 57 44 41                            Type (KWDA)
    0C 00                                  Size (4 * KwrdCount)
    98 37 01 00 95 37 01 00 6C 2A 09 00    Keywords FormID{count}

parser grammar:
Combining hex dump + grammar results in:

    ...
    example => {
        Keywords => [ '98 37 01 00', '95 37 01 00', '6C 2A 09 00' ],
    }
    ...

(The output is a bit fudged, because Keywords => [] would actually contain the integer values. But then there would be nothing left resembling the original data, so I left the raw hex dump values.)

How to read the parser grammar:
Not shown: sub records, alternatives, repeating records, ...

I'm pretty sure this library will end up on CPAN some day; for now I want to keep it private to be able to modify the API (and break backwards compatibility) at will. (And defer finding a suitable name until it's ready for submitting. Current name is File::Parse.)
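For readers who want to follow the annotated hex dump by hand, here is a rough sketch that decodes those 28 bytes with plain unpack. It hardcodes the layout from the annotation (4-byte type, uint16 little-endian size, payload) instead of reading a YAML grammar, and the helper name read_fields is hypothetical, not part of the library.

```perl
use strict;
use warnings;

# The bytes from the hex dump in the post.
my $data = pack 'H*', join '', qw(
    4B 53 49 5A 04 00 03 00 00 00
    4B 57 44 41 0C 00 98 37 01 00 95 37 01 00 6C 2A 09 00
);

# Split the stream into [type, payload] pairs:
# 4-byte ASCII type, uint16 little-endian size, then 'size' payload bytes.
sub read_fields {
    my ($buf) = @_;
    my @fields;
    my $pos = 0;
    while ($pos < length $buf) {
        my ($type, $size) = unpack 'a4 v', substr($buf, $pos, 6);
        push @fields, [ $type, substr($buf, $pos + 6, $size) ];
        $pos += 6 + $size;
    }
    return @fields;
}

my $count;    # KwrdCount is only needed while parsing, so it stays out of %record
my %record;
for my $field (read_fields($data)) {
    my ($type, $payload) = @$field;
    if ($type eq 'KSIZ') {
        $count = unpack 'V', $payload;                    # uint32, little-endian
    }
    elsif ($type eq 'KWDA') {
        $record{Keywords} = [ unpack 'V*', $payload ];    # FormID{count}
    }
}
```

With these bytes, $count is 3 and $record{Keywords} holds the three FormIDs as integers (0x00013798, 0x00013795, 0x00092A6C), which is the unfudged form of the parsed output.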