http://qs321.pair.com?node_id=1192862

hexcoder has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am struggling to understand the more advanced (un)pack templates, which allow me to describe my data structure.

I have this structure packed in a string $testinput:
  • byte: number of members in each of the following arrays
  • array of bytes
  • array of unsigned shorts
  • another array of unsigned shorts

I would like to know whether there is a single unpack template that uses the first entry as the repeat count for all three of the following arrays, not just for the first one.

    This is what I have tried so far, and I am not really happy with it.
    use strict;
    use warnings;

    my $testinput = pack('C/a* a* a*', (pack 'C*', 1, 2), (pack 'v*', 3, 4), (pack 'v*', 5, 6));

    print join(',', unpack('C/C* v2 v2', $testinput)), "\n";
    # gives "1,2,3,4,5,6" which is ok,
    # but has the repeat factors for 'v' hardcoded

    my $repeat = unpack('C', $testinput);
    print join(',', unpack("C/C* v$repeat v$repeat", $testinput)), "\n";
    # gives "1,2,3,4,5,6" which is ok, but uses two steps
    Is it possible to use one call to unpack to expand this string to the values above?

    Thanks in advance, hexcoder

    Re: pack and unpack multiple arrays with one common repeat prefix
    by Eily (Monsignor) on Jun 15, 2017 at 15:36 UTC

      I think your last proposition (first unpacking the count, then generating a template from it) is probably your best option, and the most readable one.

      You might be able to achieve what you want with something like the code below, but it really lacks clarity when you want to read something longer than a byte:

      use strict;
      use warnings;
      use Data::Dump qw( pp );
      my $str = pack "C (a)*", 4, 'a'..'z';
      pp $str;
      pp unpack 'C/a @0 CXC/x/a @0 CXCXC/x/x/a', $str;

      Output:
      "\4abcdefghijklmnopqrstuvwxyz"
      ("abcd", "efgh", "ijkl")

      • You already know what C/a does. => the stack is ("abcd") and pos = 5
      • @0 goes back to the start. => the stack is ("abcd") and pos = 0
      • CXC reads the first value, goes back one byte, and reads it again => the current list is ("abcd", 4, 4) and pos = 1
      • /x removes the last value in the list (4) and uses it as a repeat count for x, which just skips a byte (so /x ignores four bytes) => the list is ("abcd", 4) and pos = 5
      • /a removes the last value in the list (4 again), and uses it as a repeat count for a => the list is ("abcd", "efgh") and pos = 9
      • @0 goes back to the start, CXCXC reads the count three times, /x/x uses the count twice to skip bytes, and /a reads the last substring.
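
      To watch the list grow as in the steps above, one option is to run progressively longer prefixes of the same template. This is only a sketch: the list of prefixes is my own choice, and it uses join instead of Data::Dump so that it needs core Perl only.

```perl
use strict;
use warnings;

# Same test string as above: a count byte (4) followed by 'a'..'z'.
my $str = pack 'C (a)*', 4, 'a' .. 'z';

# Progressively longer prefixes of the full template.
my @steps = (
    'C/a',
    'C/a @0 CXC',
    'C/a @0 CXC/x',
    'C/a @0 CXC/x/a',
    'C/a @0 CXC/x/a @0 CXCXC/x/x/a',
);

my %result;
for my $template (@steps) {
    # Unpack with this prefix and remember the resulting list as a string.
    $result{$template} = join ',', unpack $template, $str;
    printf "%-32s => (%s)\n", $template, $result{$template};
}
```

      The last line printed corresponds to the full template and should show the three substrings from the output above.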

      This gets harder when you want to skip more than one byte at a time. That can be done with x[V], which means "skip as many bytes as there are in V". But you can't write C/x[V], because the x would then have two counts (one explicit, and one from the stack). So you have to write C/(x[V]), which applies the group "skip as many bytes as there are in V" C times.
      However, at least in my version of perl, you can't use CXC /(x[a]) /a, because skipping a group seems to change the stack and add an element that is used instead of the correct value. It looks as if, after C/a CXC /(x[a]), the stack is actually ("abcd", 4, ()) instead of ("abcd", 4). That extra element is removed if you add "xX", which skips one byte forward and then one byte backward. This doesn't do anything useful in itself, except that the next /a then does what you want.

      This leads to:

      pp unpack 'C/a @0 CXC /(x[a]) xX /a @0 CXC /((x[a])2) xX /a', $str;

      Once again, your last proposition is probably a better solution.

        Thanks for the very detailed explanation!

        I never understood the '@'- and 'x'- examples from the very tight 'packed' pack() description before.

        I agree that my last template is more readable, but my table driven decoder is just too dumb to do it in two steps...

          Speaking of the solution in two steps, it can be written like this:

          $count = unpack "C", $str;
          @values = unpack "x C$count (V$count)2", $str;
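
          Applied to the structure from the original question, the two steps could be wrapped in a small helper. This is a sketch only: decode_arrays is a made-up name, and it uses v (16-bit little-endian) rather than V because the question talks about unsigned shorts.

```perl
use strict;
use warnings;

# Hypothetical helper: decode "count byte, N bytes, N shorts, N shorts"
# and return the three arrays as array references.
sub decode_arrays {
    my ($data) = @_;

    # Step 1: read the shared count.
    my $count = unpack 'C', $data;
    return ([], [], []) unless $count;

    # Step 2: skip the count byte and use the count for all three arrays.
    my @values = unpack "x C$count (v$count)2", $data;

    my @bytes = splice @values, 0, $count;
    my @first = splice @values, 0, $count;
    return (\@bytes, \@first, \@values);
}

# Test data built the same way as in the question.
my $testinput = pack 'C/a* a* a*',
    pack('C*', 1, 2), pack('v*', 3, 4), pack('v*', 5, 6);

my ($bytes, $shorts1, $shorts2) = decode_arrays($testinput);
print join(',', @$bytes), ' | ', join(',', @$shorts1), ' | ', join(',', @$shorts2), "\n";
# prints "1,2 | 3,4 | 5,6"
```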

          "my table driven decoder is just too dumb to do it in two steps..."
          I have no idea what you mean by "table driven decoder", but maybe we can help you make it work in two steps instead? Or is there some code that you can't modify that can just be fed a pack template and nothing else?
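
          If the decoder is roughly a table that maps a telegram type to a pack template, one possible way to allow two steps is to let a table entry be either a template string or a code ref that builds the template from the data. This is a guess at the design: %template_for, decode and the telegram type names are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical decoder table: telegram type => template string, or a
# code ref that builds the template from the raw data.
my %template_for = (
    simple => 'C v',    # fixed layout: one byte, one unsigned short
    arrays => sub {     # layout depends on the leading count byte
        my ($data) = @_;
        my $count = unpack 'C', $data;
        return "x C$count (v$count)2";
    },
);

# Look up the entry, build the template if needed, then unpack once.
sub decode {
    my ($type, $data) = @_;
    my $entry = $template_for{$type}
        or die "unknown telegram type '$type'";
    my $template = ref $entry eq 'CODE' ? $entry->($data) : $entry;
    return unpack $template, $data;
}

my @simple = decode(simple => pack('C v', 7, 300));
my @arrays = decode(arrays => pack('C C2 v2 v2', 2, 1, 2, 3, 4, 5, 6));
print "@simple\n";    # prints "7 300"
print "@arrays\n";    # prints "1 2 3 4 5 6"
```

          The caller still sees a single decode step; only the table entry for the variable-length telegram knows about the count.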

          I suppose you can't change the way the data is packed either? Because if you pack only V values, you just have: my (@values) = unpack "C/(V3)", $str;

    Re: pack and unpack multiple arrays with one common repeat prefix
    by kennethk (Abbot) on Jun 15, 2017 at 14:57 UTC
      For this particular case, you don't need a rep count because the whole thing is very well behaved.
      print join(',', unpack("C/C* v*", $testinput)), "\n";
      Is this strictly an educational exercise, or do you have more complex data structures to interrogate? If the data is chunked in a different way, Counting Repetitions solves your trouble with parentheses, but this only works in the context where you get to pick how things are packed.
      my $testinput = pack('C/a* a* a*', (pack 'Cvs', 1, 3, -5), (pack 'Cvs', 2, 4, -6) );
      print join(',', unpack("C/(Cvs)", $testinput)), "\n";
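
      A variant of the same idea, as a sketch under one assumption of mine: the leading byte holds the number of records rather than a byte length, and each record is one (C, v, s) triple. The variable names are made up. After the single unpack, the flat list is regrouped into one array per field.

```perl
use strict;
use warnings;

# Assumed layout: count byte = number of records, then that many
# (byte, unsigned short, signed short) records.
my @records = ([1, 3, -5], [2, 4, -6]);
my $packed  = pack 'C (Cvs)*', scalar @records, map { @$_ } @records;

# One unpack call: the count byte controls how often the group repeats.
my @flat = unpack 'C/(Cvs)', $packed;
print join(',', @flat), "\n";    # prints "1,3,-5,2,4,-6"

# Regroup the flat list into one array per field.
my (@bytes, @unsigned, @signed);
while (my ($c, $v, $s) = splice @flat, 0, 3) {
    push @bytes,    $c;
    push @unsigned, $v;
    push @signed,   $s;
}
```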

      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        Thanks for the link! I did not know about the pack tutorial before.

        The context for this problem is a table driven telegram decoder, which assumes it can split the packed telegram data structure into fields in one unpacking step.

        The other structures have been simple enough, but the latest extension is more complex because of its prefixed repeat count.

        hexcoder
          So, for clarity, does the following always hold?
          • byte: number of members in each of the following arrays
          • array of bytes
          • array of unsigned shorts
          • another array of unsigned shorts
          Because, from the Perl perspective, that's functionally equivalent to:
          • byte: number of members in array of bytes
          • array of bytes
          • array of unsigned shorts
          and thus C/C* v* does everything you need.
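
          If the three arrays are needed separately afterwards, the flat result can be split by size. A minimal sketch, assuming nothing follows the second array of shorts in the string (v* would consume any trailing data) and using variable names of my own:

```perl
use strict;
use warnings;

# Test data built the same way as in the question.
my $testinput = pack 'C/a* a* a*',
    pack('C*', 1, 2), pack('v*', 3, 4), pack('v*', 5, 6);

# One unpack call: counted bytes, then all remaining shorts.
my @values = unpack 'C/C* v*', $testinput;

# The count itself is consumed by C/, but all three arrays have the
# same length, so each holds a third of the values.
my $n       = @values / 3;
my @bytes   = splice @values, 0, $n;
my @shorts1 = splice @values, 0, $n;
my @shorts2 = @values;

print join(',', @bytes), ' | ', join(',', @shorts1), ' | ', join(',', @shorts2), "\n";
# prints "1,2 | 3,4 | 5,6"
```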


        First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        That sounds like a really bad idea. That is using a computer to do it faster, not better.

          If you are using different algorithms in your coding than in your life, one of the two is sub-optimal. If you know a better way to do it, why are you wasting precious seconds you could be sitting under a tree?
