
Behaviour of unpack() with the Z template

by johngg (Canon)
on Feb 07, 2017 at 16:52 UTC ( #1181314=perlquestion )

johngg has asked for the wisdom of the Perl Monks concerning the following question:

From the documentation, pack'ing ...

When used with Z , a * as the repeat count is guaranteed to add a trailing null byte, so the resulting string is always one byte longer than the byte length of the item itself.

And unpack'ing (my boldened text) ...

When unpacking, A strips trailing whitespace and nulls, Z strips everything after the first null, and a returns data with no stripping at all.

Using the 'Z*' template to pack seems to work as described, adding a trailing null to the packed string. However, my reading of the documentation is that a single trailing null will be left in the unpack'ed string. This does not seem to happen.

johngg@shiraz:~/perl > perl -Mstrict -Mwarnings -E '
    my $str = q{abc  };
    my $pck = pack q{Z*}, $str;
    say sprintf q{%02x}, ord for split m{}, $pck;
    say q{-} x 5;
    say sprintf q{%02x}, ord for split m{}, unpack q{Z*}, $pck;'
61
62
63
20
20
00
-----
61
62
63
20
20

Testing all three templates shows that 'A' and 'a' behave as described.

johngg@shiraz:~/perl > perl -Mstrict -Mwarnings -E '
    my $str = q{abc  };
    my $pck = pack q{a10}, $str;
    say qq{unpack A*:};
    say sprintf q{ %02x}, ord for split m{}, unpack q{A*}, $pck;
    say qq{unpack Z*:};
    say sprintf q{ %02x}, ord for split m{}, unpack q{Z*}, $pck;
    say qq{unpack a*:};
    say sprintf q{ %02x}, ord for split m{}, unpack q{a*}, $pck;'
unpack A*:
 61
 62
 63
unpack Z*:
 61
 62
 63
 20
 20
unpack a*:
 61
 62
 63
 20
 20
 00
 00
 00
 00
 00

I have tested this under perl 5.18.2 on Linux and 5.12.4 on Darwin; the documentation is consistent for those versions and for the latest, 5.24.0. The current behaviour of 'Z' actually suits what I am doing much better than leaving a trailing null, so I'm hoping it is the documentation (or my reading of it) that is wrong.

Am I being obtuse or is the documentation out of kilter?

Cheers,

JohnGG

Replies are listed 'Best First'.
Re: Behaviour of unpack() with the Z template
by Eily (Monsignor) on Feb 07, 2017 at 18:55 UTC

    A little variation of your code (with a null byte inside the stream that is read by the first Z pattern):

    use v5.10;
    my $str = qq{abc\0def};
    my $pck = pack q{Z*}, $str;
    say sprintf q{%02x}, ord for split m{}, $pck;
    say q{-} x 5;
    say join ",", unpack q{Z5Z*}, $pck;
    __DATA__
    61
    62
    63
    00
    64
    65
    66
    00
    -----
    abc,ef

    The way I see it, the Z5 pattern makes perl read the first five bytes and strip away everything after the first null byte (i.e. the last byte, with value 64, is stripped) to obtain the bytes (61, 62, 63, 00). Those four bytes are not yet a perl string but "A null-terminated (ASCIZ) string", where the null byte is part of the formatting, not the data. It is the translation from one format (null-terminated string) to another (perl string) that makes the null byte disappear, not the stripping.

    With that interpretation it's the fact that you can do unpack "Z4", "hello", where the obtained data is not a valid null-terminated string, that might be surprising, but that's just perl Doing What You Mean as it usually does.
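The counted case mentioned above, where the bytes read contain no null at all, can be checked with a short sketch. The variable names are only illustrative, and the second call is added for contrast; it is not from the posts above.

```perl
use strict;
use warnings;

# Z4 reads exactly four bytes. None of them is a null,
# so nothing is stripped and all four bytes come back.
my ($no_null) = unpack 'Z4', 'hello';

# With a null inside the four-byte window, the result stops before it.
my ($with_null) = unpack 'Z4', "he\0lo";

printf "%s %s\n", $no_null, $with_null;   # prints "hell he"
```

So a counted Z field does not insist on finding a terminator; it simply returns whatever precedes the first null, if there is one, within the bytes it was told to read.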

      the Z5 pattern makes perl read the first five bytes and strip away everything after the first null byte

      I agree with your analysis but the word "after" here and in the documentation is where I see a problem. It implies that only text that follows the null will be stripped, not the null itself. The behaviour makes sense, the description is, I think, wrong.

      Cheers,

      JohnGG

        That was precisely my point, that the null byte is not ignored like the other characters after it. It's part of the data when encoded in the format "null-terminated string", but not part of the value. It doesn't appear in the output value because it's a format detail, the same way the byte order doesn't appear after decoding a little endian or big endian integer value.

        But that's nitpicking; this would make the documentation misleading rather than plain wrong, which should also be avoided in documentation. I think your phrasing, "returns everything up to but not including the first null", is fine.

Re: Behaviour of unpack() with the Z template
by andal (Hermit) on Feb 08, 2017 at 07:04 UTC

    I think perl is simply trying to be "symmetric" in packing/unpacking. When packing, "Z" adds a NUL to the original data. Out of symmetry, when unpacking, "Z" should remove that added NUL. The documentation is not clear about this aspect. So I would suggest reading it as

    When unpacking, A strips trailing whitespace and nulls, Z strips everything from the first null to the end, and a returns data with no stripping at all.
    Maybe some day, someone will actually update the docs :) Or change the behavior of "Z*" to get data to first NUL only, which would allow stuff like
    $b = pack("Z*C", "test", 10);
    ($str, $byte) = unpack("Z*C", $b);
    Though changing of docs is more likely :)

    EDIT. I'm wrong. In fact the "Z*" already takes the data to the first NUL and does not grab the rest. So, the above example actually works. But the documentation is completely incorrect. It should read

    When unpacking, a grabs everything and returns unchanged, A grabs everything but strips from first null to the end and trailing spaces. Z stops after first null, the null is stripped from data.

    and a returns everything with no stripping at all.
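The claim about "A" in the suggested wording can be tested directly. The sketch below uses a made-up buffer with a null in the middle as well as trailing spaces and a trailing null; it suggests that "A" removes only the trailing run and leaves an embedded null alone, which matches the original documentation rather than the rewording.

```perl
use strict;
use warnings;

# Hypothetical test buffer: embedded null, then two spaces and a null at the end.
my $buf = "ab\0cd  \0";

my ($A) = unpack 'A*', $buf;   # trailing spaces and nulls removed
my ($Z) = unpack 'Z*', $buf;   # stops at the first null
my ($a) = unpack 'a*', $buf;   # returned untouched

printf "A:%d Z:%d a:%d\n", length $A, length $Z, length $a;   # prints "A:5 Z:2 a:8"
```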

      I agree that the actual behaviour is "symmetric" and that makes sense from a design point of view. I confirmed the behaviour of the test you ran

      johngg@shiraz:~/perl > perl -Mstrict -Mwarnings -E '
          my $pck = pack q{Z*C}, q{test}, 0x41;
          my( $str, $ch ) = unpack q{Z*a}, $pck;
          say qq{$str, $ch};'
      test, A

      I don't think the documentation has a problem with the "A" or "a" descriptions, but the "Z" one should be clarified. An alternative to my suggestion above might be "Z returns everything up to but not including the first null".

      Cheers,

      JohnGG
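One way to sanity-check the proposed wording, "everything up to but not including the first null", is to write it out by hand and compare it with unpack. This is only a sketch: z_star is a hypothetical helper and the sample buffers are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: the proposed wording, spelled out with index/substr.
sub z_star {
    my ($buf) = @_;
    my $pos = index($buf, "\0");
    return $pos < 0 ? $buf : substr($buf, 0, $pos);
}

my @samples  = ("abc  \0", "foo\0bar \0", "no null here", "\0leading");
my @mismatch = grep { z_star($_) ne scalar unpack('Z*', $_) } @samples;

print @mismatch ? "mismatch\n" : "all samples agree\n";
```

If the two ever disagreed for some input, the wording would need another look; for these samples they should not.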

        Yes, you're right; it looks like I ran my tests for "A*" incorrectly.

Re: Behaviour of unpack() with the Z template
by ikegami (Patriarch) on Feb 08, 2017 at 17:20 UTC

    Z is meant to allow the conversion to and from C-style strings. As such, unpack 'Z*' should and does remove the NUL. As mentioned, tests support this. This is a documentation error. Please file a bug report by running perlbug.
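As an illustration of the C-string use case, here is a minimal sketch of a round trip through a fixed-size record. The layout (an 8-byte name buffer followed by one unsigned byte) and the field names are assumptions made up for the example.

```perl
use strict;
use warnings;

# Assumed layout, roughly: struct { char name[8]; unsigned char age; }
# Z8 null-pads the name to 8 bytes, as a C char array would hold it.
my $record = pack 'Z8 C', 'bob', 42;

# Unpacking Z8 drops the terminator and padding, giving a plain perl string.
my ($name, $age) = unpack 'Z8 C', $record;

printf "%d bytes: name=%s age=%d\n", length($record), $name, $age;
# prints "9 bytes: name=bob age=42"
```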

Re: Behaviour of unpack() with the Z template
by Anonymous Monk on Feb 07, 2017 at 18:11 UTC

      Thank you for the link. This line

      ['u', 'Z*', "foo\0bar \0", "foo"]

      defines the test for an unpack using the 'Z*' template, and it would appear that a successful test would not leave a trailing null in the resultant string. I think this confirms that the expected behaviour is that the first null and everything to its right will be stripped. I think the use of "after" in "Z strips everything after the first null" is misleading and the wording should be clarified. Perhaps "Z strips everything from the first null onwards" would be a more accurate description.

      Cheers,

      JohnGG
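The quoted test row can be replayed outside the test suite. In this sketch the row is assumed to mean mode, template, input and expected result; that reading of the columns is a guess from the values, not something taken from the test file.

```perl
use strict;
use warnings;

# The row quoted above, assumed to be: [ mode, template, input, expected ]
my @row = ('u', 'Z*', "foo\0bar \0", "foo");
my (undef, $template, $input, $expected) = @row;

my ($got) = unpack $template, $input;

print $got eq $expected ? "ok\n" : "not ok\n";
printf "length %d\n", length $got;   # 3, so no trailing null is kept
```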

