in reply to Pack/Unpack Tutorial (aka How the System Stores Data)

I have a few comments and I'll just leave them here in no particular order:

  1. I wish you had defined "word" before using it willy-nilly. It's jargon that your tutorial's audience isn't likely to be familiar with. From WordNet: a word is a string of bits stored in computer memory; "large computers use words up to 64 bits long".

  2. Bytes are almost always eight bits, though that's not a universal constant. Perhaps it's infrequent enough that I didn't even need to mention it, but this one always gets my goat.

  3. Your use of "most" and "least" significant byte was also jargon. If you assume the value 0x12345678 then the most significant byte has the value 0x12 and the least significant has the value 0x78. From there, the point about machines of different endianness is just which end you start from when transcribing the bytes.

  4. Your use of memory addresses is obfuscatory. This is better written as "Byte 0, byte 1, byte 2, byte 3". The only point at which a perl programmer cares about memory addresses is when doing non-perl programming or with the 'p' or 'P' format options. The point here is to indicate an order to the bytes in memory - that byte 0 might be located at memory address 1000 is entirely beside the point.

  5. White space is allowed without consequence in an unpack/pack format. It's just ignored except when it's a fatal error. I haven't nailed it down but some uses of whitespace just don't parse. That may be a bug but it's worth noting. This just means that in general people should use whitespace in a format to enhance readability - it doesn't affect its operation.

  6. I've never been clear on the bit order within a byte - can you expand on that? I used to think that the differently endian machines also shuffled the bit order around as well. At this point I'm just confused.
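
  To make points 3 and 5 concrete, here is a minimal sketch. It assumes the value 0x12345678 from point 3 and uses the fixed-order "N" (big-endian) and "V" (little-endian) 32-bit formats so the result does not depend on the machine running it. The variable names are purely illustrative.

```perl
use strict;
use warnings;

my $value = 0x12345678;

# "N" packs a 32-bit unsigned integer most significant byte first.
my $big    = pack 'N', $value;
# "V" packs the same integer least significant byte first.
my $little = pack 'V', $value;

# "H*" shows each byte as two hex digits, in the order the bytes are stored.
my $big_hex    = unpack 'H*', $big;
my $little_hex = unpack 'H*', $little;

print "big-endian:    $big_hex\n";      # 12345678
print "little-endian: $little_hex\n";   # 78563412

# Whitespace between format items is ignored, so these two templates
# produce the same packed string.
my $dense  = pack 'NnC',   1, 2, 3;
my $spaced = pack 'N n C', 1, 2, 3;
print "templates agree\n" if $dense eq $spaced;
```

  Only the order of the stored bytes differs between the two packed strings; the number they represent is the same.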


Replies are listed 'Best First'.
Re^2: Pack/Unpack Tutorial ("bit order")
by tye (Sage) on Jun 08, 2005 at 02:26 UTC
    I've never been clear on the bit order within a byte - can you expand on that? I used to think that the differently endian machines also shuffled the bit order around as well. At this point I'm just confused.

    In many cases, you can ignore bit order since bits do not have separate memory addresses. "Byte order" matters because you can access a byte as either the "first" byte (lowest address) in a "string" of bytes or as the "high" or "low" byte in a multi-byte numeric value. If you don't try to do both, then "byte order" doesn't matter, but using pack or unpack often means you are looking at bytes both ways. But, there is no "first" bit in packed data.
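
    As a rough illustration of looking at bytes "both ways", the sketch below takes one four-byte string and reads it first as a sequence of bytes and then as a 32-bit number under each byte order. The byte values and variable names are arbitrary choices for the example.

```perl
use strict;
use warnings;

# Four bytes as a string: the "first" byte is simply the one at offset 0.
my $bytes = "\x12\x34\x56\x78";
my $first = ord substr $bytes, 0, 1;

# The same four bytes as a 32-bit number: which byte counts as "high"
# depends on the byte order named in the template.
my $as_big    = unpack 'N', $bytes;   # first byte is the high byte
my $as_little = unpack 'V', $bytes;   # first byte is the low byte

printf "first byte:    0x%02x\n", $first;       # 0x12
printf "big-endian:    0x%08x\n", $as_big;      # 0x12345678
printf "little-endian: 0x%08x\n", $as_little;   # 0x78563412
```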

    If you have a text format ("unpacked" string) that shows bits (or hexadecimal nybbles or octal "digits") in a specific order, then you may have to worry about "bit order" (or other sub-byte order) if you've got something not using the near-universal "most significant digit first" ordering that is used when writing numbers in any base. Of course, pack and unpack (quite unfortunately) make a mess of this, as noted in Re^2: pack/unpack 6-bit fields. (precision) and (tye)Re: Ascending vs descending bit order.
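
    A small sketch of what sub-byte order looks like in a text representation, using the standard "B"/"b" (bit string) and "H"/"h" (hex string) formats. The byte value 0x12 is an arbitrary choice for the example; the stored byte is identical in every case and only the order of the printed digits changes.

```perl
use strict;
use warnings;

my $byte = chr 0x12;    # binary 00010010

# "B" lists bits most significant first (the way numbers are usually written);
# "b" lists them least significant first.
my $descending = unpack 'B8', $byte;
my $ascending  = unpack 'b8', $byte;

# "H" lists the high nybble first; "h" lists the low nybble first.
my $high_first = unpack 'H2', $byte;
my $low_first  = unpack 'h2', $byte;

print "B8: $descending\n";   # 00010010
print "b8: $ascending\n";    # 01001000
print "H2: $high_first\n";   # 12
print "h2: $low_first\n";    # 21
```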

    Put another way, "byte order" is usually used in reference to a detail of a computer's design and "bit order" doesn't matter in this context. However, both "bit order" and "byte order" can be applied to text representations of data (or even other "unpacked" representations where bits from within a byte get encoded into multiple bytes/characters of some other representation).

    - tye        

Re^2: Pack/Unpack Tutorial (aka How the System Stores Data)
by Anonymous Monk on May 15, 2014 at 14:34 UTC

    I acknowledge that my comments here come over a decade later, but still, ought to be made. This article was great, and I found it useful today (2014) for some work I am doing while bit-banging with perl. I found this article as a result of a very specific search with Google for what I am trying to do. These comments are my reply to the comments made just above here. In numbered order....

    1. I think that more people will know what "word" means, than what "willy-nilly" means. Remember the audience here: people wanting to use the pack/unpack functions.

    2. Bytes are almost always eight bits. The word "byte" is often explained as a contraction of "by eight", as in describing the hardware design of memory, although it was in fact coined as a respelling of "bite". The context was saying that there are a hundred or a thousand or a zillion memory addresses, BY EIGHT bits wide. In the old, old days, like magnetic core, there was a single bit of memory per location, or per cell. As things progressed, it was common to bring out a 'parallel' load or store, by eight bits. So yeah, eight bits.

    3. Captain obvious here... but this is exactly the point he was making. The most significant byte is often placed at the 'opposite' end in some systems compared to others. It's still the most significant but not always in the location where you would find the most significant byte.

    4. Obfuscatory unless someone is trying to read or write a memory-mapped location in memory, very typical of someone using perl to do this. Picking an arbitrary starting point like 0x1000 is better than starting at "byte 0", which implies that it has some special significance. It doesn't.

    5. White space in contrast to what, maybe a zero-fill? It's called white space because it doesn't show up on paper. Nulls, on the other hand, are often used to indicate end-of-string, which is something very different. Whitespace is printable; NUL is not.

    6. See #2 above for some clarification, although this is beyond the scope of this pack/unpack tutorial.

    Thank you again to the original author -- this was just the refresher I needed to use these awesome functions of perl.

    2018-07-08 Athanasius added paragraph tags