Losing Bits with Pack/Unpack

by afoken (Canon)
Sep 17, 2020

in reply to Losing Bits with Pack/Unpack

I believe I can squeeze 3 8-bit ascii characters into a single 20-bit unicode character

No, you can't.

  • ASCII is 7 bit, not 8 bit.
  • Unicode defines code points from 0 to 0x10FFFF, i.e. 0x110000 code points. You need at least 21 bits for that (log2(0x110000) = 20.087...), not 20 bits. Depending on the selected Unicode Transformation Format, you need up to 32 bits to encode those code points (see UTF-8 and UTF-16). Note especially that not all 32-bit combinations are valid Unicode.
  • Three 7-bit characters need 21 bits, not 20 bits.
  • Three 8-bit characters need 24 bits, not 20 bits.
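
The arithmetic in the list above can be checked with a few lines of Perl. This is only a minimal sketch: the helpers bits_needed, pack3 and unpack3 are hypothetical names made up for illustration, not anything from the original question. It packs three 7-bit characters into one integer by plain shifting and compares the result with the highest Unicode code point.

```perl
use strict;
use warnings;

# Hypothetical helper: smallest number of bits that can
# distinguish $count different values.
sub bits_needed {
    my ($count) = @_;
    my $bits = 0;
    $bits++ while (1 << $bits) < $count;
    return $bits;
}

# Hypothetical helper: squeeze three 7-bit ASCII characters
# into one integer, 7 bits each, 21 bits in total.
sub pack3 {
    my ($str) = @_;
    my ($c1, $c2, $c3) = map { ord } split //, $str;
    return ($c1 << 14) | ($c2 << 7) | $c3;
}

# Hypothetical helper: reverse of pack3.
sub unpack3 {
    my ($n) = @_;
    return join '', map { chr(($n >> $_) & 0x7F) } (14, 7, 0);
}

printf "Unicode code points need %d bits\n", bits_needed(0x110000);
printf "three 7-bit characters need %d bits\n", 3 * 7;
printf "three 8-bit characters need %d bits\n", 3 * 8;

for my $str ('ABC', 'zzz') {
    my $n = pack3($str);
    printf "'%s' packs to 0x%X, %s 0x10FFFF, unpacks to '%s'\n",
        $str, $n,
        ($n > 0x10FFFF ? 'above' : 'within'),
        unpack3($n);
}
```

The round trip works as long as you have all 21 bits, but the packed value only stays within the Unicode range when the first character is below 'D' (0x44), because 0x44 << 14 is already 0x110000.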

If you want to store more bits in a limited storage area than that storage area allows, you need compression, either lossy or lossless. Just shifting bits around won't help.


Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

