in reply to Losing Bits with Pack/Unpack
I believe I can squeeze 3 8-bit ascii characters into a single 20-bit unicode character
No, you can't.
- ASCII is 7 bit, not 8 bit.
- Unicode defines code points from 0 to 0x10FFFF, i.e. 0x110000 code points. You need at least 21 bits for that (log2(0x110000) = 20.087...), not 20 bits. Depending on the selected Unicode Transformation Format, you need up to 32 bits to encode a single code point (see UTF-8 and UTF-16). Note especially that not all 32-bit combinations are valid Unicode.
- Three 7-bit characters need 21 bits, not 20 bits.
- Three 8-bit characters need 24 bits, not 20 bits.
If you want to store more bits in a limited storage area than that storage area allows, you need compression, either lossy or lossless. Just shifting bits around won't help.
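To make the arithmetic above concrete, here is a minimal sketch. The helpers pack3 and unpack3 are hypothetical names made up for this illustration; they are not from the original post. It assumes strict 7-bit ASCII input and shows that three characters fill 21 bits, that the result can exceed the largest Unicode code point, and that cutting it down to 20 bits loses data.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only (not from the original thread).

# Pack three 7-bit ASCII characters into one integer; this needs 21 bits.
sub pack3 {
    my ($str) = @_;
    my @c = unpack 'C3', $str;
    die "need exactly three 7-bit characters\n"
        if length($str) != 3 || grep { $_ > 0x7F } @c;
    return ($c[0] << 14) | ($c[1] << 7) | $c[2];
}

# Reverse of pack3: split a 21-bit integer back into three characters.
sub unpack3 {
    my ($n) = @_;
    return pack 'C3', ($n >> 14) & 0x7F, ($n >> 7) & 0x7F, $n & 0x7F;
}

my $word   = 'abc';
my $packed = pack3($word);
printf "%s -> 0x%06X -> %s\n", $word, $packed, unpack3($packed);

# Three times 0x7F sets all 21 bits, which is above the Unicode limit.
my $max = pack3("\x7F\x7F\x7F");
printf "largest packed value: 0x%06X\n", $max;       # 0x1FFFFF
printf "largest code point:   0x%06X\n", 0x10FFFF;

# There are more 7-bit triples than code points, so no 1:1 mapping exists.
printf "%d triples vs. %d code points\n", 2**21, 0x110000;

# Forcing the value into 20 bits drops the top bit of the first character.
my $truncated = $packed & 0xFFFFF;
printf "20-bit round trip: %s\n",
    unpack3($truncated) eq $word ? 'ok' : 'data lost';
```

The round trip through 21 bits works, but the value for 'abc' is already larger than 0x10FFFF, so it is not a valid code point, and masking it to 20 bits changes the first character.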
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)