http://qs321.pair.com?node_id=11121900


in reply to Re: Losing Bits with Pack/Unpack
in thread Losing Bits with Pack/Unpack

ASCII has 30 control characters (not counting TAB, CR and LF) and four whitespace characters (SPACE, TAB, CR and LF). If you forgo support for the control characters, TAB and CR (but keep SPACE and LF), you end up with 0x60 (96) characters. This isn't a power of 2 (which would help make things simple and very efficient), but it's still a nice number (3/4 of 2^7).
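As a quick check of that count, here is a minimal Python sketch that builds the reduced alphabet. The names `ALPHABET` and `INDEX` are made up for illustration, and putting LF first is an arbitrary choice; any fixed ordering would do.

```python
# Reduced alphabet as described above: LF plus every character from
# SPACE (0x20) through '~' (0x7E). The ordering is an assumption.
ALPHABET = "\n" + "".join(chr(c) for c in range(0x20, 0x7F))

# Reverse lookup: character -> index in the range 0..0x5F.
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

print(len(ALPHABET), hex(len(ALPHABET)))  # 96 0x60
```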

Packing three such characters into a single code point would require an address space of 0x60^3 = 884,736 (0xD_8000) code points. That's a fair bit smaller than the 1,114,112 (0x11_0000) code points Unicode supports.

Of those, some are best avoided. I would avoid at least the following:

That's only 2,341 code points, and we have a buffer of 229,376 (0x11_0000 - 0xD_8000). Golden!
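The arithmetic can be double-checked with a few lines of Python. The variable names are illustrative, and the 2,341 figure is simply taken from the text above rather than recomputed.

```python
ALPHABET_SIZE = 0x60           # 96 usable ASCII characters
UNICODE_CODE_POINTS = 0x110000 # 1,114,112 code points in total
AVOIDED = 2341                 # count of code points to avoid, as quoted above

needed = ALPHABET_SIZE ** 3            # one code point per 3-character group
spare = UNICODE_CODE_POINTS - needed   # code points left over
margin = spare - AVOIDED               # still spare after skipping the avoided ones

print(f"needed={needed:,} spare={spare:,} margin={margin:,}")
```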

Mapping each group of 3 ASCII characters (with the limitations mentioned above) onto only "safe" code points won't be nice and easy, but it is doable.
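One way such a mapping might look, sketched in Python under loud assumptions: the `UNSAFE` ranges below are a hypothetical stand-in (C0/C1 controls, surrogates and a few noncharacters), not the list of code points to avoid from this post, and all function names are invented for illustration. The idea is to treat each 3-character group as a base-96 number n and then take the n-th code point that is not in an unsafe range.

```python
ALPHABET = "\n" + "".join(chr(c) for c in range(0x20, 0x7F))  # 96 characters
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BASE = len(ALPHABET)

# Hypothetical unsafe ranges (inclusive, sorted, non-overlapping).
# This is an illustrative set, NOT the author's list.
UNSAFE = [
    (0x0000, 0x001F),  # C0 controls
    (0x007F, 0x009F),  # DEL and C1 controls
    (0xD800, 0xDFFF),  # surrogates
    (0xFDD0, 0xFDEF),  # noncharacters
    (0xFFFE, 0xFFFF),  # noncharacters
]

def to_code_point(n):
    """Return the n-th safe code point (0-based), skipping UNSAFE ranges."""
    cp = n
    for lo, hi in UNSAFE:
        if cp < lo:
            break
        cp += hi - lo + 1  # jump over a range that lies at or below cp
    return cp

def from_code_point(cp):
    """Inverse of to_code_point, valid for safe code points only."""
    return cp - sum(hi - lo + 1 for lo, hi in UNSAFE if cp > hi)

def encode(text):
    """Pack every 3 characters of text into one code point."""
    if len(text) % 3:
        raise ValueError("length must be a multiple of 3 (padding not handled)")
    out = []
    for i in range(0, len(text), 3):
        a, b, c = (INDEX[ch] for ch in text[i:i + 3])
        out.append(chr(to_code_point((a * BASE + b) * BASE + c)))
    return "".join(out)

def decode(packed):
    """Unpack each code point back into its 3 characters."""
    out = []
    for ch in packed:
        rest, c = divmod(from_code_point(ord(ch)), BASE)
        a, b = divmod(rest, BASE)
        out.extend(ALPHABET[i] for i in (a, b, c))
    return "".join(out)
```

For example, `decode(encode("Hello, world"))` gives back the original 12 characters from a 4-code-point string. Input whose length isn't a multiple of 3 would need a padding convention, which this sketch leaves out.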