Codepoint U+00A4 --> hex 0xA4 --> binary 10100100

We need to store 10100100 in the two UTF-8 bytes:

    110..... 10......

We distribute the bits of 10100100 over the dots in the two bytes, padding the first byte on the left with zeros:

    110 00010   10 100100

So U+00A4 in UTF-8 becomes 11000010 10100100, or 0xc2 0xa4.

####

sprintf("%c%c",
    # Build the first byte by OR'ing 0xc0 (binary 11000000) with
    # the two highest-order bits of the character
    (0xc0 | ($o >> 6)),
    # Build the second byte by OR'ing 0x80 (binary 10000000)
    # with the lower 6 bits of the character (obtained by
    # AND'ing with 0x3f, binary 00111111)
    (0x80 | ($o & 0x3f)))
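
To check the arithmetic above, the same two-byte construction can be wrapped in a small function and run on U+00A4. This is only a sketch: the name utf8_two_bytes and the range check are assumptions added for illustration and are not part of the original snippet. It covers only codepoints that fit in two UTF-8 bytes (U+0080 to U+07FF).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original text): encode one codepoint
# in the two-byte range U+0080..U+07FF as two UTF-8 bytes.
sub utf8_two_bytes {
    my ($o) = @_;
    die "codepoint outside the two-byte range\n"
        if $o < 0x80 || $o > 0x7ff;
    return sprintf("%c%c",
        # Leading byte 110xxxxx: marker 0xc0 plus the bits above the low 6
        (0xc0 | ($o >> 6)),
        # Continuation byte 10xxxxxx: marker 0x80 plus the low 6 bits
        (0x80 | ($o & 0x3f)));
}

my $bytes = utf8_two_bytes(0xA4);
# Show each byte in hex
printf "%s\n", join(" ", map { sprintf "%02x", ord } split //, $bytes);
# prints "c2 a4"
```

For real work, Perl's core Encode module, for example Encode::encode("UTF-8", chr(0xA4)), handles the full range of codepoints; the function above is only meant to mirror the bit manipulation described here.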