Re: Losing Bits with Pack/Unpack

in reply to Losing Bits with Pack/Unpack

Ignoring your text and looking only at your code, it appears you are trying to repack every 2.5 8-bit characters (20 bits) as one 20-bit code point in order to reduce the number of 'characters' in your string. That is only safe in general if every 20-bit value is a valid code point, and some are not (the surrogate range U+D800 to U+DFFF, for example). Your example does work once you get the details right: the 96 bits of your twelve-character string are stored as a buffer of five code points (four full 20-bit groups plus one 16-bit group).

use strict;
use warnings;

my $text='Hello World!';

my $hex_text = unpack  'H*', $text;
my @code_points;
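while (length $hex_text) {                      # a bare "0" remainder is false, so test the length
    my $hex_num = substr($hex_text, 0, 5, '');  # remove the next (up to) 5 hex digits
    push @code_points, hex($hex_num);           # hex() does not need zero padding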
}
my $buffer = pack '(U)*', @code_points;

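my @_code_points = unpack('(U)*', $buffer);
# Zero-pad every group but the last to 5 hex digits so leading zeros survive.
# The last group may be short and is left unpadded; if it had leading zero
# digits, the original length would be needed to restore it.
my $_hex_text = sprintf('%05X' x (@_code_points - 1) . '%X', @_code_points);
my $_text = pack 'H*', $_hex_text;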
print $_text;
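
To make the arithmetic behind "five code points" concrete, here is a minimal sketch that prints the intermediate values for the same input. It uses only the unpack templates already shown; the variable name @groups is mine, not from your code.

```perl
use strict;
use warnings;

my $text     = 'Hello World!';
my $hex_text = unpack 'H*', $text;          # two hex digits per 8-bit character
my @groups   = unpack '(a5)*', $hex_text;   # five hex digits = 20 bits per group

printf "%d characters = %d bits = %d hex digits\n",
    length($text), 8 * length($text), length($hex_text);
printf "%d groups: %s\n", scalar(@groups), join(' ', @groups);
# 12 characters = 96 bits = 24 hex digits
# 5 groups: 48656 c6c6f 20576 f726c 6421
```

The last group has only four hex digits (16 bits) because 96 is not a multiple of 20.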

UPDATE - Added improved code (with testing)

use strict;
use warnings;
use Encode qw(decode);
use Test::More tests=>2;
my $text='Hello World!';

my $buffer = pack '(U)*',           # Encode each number as a code point
    map {hex($_)}                   # Convert hex digits to a number
        unpack '(a5)*',             # Groups of 5 hex digits (20 bits)
            unpack  'H*', $text;    # Convert text to hex digits

my $num_uni_chars = length(decode('UTF-8', $buffer));

is( $num_uni_chars, int((2*length($text) + 4)/5),   # ceil(hex digits / 5)
    'Number of Unicode characters');

my $_text = pack 'H*',              # Convert pairs of hex digits to bytes
    sprintf '%05X' x ($num_uni_chars - 1) . '%X',  # Zero-pad all but the (possibly short) last group
        unpack('(U)*', $buffer);    # Code points as numbers

is($_text, $text, 'Restored text');

OUTPUT:

1..2
ok 1 - Number of Unicode characters
ok 2 - Restored text
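
One detail worth spelling out: formatting a code point with a plain '%X' drops leading zero digits, and a short final group cannot be told apart from a full one without knowing the original length. The following is a minimal sketch of one way around that, assuming you are free to carry the byte count alongside the buffer. The helper names encode_20bit and decode_20bit are hypothetical, and inputs that produce code points in the surrogate range may still warn or fail depending on your Perl version.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; they are not part of the original post.

# Pack 8-bit text into 20-bit code points. Also return the byte count,
# which the decoder needs in order to drop the padding again.
sub encode_20bit {
    my ($text) = @_;
    my $hex = unpack 'H*', $text;
    $hex .= '0' x ((5 - length($hex) % 5) % 5);    # right-pad to whole groups of 5
    my $buffer = pack '(U)*', map { hex($_) } unpack '(a5)*', $hex;
    return ($buffer, length $text);
}

# Reverse the packing: every group is zero-padded to 5 hex digits,
# then the hex text is cut back to two digits per original byte.
sub decode_20bit {
    my ($buffer, $num_bytes) = @_;
    my $hex = join '', map { sprintf '%05X', $_ } unpack '(U)*', $buffer;
    return pack 'H*', substr($hex, 0, 2 * $num_bytes);
}

for my $text ('Hello World!', 'Hi  ', "a\x00b") {
    my ($buffer, $num_bytes) = encode_20bit($text);
    my $restored = decode_20bit($buffer, $num_bytes);
    print $restored eq $text ? "ok\n" : "not ok\n";
}
```

The 'Hi  ' case is the interesting one: its second group is 02000, which a bare '%X' would turn back into 2000 and so shift every following digit.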

Bill
