PerlMonks  

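Going by your code rather than your text, it appears you are trying to repack every 2.5 eight-bit characters (20 bits, or five hex digits) as one 20-bit code point, in order to reduce the number of 'characters' in your string. This would be valid in general only if every 20-bit value were a valid code point, which is not the case (the surrogates U+D800 to U+DFFF, for example, are not valid characters). Your example does work once you get the details right: your twelve-character string (96 bits) is stored as a buffer of five code points, four full 20-bit groups plus a final 16-bit group.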
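    use strict;
    use warnings;

    my $text = 'Hello World!';

    # Encode: hex string, consumed five hex digits (20 bits) at a time
    my $hex_text = unpack 'H*', $text;
    my @code_points;
    while (length $hex_text) {    # test the length, so a lone trailing '0' is not dropped
        my $hex_num = substr($hex_text, 0, 5, '');
        push @code_points, hex(sprintf '%05s', $hex_num);
    }
    my $buffer = pack '(U)*', @code_points;

    # Decode: code points back to hex digits, then hex pairs back to characters.
    # '%X' does not zero-pad, so a group that starts with a zero digit
    # would come back short; this example string has no such group.
    my @_code_points = unpack('(U)*', $buffer);
    my $_hex_text    = sprintf '%X' x scalar(@_code_points), @_code_points;
    my $_text        = pack 'H*', $_hex_text;
    print $_text;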
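To make the grouping concrete, here is a small sketch that prints only the intermediate values for the same string. It is an illustration, not part of the original code, and the variable names are my own.

```perl
use strict;
use warnings;

my $text = 'Hello World!';

# Two hex digits per character: 12 characters give 24 hex digits.
my $hex = unpack 'H*', $text;

# Split into groups of five hex digits; the last group may be shorter.
my @groups = unpack '(a5)*', $hex;

# Each group read as a hexadecimal number is one candidate code point.
my @code_points = map { hex($_) } @groups;

print "hex:    $hex\n";
print "groups: @groups\n";
printf "count:  %d\n", scalar @code_points;
```

For 'Hello World!' the groups are 48656 c6c6f 20576 f726c 6421, so the count is five and only the last group is short.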

UPDATE - Added improved code (with testing)

    use strict;
    use warnings;
    use Encode qw(decode);
    use Test::More tests => 2;

    my $text = 'Hello World!';

    my $buffer = pack '(U)*',       # Convert to Unicode
        map { hex($_) }             # Convert to decimal
        unpack '(a5)*',             # Groups of 5
        unpack 'H*', $text;         # Convert to hex

    my $num_uni_chars = length(decode('UTF-8', $buffer));
    is( $num_uni_chars, int(length($text)/2.5 + .5), 'Number of Unicode characters');

    my $_text = pack 'H*',              # Convert pairs of hex to ascii
        sprintf '%X' x $num_uni_chars,  # Convert to hex and join
        unpack('(U)*', $buffer);        # Decimal code points

    is($_text, $text, 'Restored text');

OUTPUT:

    1..2
    ok 1 - Number of Unicode characters
    ok 2 - Restored text
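One detail to watch with other input: '%X' does not zero-pad, so a five-digit group that begins with a zero digit (for example 00a12) would come back as fewer digits and shift everything after it. A possible way around that, sketched below, is to pad every group to five digits except the last, whose width follows from the original byte length. The helper names pack20 and unpack20 are hypothetical, and passing the byte length alongside the packed string is an assumption of this sketch, not something the code above does. It also uses a plain 'U*' template so that the result is a character string and length() counts code points directly.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the byte length (needed later to size
# the final group) together with the packed character string.
sub pack20 {
    my ($bytes) = @_;
    no warnings 'utf8';    # some 20-bit values (e.g. surrogates) may otherwise warn
    my @groups = unpack '(a5)*', unpack 'H*', $bytes;
    my $packed = pack 'U*', map { hex($_) } @groups;
    return (length($bytes), $packed);
}

# Hypothetical helper: inverse of pack20.
sub unpack20 {
    my ($byte_len, $packed) = @_;
    no warnings 'utf8';
    my @code_points = unpack 'U*', $packed;
    return '' unless @code_points;

    # Every group is five hex digits wide except possibly the last one.
    my $last_width = (2 * $byte_len) % 5 || 5;
    my $last       = pop @code_points;
    my $hex = join '',
        (map { sprintf '%05X', $_ } @code_points),
        sprintf("%0${last_width}X", $last);
    return pack 'H*', $hex;
}

# Hex form is 1234500a12, so the second group is 00a12 (leading zeros).
my $original = "\x12\x34\x50\x0a\x12";
my ($byte_len, $packed) = pack20($original);
print unpack20($byte_len, $packed) eq $original
    ? "round trip ok\n"
    : "round trip FAILED\n";
```

The caveat from the top of the post still applies: not every 20-bit value is a valid code point, so this is only safe while the packed string stays inside Perl and is not written out as strict UTF-8.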
Bill

In reply to Re: Losing Bits with Pack/Unpack by BillKSmith
in thread Losing Bits with Pack/Unpack by o0lit3
