ASCII encoded unicode strings on web, such as \u00F3

by igoryonya (Pilgrim)
on Jul 12, 2015 at 02:12 UTC

igoryonya has asked for the wisdom of the Perl Monks concerning the following question:

When I send a POST request to a certain URL, I get a string like this:
Compruebe si las direcciones URL que encontr\u00e9 en el archivo de configuraci\u00f3n son v\u00e1lidos

How do I decode strings, such as:
\u00e9
\u00f3
\u00e1

I've tried different things:

# It gives a weird result:
$text =~ s/\\u(.{4})/pack('u', $1)/ge;
$text =~ s/\\u(.{4})/pack('U', $1)/ge;

# Says, nonnumeric argument:
$text =~ s/\\u(.{4})/pack('u', "0x$1")/ge;
$text =~ s/\\u(.{4})/pack('U', "0x$1")/ge;
$text =~ s/\\u(.{4})/chr("0x$1")/ge;

# Just stays as a changed string (\x{00e9}, \x{00f3}, \x{00e1},
# or with \0x{....}), doesn't interpolate:
$text =~ s/\\u(.{4})/\\x{$1}/g;
$text =~ s/\\u(.{4})/\\0x{$1}/g;

# Does not change anything:
use Text::Unidecode;
$text =~ s/\\u(.{4})/unidecode($1)/ge;

How do I convert those encoded strings to UTF-8 characters?

Replies are listed 'Best First'.
Re: ASCII encoded unicode strings on web, such as \u00F3
by shmem (Chancellor) on Jul 12, 2015 at 10:25 UTC

    One way to do it:

    $_= 'Compruebe si las direcciones URL que encontr\u00e9 en el archivo de configuraci\u00f3n son v\u00e1lidos';
    s/\\u(\w{4})/eval "\"\\x{$1}\""/ge;
    print;
    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
      s/\\u(\w{4})/eval "\"\\x{$1}\""/ge;

      Really? String eval and \w?

      • You only want hex digits, not arbitrary characters, after \u.
      • To make a character from a number written in hexadecimal, convert the hex string to a number using hex, then convert that number to a character using chr. No need to torture perl with a string eval.

      $_= 'Compruebe si las direcciones URL que encontr\u00e9 en el archivo de configuraci\u00f3n son v\u00e1lidos';
      s/\\u([0-9a-fA-F]{4})/chr hex $1/ge;
      print;
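One case the four-digit pattern does not cover: in JSON-style escaping, characters above U+FFFF are written as two consecutive \u escapes (a UTF-16 surrogate pair). The following is only a sketch under that assumption; `decode_u_escapes` is a hypothetical helper name, not something from this thread. It folds surrogate pairs first, then applies the same chr hex substitution to whatever is left.

```perl
use strict;
use warnings;

# Hypothetical helper: decode \uXXXX escapes, combining UTF-16 surrogate pairs.
sub decode_u_escapes {
    my ($text) = @_;

    # Fold a high surrogate (D800-DBFF) followed by a low surrogate
    # (DC00-DFFF) into a single code point above U+FFFF.
    $text =~ s/\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})/
        chr(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))/gex;

    # Decode the remaining plain four-digit escapes.
    $text =~ s/\\u([0-9a-fA-F]{4})/chr hex $1/ge;

    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_u_escapes('configuraci\u00f3n \ud83d\ude00'), "\n";
```

If the input never contains characters outside the Basic Multilingual Plane, the first substitution simply never matches and the helper behaves like the one-liner above.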

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Really? String eval and \w?

        Yes. As stated, just one way to do it: \u0f00 => "\x{0f00}" => ༀ

        No need to torture perl with a string eval.

        Torture? String eval happens every time you use a module.

        Sometimes I just post TIMTOWTDI, since surely someone else will come up with a ( less odd | cleaner | more succinct | better | less costly ) way to do it. This time it has been you; Kudos ;-)

        perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
        So, I am curious: since chr hex is more efficient than eval, is pack('U', hex) more efficient than chr hex, or vice versa?
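One way to answer this for a particular perl build is to measure it. Below is a minimal sketch using the core Benchmark module; the variant names and the sample input are illustrative choices, and the numbers will differ between machines and perl versions, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = 'encontr\u00e9 configuraci\u00f3n v\u00e1lidos';

# Each sub decodes a fresh copy of the same input with one technique
# and returns the decoded string.
my %variants = (
    chr_hex  => sub { (my $s = $input) =~ s/\\u([0-9a-fA-F]{4})/chr hex $1/ge; $s },
    pack_U   => sub { (my $s = $input) =~ s/\\u([0-9a-fA-F]{4})/pack 'U', hex $1/ge; $s },
    str_eval => sub { (my $s = $input) =~ s/\\u([0-9a-fA-F]{4})/eval "\"\\x{$1}\""/ge; $s },
);

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, \%variants);
```

All three variants should return the same string, so the choice between chr hex and pack 'U' is mostly about readability unless the table shows a difference that matters for your workload.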
Re: ASCII encoded unicode strings on web, such as \u00F3
by Anonymous Monk on Jul 12, 2015 at 03:52 UTC
    If by 'utf8 chars' you mean Perl strings, try this:
    $text =~ s/ \\u ( \p{Hex}{4} ) / chr hex $1 /gex;
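The \uXXXX notation is the same one JSON uses inside string literals, so if the server response is in fact JSON (an assumption; the thread does not say), a JSON parser can do the decoding. A minimal sketch with the core JSON::PP module, wrapping the sample text in double quotes so it parses as a JSON string:

```perl
use strict;
use warnings;
use JSON::PP;

my $text = 'encontr\u00e9 en el archivo';

# allow_nonref lets the decoder accept a bare JSON string
# instead of requiring an object or array at the top level.
my $decoded = JSON::PP->new->allow_nonref->decode(qq{"$text"});

binmode STDOUT, ':encoding(UTF-8)';
print $decoded, "\n";
```

This only works when the text is valid inside a JSON string (no unescaped double quotes or control characters). When the whole response body is JSON, decoding the body directly is simpler than wrapping fragments like this.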
      What I mean is to convert ASCII-encoded Unicode characters, as in my example, to normal Unicode. For example (the ASCII-encoded Unicode characters are the \uXXXX sequences): Compruebe si las direcciones URL que encontr\u00e9 en el archivo de configuraci\u00f3n son v\u00e1lidos

      I can't figure out how to convert those representations to normal characters.
      I tried your suggestion, but it seems to just remove the escaped characters :(
      Here is the result that I get, in comparison to the original string:

      Compruebe si las direcciones URL que encontr\u00e9 en el archivo de configuraci\u00f3n son v\u00e1lidos
      Compruebe si las direcciones URL que encontren el archivo de configuraci son vidos

      Here is what it should have been:

      Compruebe si las direcciones URL que encontré en el archivo de configuración son válidos
        Is your STDOUT in UTF-8 mode? (as in binmode STDOUT, ':encoding(utf-8)', for example). That's the way to get 'normal Unicode' in Perl. Does this work for you:

        my $str = 'encontr\u00e9 configuraci\u00f3n v\u00e1lidos';
        $str =~ s/ \\u ( \p{Hex}{4} ) / chr hex $1 /gex;
        binmode STDOUT, ':encoding(utf-8)';
        print $str, "\n";
        ?
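A likely explanation for the vanishing characters: without an encoding layer, perl writes each character below 256 as a single byte, and a lone 0xE9 byte is not valid UTF-8, so a UTF-8 terminal may drop or mangle it. The sketch below makes the difference visible by encoding explicitly with the core Encode module and comparing lengths; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $str = 'encontr\u00e9';
$str =~ s/\\u([0-9a-fA-F]{4})/chr hex $1/ge;

# $str now holds characters; the last one is U+00E9.
# Encoding explicitly yields the bytes a UTF-8 terminal expects
# (U+00E9 becomes the two bytes 0xC3 0xA9).
my $bytes = encode('UTF-8', $str);

printf "%d characters, %d bytes\n", length $str, length $bytes;
# prints "8 characters, 9 bytes"
```

Setting binmode on STDOUT, as suggested above, performs the same encoding step automatically on every print.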
