Re^5: UTF8 versus \w in pattern matching (basic test)

by jo37 (Deacon)
on Jul 06, 2021 at 16:18 UTC (id://11134712)


in reply to Re^4: UTF8 versus \w in pattern matching (basic test)
in thread UTF8 versus \w in pattern matching

The Dumper output shows an encoding in ISO 8859-1, not UTF-8. That's strange.

Greetings,
-jo

$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

Re^6: UTF8 versus \w in pattern matching (basic test)
by haj (Vicar) on Jul 06, 2021 at 17:54 UTC

    That's not strange. You're seeing Unicode code points, which for the characters in question happen to be identical to their ISO-8859-1 encodings. Add "\N{EURO SIGN}" to the string and you get "\x{20ac}": that's again the code point, not a UTF-8 encoding.
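    A minimal sketch of that distinction, assuming a made-up test string (the name $str is illustrative): Dumper with Useqq prints code points, while Encode's encode('UTF-8', ...) shows the bytes an actual UTF-8 encoding would produce.

```perl
use strict;
use warnings;
use charnames ':full';    # lets \N{EURO SIGN} work on perls older than 5.16
use Data::Dumper;
use Encode qw(encode);

# Illustrative string: a Perl string holds characters (code points), not bytes.
my $str = "caf\x{e9} \N{EURO SIGN}";

# Useqq makes Dumper escape non-ASCII characters; the escapes it prints
# are code points (something like \x{e9} and \x{20ac}), not UTF-8 bytes.
{
    local $Data::Dumper::Useqq = 1;
    print Dumper($str);
}

# Compare each character's code point with the bytes of its UTF-8 encoding.
for my $char (split //, $str) {
    my @bytes = map { sprintf '%02X', ord } split //, encode('UTF-8', $char);
    printf "U+%04X  UTF-8: %s\n", ord $char, join ' ', @bytes;
}
```

    The loop should report U+00E9 with UTF-8 bytes C3 A9 for the accented e, and U+20AC with E2 82 AC for the euro sign. Only the first code point happens to match an ISO-8859-1 byte value; the euro sign has no ISO-8859-1 encoding at all.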

    "Everything is UTF-8" is one of the most frequent false assumptions I encounter when dealing with non-ASCII characters.

      Thanks for the clarification.

      Greetings,
      -jo

Re^6: UTF8 versus \w in pattern matching (basic test)
by ikegami (Patriarch) on Jul 06, 2021 at 21:07 UTC

    You didn't tell Perl to encode the output, so it didn't. The characters are being output unencoded: for example, a character with the value E9 is output as the byte E9. You are mistaking this lack of encoding for encoding using ISO-8859-1.
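    A small sketch of that point, assuming in-memory filehandles stand in for STDOUT (the variable names are illustrative): the same character is written once with no layer and once through an :encoding(UTF-8) layer.

```perl
use strict;
use warnings;

my $char = "\x{e9}";    # LATIN SMALL LETTER E WITH ACUTE, code point E9

# No encoding layer: the character's value is written as-is, one byte E9.
open my $raw_fh, '>', \my $raw or die "open: $!";
print {$raw_fh} $char;
close $raw_fh;

# With an :encoding(UTF-8) layer, Perl encodes the character first.
open my $enc_fh, '>:encoding(UTF-8)', \my $enc or die "open: $!";
print {$enc_fh} $char;
close $enc_fh;

# Show the bytes that ended up in each buffer, in hex.
printf "unencoded: %s\n", join ' ', map { sprintf '%02X', ord } split //, $raw;
printf "UTF-8:     %s\n", join ' ', map { sprintf '%02X', ord } split //, $enc;
```

    The first line should show E9 and the second C3 A9. In a real script, binmode STDOUT, ':encoding(UTF-8)'; (or use open qw(:std :encoding(UTF-8));) applies the same layer to standard output.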

    Seeking work! You can reach me at ikegami@adaelis.com
