
Re^3: UTF8 versus \w in pattern matching (basic test)

by haj (Vicar)
on Jul 06, 2021 at 12:25 UTC


in reply to Re^2: UTF8 versus \w in pattern matching (basic test)
in thread UTF8 versus \w in pattern matching

How do you fetch and process the file? Your original code example has no "use utf8;" declaration and does not UTF-8-encode its output. You get your original string back only because two errors cancel each other out:
  • Your file is UTF-8-encoded, but you don't declare this to Perl. Perl therefore reads the individual bytes of the UTF-8 encoding. The bytes that make up a non-ASCII character are not word characters and thus won't match \w.
  • You just print those bytes unchanged. If you are using a UTF-8 terminal, this "works" because the terminal decodes the bytes for you.
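
To make the difference visible, here is a minimal sketch of what \w sees before and after decoding. The sample string "Grüße" and the variable names are my own and not taken from your code. The string is written with byte escapes so the result does not depend on how the script itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for "Grüße" (hypothetical sample data), spelled with
# escapes so the encoding of this source file does not matter.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe";

# Undecoded: Perl sees 7 separate bytes. The bytes above 0x7F are not
# word characters under Perl's default rules, so \w+ stops at them.
my @byte_words = $bytes =~ /(\w+)/g;

# Decoded: Perl sees 5 characters, and all of them are word characters.
my $chars      = decode('UTF-8', $bytes);
my @char_words = $chars =~ /(\w+)/g;

printf "bytes: length %d, words: %d\n", length $bytes, scalar @byte_words;
printf "chars: length %d, words: %d\n", length $chars, scalar @char_words;
# prints:
#   bytes: length 7, words: 2
#   chars: length 5, words: 1

# Encode again on the way out so that a UTF-8 terminal receives UTF-8.
print encode('UTF-8', $chars), "\n";
```

On the undecoded string the match yields the two fragments "Gr" and "e". On the decoded string it yields the whole word.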

Perl's default encoding is not UTF-8. If you read the file and decode it from UTF-8, you should be fine. If you fetch with LWP, you can either print $response->content as it is (without encoding it) or encode $response->decoded_content before printing.
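
For the file case, a sketch along these lines should do, assuming the input really is UTF-8. The temporary file and its contents are made up so that the example runs on its own; with your data you would open your own file instead. The decoding is done by an I/O layer on the input handle, and the encoding by a layer on STDOUT.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 sample file (hypothetical data: "Grüße aus Köln").
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
binmode $tmp_fh;    # write the bytes exactly as given
print {$tmp_fh} "Gr\xC3\xBC\xC3\x9Fe aus K\xC3\xB6ln\n";
close $tmp_fh or die "Cannot close $file: $!";

# Declare the encoding when opening: Perl decodes while reading.
open my $in, '<:encoding(UTF-8)', $file or die "Cannot open $file: $!";
my $line = <$in>;
close $in;
chomp $line;

# \w now matches the decoded characters, umlauts and sharp s included.
my @words = $line =~ /(\w+)/g;

# Encode on output so that the terminal gets proper UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @words;
```

This prints the three words "Grüße", "aus" and "Köln" on separate lines. The same idea applies to the LWP case: match against the decoded string and encode only when printing.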
