PerlMonks  

Re^5: Handling utf-8 characters when scraping

by haukex (Bishop)
on Dec 27, 2018 at 08:17 UTC (#1227736)


in reply to Re^4: Handling utf-8 characters when scraping
in thread Handling utf-8 characters when scraping

One way to verify what Perl is really storing internally is Devel::Peek. What I would look for is that the UTF8 flag is on, and that the string, when shown as UTF-8, is correct:

    use Devel::Peek;
    my $str = "\x{20AC}";
    Dump($str);
    __END__
    SV = PV(0x15c0d70) at 0x15e0440
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK,UTF8)
      PV = 0x15e4e10 "\342\202\254"\0 [UTF8 "\x{20ac}"]
      CUR = 3
      LEN = 10
      COW_REFCNT = 1

Here, [UTF8 "\x{20ac}"] is correct. There's also utf8::is_utf8($str) to check for the UTF8 flag, although I'd recommend only using that for debugging as well. If you don't want all the extra output, you might just say:

    use Data::Dump;
    my $str = "\x{20AC}";
    dd $str, utf8::is_utf8($str);
    __END__
    ("\x{20AC}", 1)
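For comparison, here is a minimal sketch of what the same check might show for data that has not been decoded yet. It assumes the scraped content arrives as raw UTF-8 bytes and that you decode it yourself with Encode. The variable names $bytes and $chars are only illustrative, and your scraping module may already do this decoding for you.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: raw UTF-8 bytes for the euro sign, as they might arrive undecoded
my $bytes = "\xE2\x82\xAC";

# Decoding turns the three bytes into a single character, U+20AC
my $chars = decode('UTF-8', $bytes);

# Debugging output only: compare length and UTF8 flag before and after decoding
printf "bytes: length=%d, UTF8 flag=%s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';
printf "chars: length=%d, UTF8 flag=%s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
```

If the undecoded string reports a length of 3 and the decoded one a length of 1, the decoding step is doing its job. As above, I'd only use utf8::is_utf8 like this while debugging.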
