
Re: getting rid of UTF-8

by kcott (Archbishop)
on Nov 25, 2022 at 07:14 UTC (#11148373)

in reply to getting rid of UTF-8

G'day BernieC,

Regex issues:

  • Your first regex, s/^\xef\xbb\xbf//g, anchors to the start of the string: only a leading ef bb bf sequence is removed (the /g modifier has no effect with that anchor), and later sequences are left in place.
  • Your second regex, s/\xef\xbb\xbf//, has no /g modifier: only the first ef bb bf sequence will be removed.

What you need is s/\xef\xbb\xbf//g:

$ perl -Mstrict -Mwarnings -E '
    my $x = "\x{ef}\x{bb}\x{bf}123,,,\x{ef}\x{bb}\x{bf}456";
    say "Full string:";
    system "echo $x | hexdump -Cv";
    $x =~ s/\xef\xbb\xbf//g;
    say "All BOM sequences removed:";
    system "echo $x | hexdump -Cv";
'
Full string:
00000000  ef bb bf 31 32 33 2c 2c  2c ef bb bf 34 35 36 0a  |...123,,,...456.|
00000010
All BOM sequences removed:
00000000  31 32 33 2c 2c 2c 34 35  36 0a                    |123,,,456.|
0000000a
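
For a side-by-side view of the three substitutions, here is a minimal sketch that applies each one to a copy of the same test string and counts the ef bb bf sequences left behind. The helper name remaining() and the variable names are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

my $bom   = "\xef\xbb\xbf";
my $input = "${bom}123,,,${bom}456";   # two BOM byte sequences

# Hypothetical helper: count the BOM byte sequences still in a string.
sub remaining {
    my ($str) = @_;
    my $count = () = $str =~ /\xef\xbb\xbf/g;
    return $count;
}

# Copy first, then substitute, so $input stays untouched.
(my $anchored   = $input) =~ s/^\xef\xbb\xbf//g;  # matches at the start only
(my $first_only = $input) =~ s/\xef\xbb\xbf//;    # stops after one match
(my $global     = $input) =~ s/\xef\xbb\xbf//g;   # removes every match

printf "%-11s %d left\n", 'anchored:',   remaining($anchored);
printf "%-11s %d left\n", 'first only:', remaining($first_only);
printf "%-11s %d left\n", 'global:',     remaining($global);
```

With this particular input, the first two forms both leave one sequence behind and only the unanchored /g form leaves none.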

To remove other characters:

  • Non ISO-8859-1 (code points above 0xFF): y/\x00-\xff//cd
  • Non 7-bit ASCII (code points above 0x7F): y/\x00-\x7f//cd
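
As a rough illustration of the two transliterations, the sketch below runs them on a short made-up string. It assumes the string holds decoded characters: on raw, undecoded UTF-8 bytes every element is already in the range 00 to ff, so the first form would delete nothing. The sample text and variable names are hypothetical.

```perl
use strict;
use warnings;

# U+00E9 (e-acute) is inside ISO-8859-1; U+263A (smiley) is outside it.
my $text = "caf\x{e9} \x{263a} ok";

# /c complements the listed range, /d deletes whatever matches.
(my $latin1 = $text) =~ y/\x00-\xff//cd;   # drops U+263A, keeps U+00E9
(my $ascii  = $text) =~ y/\x00-\x7f//cd;   # drops both non-ASCII characters

# Print lengths rather than the strings to avoid "Wide character" warnings.
printf "original: %d, Latin-1 only: %d, ASCII only: %d\n",
    length $text, length $latin1, length $ascii;
# prints "original: 9, Latin-1 only: 8, ASCII only: 7"
```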

— Ken
