in reply to getting rid of UTF-8
G'day BernieC,
Regex issues:
- Your first regex, s/^\xef\xbb\xbf//g, anchors to the start of the string: later ef bb bf sequences will not be removed.
- Your second regex, s/\xef\xbb\xbf//, has no /g modifier: only the first ef bb bf sequence will be removed.
What you need is the unanchored pattern with the /g modifier, s/\xef\xbb\xbf//g:
$ perl -Mstrict -Mwarnings -E '
    my $x = "\x{ef}\x{bb}\x{bf}123,,,\x{ef}\x{bb}\x{bf}456";
    say "Full string:";
    system "echo $x | hexdump -Cv";
    $x =~ s/\xef\xbb\xbf//g;
    say "All BOM sequences removed:";
    system "echo $x | hexdump -Cv";
'
Full string:
00000000  ef bb bf 31 32 33 2c 2c  2c ef bb bf 34 35 36 0a  |...123,,,...456.|
00000010
All BOM sequences removed:
00000000  31 32 33 2c 2c 2c 34 35  36 0a                    |123,,,456.|
0000000a
To remove other characters:
- Non ISO-8859-1: y/\x00-\xff//cd
- Non 7-bit ASCII: y/\x00-\x7f//cd
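The two transliterations above could be wrapped up roughly as follows. This is only a sketch: the helper names keep_latin1, keep_ascii and hex_points, and the sample string, are made up for illustration. It also assumes the data is a decoded character string; on raw UTF-8 bytes every element is already in \x00-\xff, so the first transliteration would delete nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character outside U+0000..U+00FF.
# /c complements the search list, /d deletes the characters it matches.
sub keep_latin1 {
    my ($s) = @_;
    $s =~ tr/\x00-\xff//cd;
    return $s;
}

# Hypothetical helper: delete every character outside 7-bit ASCII.
sub keep_ascii {
    my ($s) = @_;
    $s =~ tr/\x00-\x7f//cd;
    return $s;
}

# Show the code points of a string as hex, avoiding wide-character output.
sub hex_points {
    my ($s) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $s;
}

# Made-up sample: ASCII text, e-acute (U+00E9), and a euro sign (U+20AC).
my $text = "caf\x{e9} \x{20ac}5";

print "original: ", hex_points($text), "\n";
print "latin1:   ", hex_points(keep_latin1($text)), "\n";
print "ascii:    ", hex_points(keep_ascii($text)), "\n";
```

The helpers work on a copy, so the caller's string is left untouched; use y/// directly on the variable if you want the in-place behaviour.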
— Ken