PerlMonks  

Re^2: UTF-8 text files with Byte Order Mark

by muba (Priest)
on Feb 13, 2007 at 20:21 UTC ([id://599767])


in reply to Re: UTF-8 text files with Byte Order Mark
in thread UTF-8 text files with Byte Order Mark

Yeah, this works, except that the BOM is indeed three bytes long, as noted above. The code that seems to work now looks like this:

while (my $line = <$rulesFH>) {
    if ($. == 1) {
        # Remove Byte Order Mark if it's there
        use Encode;
        my $octets = encode("utf8", $line);
        $octets =~ s/^\x{ef}\x{bb}\x{bf}//;
        $line = decode("utf8", $octets);
    }
    # rest...
}

Replies are listed 'Best First'.
Re^3: UTF-8 text files with Byte Order Mark
by ikegami (Patriarch) on Feb 13, 2007 at 20:48 UTC
    my $octets = encode("utf8", $line);
    $octets =~ s/^\x{ef}\x{bb}\x{bf}//;
    $line = decode("utf8", $octets);

    is the same thing as

    my $BOM = decode("utf8", "\x{ef}\x{bb}\x{bf}");
    $line =~ s/^$BOM//;

    is the same thing as

    my $BOM = chr(0xFEFF);
    $line =~ s/^$BOM//;

    is the same thing as

    $line =~ s/^\x{FEFF}//;

    which is what I gave you. Much simpler!
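
    The chain of equivalences above can be checked with a short script. This is only a sketch: the sample string and the variable names ($original, $via_octets and so on) are made up for illustration, and it assumes $line holds decoded characters, as it would when read through a decoding layer.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: a decoded (character) string that starts with a BOM.
my $original = "\x{FEFF}first rule\n";

# Form 1: encode to octets, strip the three BOM bytes, decode again.
my $octets = encode("utf8", $original);
$octets =~ s/^\x{ef}\x{bb}\x{bf}//;
my $via_octets = decode("utf8", $octets);

# Form 2: decode the three BOM bytes into one character, then strip it.
my $bom_decoded = decode("utf8", "\x{ef}\x{bb}\x{bf}");
(my $via_decoded_bom = $original) =~ s/^$bom_decoded//;

# Form 3: build the BOM character from its code point.
my $bom_chr = chr(0xFEFF);
(my $via_chr = $original) =~ s/^$bom_chr//;

# Form 4: match the character directly in the pattern.
(my $via_literal = $original) =~ s/^\x{FEFF}//;

# Every form should leave the same BOM-free line behind.
my @results   = ($via_octets, $via_decoded_bom, $via_chr, $via_literal);
my $all_equal = !grep { $_ ne "first rule\n" } @results;
print $all_equal ? "all four forms agree\n" : "forms differ\n";
```

    The first form makes a round trip through octets only to remove one character, which is why the last form is preferable when the data is already decoded.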

      Meh. Indeed, I didn't realise that. Thank you!

      Thank you!!! This saved me a lot of trouble. I am also trying to strip out these UTF-8 byte order mark characters, which Google Docs adds by default to downloaded text files. By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

        By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

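        Yeah, "\x{ef}\x{bb}\x{bf}" is the UTF-8 encoding of the BOM / U+FEFF / "\x{FEFF}". The three-byte form matches raw, undecoded bytes; "\x{FEFF}" matches the single character you get after decoding.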
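
        A small sketch of that distinction follows. It assumes the mismatch came from reading the file without a decoding layer, which is a guess about the poster's setup; the sample line and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The BOM as a single character (code point U+FEFF).
my $bom_char = "\x{FEFF}";

# The same BOM as the three bytes UTF-8 uses to store it.
my $bom_bytes = "\xEF\xBB\xBF";

# Encoding the character yields the bytes; decoding the bytes yields the character.
my $encoded_matches = encode("UTF-8", $bom_char) eq $bom_bytes ? 1 : 0;
my $decoded_matches = decode("UTF-8", $bom_bytes) eq $bom_char ? 1 : 0;

# Hypothetical line as read from a handle with no decoding layer: raw bytes.
my $raw_line = $bom_bytes . "first rule\n";

# On raw bytes, only the three-byte pattern finds the BOM.
my $char_hits_raw = ($raw_line =~ /^\x{FEFF}/)      ? 1 : 0;
my $byte_hits_raw = ($raw_line =~ /^\xEF\xBB\xBF/)  ? 1 : 0;

# Once the line is decoded, only the character pattern finds it.
my $decoded_line      = decode("UTF-8", $raw_line);
my $char_hits_decoded = ($decoded_line =~ /^\x{FEFF}/)     ? 1 : 0;
my $byte_hits_decoded = ($decoded_line =~ /^\xEF\xBB\xBF/) ? 1 : 0;

printf "raw line:     char pattern=%d, byte pattern=%d\n",
    $char_hits_raw, $byte_hits_raw;
printf "decoded line: char pattern=%d, byte pattern=%d\n",
    $char_hits_decoded, $byte_hits_decoded;
```

        Which pattern to use therefore depends on whether the filehandle decodes its input, for example through an :encoding(UTF-8) layer, or hands back raw bytes.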
