http://qs321.pair.com?node_id=577311

b4swine has asked for the wisdom of the Perl Monks concerning the following question:

Windows once again does wonderful things without being asked. I am writing Unicode to a file, to be read back in again. UTF-16LE uses 2 or 4 bytes per character, but on Windows, if one of these bytes is an LF (0x0A), it is automatically "corrected" to CR LF. I end up with an odd number of bytes, and a mess when I try to read the file back in with the same encoding. And since I am already using binmode to set the encoding, I don't know how to also set the handle to 'binary'. It does not work to just append ':raw' or ':bytes' to the encoding.
    $_ = "\x{ff0a}\x{1234}";
    open OUT, ">temp.txt";
    binmode OUT, ":encoding(UTF-16le)";
    print OUT;
    close OUT;
    # The output file contains FIVE bytes: 0D 0A FF 34 12
    open IN, "<temp.txt";
    binmode IN, ":encoding(UTF-16le)";
    <IN>;
    # this line causes: UTF-16LE:Partial character at temp.pl line 12.
    close IN;

Replies are listed 'Best First'.
Re: crlf mess in unicode utf-16le
by ikegami (Patriarch) on Oct 10, 2006 at 03:18 UTC

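    Apparently, pushing the encoding layer does not remove the layers already underneath it, so the default :crlf layer on Windows is still there and translates the encoded bytes. A solution (workaround?) is to push :raw first.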

    open OUT, ">temp.txt";
    binmode OUT, ":raw:encoding(UTF-16le)";
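
The workaround above covers only the writing side. A minimal sketch of the full round trip follows, assuming the same temp.txt file and test string as the question. The lexical filehandles, the error checks, and the byte dump are additions for illustration and are not part of the original posts.

```perl
use strict;
use warnings;

# File name and test string taken from the question.
my $file = "temp.txt";
my $text = "\x{ff0a}\x{1234}";

# Write: :raw comes first so that no :crlf layer sits under the encoding layer.
open my $out, '>', $file or die "Cannot open $file for writing: $!";
binmode $out, ':raw:encoding(UTF-16LE)';
print {$out} $text;
close $out or die "Cannot close $file: $!";

# Inspect the bytes on disk without any decoding or newline translation.
open my $raw, '<:raw', $file or die "Cannot open $file for reading: $!";
my $bytes = do { local $/; <$raw> };
close $raw;
printf "%d bytes: %s\n", length($bytes),
    join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# Read back with the same layer stack as on the writing side.
open my $in, '<', $file or die "Cannot open $file for reading: $!";
binmode $in, ':raw:encoding(UTF-16LE)';
my $back = do { local $/; <$in> };
close $in;

print $back eq $text ? "round trip ok\n" : "round trip FAILED\n";
```

With :raw in place the file should hold four bytes (0A FF 34 12) rather than five, and the read no longer hits a partial character. The same layer string can probably be given directly to a three-argument open, which avoids the separate binmode call.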
      Thank you guys, I had the same issue with tell(), and using binmode with :raw removed the warnings.