http://qs321.pair.com?node_id=524551


in reply to binmode file read error message

The first thing in a UTF-16 data stream (of unspecified endianness) must always be the byte order mark.
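As a small illustration of that rule, here is a sketch using the core Encode module. It relies only on documented behaviour: encoding to plain "UTF-16" prepends a big-endian BOM, while the explicit "UTF-16LE" and "UTF-16BE" encodings neither write nor expect one. The variable names are just for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Encoding to "UTF-16" (no endianness given) writes a BOM first, big-endian.
my $bytes = encode('UTF-16', 'hi');
my $hex   = unpack 'H*', $bytes;        # BOM FE FF, then 00 68 00 69

# Decoding the same bytes uses the BOM to pick the byte order.
my $round = decode('UTF-16', $bytes);

# With the endianness spelled out, no BOM is needed at all.
my $le_bytes = "h\0i\0";
my $le       = decode('UTF-16LE', $le_bytes);

print "$hex $round $le\n";
```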

Looking at the source of the Encode module, the error "Unrecognised BOM" is raised when an Encode::Unicode object has no endian attribute (yet) and encounters anything other than a valid BOM.

So the above error would occur if a file that you were trying to open as UTF-16 was in fact ASCII (or UTF-8) and started with the two characters "%P", or indeed was, say, a PDF file.
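One way to check for this before applying the encoding layer is to read the first two bytes raw and compare them with the two possible UTF-16 BOMs. The following is only a sketch: first_two_bytes and utf16_bom are hypothetical helper names, and the scratch file merely stands in for a real PDF.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first two bytes of a file, read raw.
sub first_two_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $head = '';
    defined read($fh, $head, 2) or die "Cannot read $path: $!";
    close $fh;
    return $head;
}

# Hypothetical helper: 'BE', 'LE', or undef when no UTF-16 BOM is present.
sub utf16_bom {
    my ($head) = @_;
    return 'BE' if $head eq "\xFE\xFF";
    return 'LE' if $head eq "\xFF\xFE";
    return undef;
}

# Stand-in for a real PDF: a scratch file that starts with "%PDF".
my ($tmp, $name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "%PDF-1.4\n";
close $tmp or die "close: $!";

my $head = first_two_bytes($name);
my $bom  = utf16_bom($head);
my $hex  = unpack 'H*', $head;    # "%P" is the bytes 0x25 0x50

my $msg = defined $bom
    ? "UTF-16$bom BOM found"
    : "no UTF-16 BOM, file starts with 0x$hex";
print "$msg\n";
```

The hex value printed for a file starting with "%P" is 2550, the same number that appears after "Unrecognised BOM" in the error text.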

If the Encode::Unicode object has a renewed attribute it will automatically update its own endian attribute upon seeing the initial BOM and can then subsequently process data without a BOM.

The Encode::Unicode objects used by PerlIO are supposed to have this "renewed" attribute so you should never see this message except at the start of the file.

Note that when Perl says "<FIL> line 127" in an error message, this just means that the last file read operation was to read line 127 of FIL. It does not necessarily mean that it was the data read from that line that was actually being processed.

In particular, if the error occurs while reading the first line of a file, it's possible that the error message reflects the previous read operation. I was unable to reproduce this in general, but I have found that it happens if you reopen a filehandle.

    use strict;
    while (<DATA>) {};
    # This warning appends "<DATA> line 3"
    warn "Something not related to the DATA";
    open DATA, '<:encoding(UTF16)', 'simple.pdf' or die $!;
    # This gives error UTF-16:Unrecognised BOM 2550 ...,<DATA> line 3.
    <DATA>;
    __DATA__
    xx
    xx
    xx
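The stale line number can also be seen without any encoding layer. perlvar documents that $. is reset when a filehandle is closed, but not when an open filehandle is reopened without an intervening close(). The sketch below shows both cases; scratch_file is a hypothetical helper and the temporary files exist only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a small scratch file and return its name.
sub scratch_file {
    my ($text) = @_;
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} $text;
    close $out or die "close: $!";
    return $name;
}

my $first  = scratch_file("a\nb\nc\n");
my $second = scratch_file("x\n");

# Case 1: reopen the handle without close(); the line counter carries over.
open my $fh, '<', $first or die "open: $!";
while (my $line = <$fh>) { }
open $fh, '<', $second or die "open: $!";
my $after_reopen = $.;

# Case 2: an explicit close() resets the counter before the next open.
close $fh;
open $fh, '<', $first or die "open: $!";
while (my $line = <$fh>) { }
close $fh;
open $fh, '<', $second or die "open: $!";
my $after_close = $.;
close $fh;

print "after reopen: $after_reopen, after close: $after_close\n";
```

In the first case any warning or error raised before the next read would still be tagged with line 3 of the old file, which matches what the DATA example shows.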

Please try to produce a minimal but complete script and data file combination that can reproduce the error.

The moral of the story: don't use Perl4-style bare filehandles.