http://qs321.pair.com?node_id=323727

ibanix has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I've got a simple question. I have a Unicode file that I want to read as input. Before I knew it was Unicode, I read it in the usual way, but when I printed it back I got spaces between the letters:

L i k e  t h i s  s e n t e n c e .

I looked at perlunicode, but my head is swimming with locales, encodings, and character sets. Can anyone help?

Thanks!

ibanix
$ echo '$0 & $0 &' > foo; chmod a+x foo; foo;

Re: How to read a Unicode file?
by Zaxo (Archbishop) on Jan 23, 2004 at 23:14 UTC

    Chances are that if ASCII text is turning up with interposed zero bytes, your file is in UTF-16. That is what Windows tools usually mean when they say "Unicode".

    Perl 5.8 is pretty smart about Unicode.
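
To see where the interposed zero bytes come from, here is a minimal sketch using the core Encode module. It assumes the little-endian flavour (UTF-16LE), which is only a guess about the original file; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode a short ASCII string as UTF-16LE. Each character becomes two bytes:
# the ASCII code followed by a zero byte.
my $bytes = encode('UTF-16LE', 'Like');

# Dump the raw bytes in hex so the zero bytes are visible.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $bytes;
print "$hex\n";   # prints: 4c 00 69 00 6b 00 65 00
```

When such bytes are printed without decoding, many terminals show each zero byte as a blank, which would explain the "spaces" in the question.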

    After Compline,
    Zaxo

      Thanks for the UTF-16 tip. I found that

      open(FILE, "<:encoding(UTF-16LE)", $file)

      did the magic for me.
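
A slightly fuller sketch of the same approach, for anyone who wants something runnable. It writes its own small UTF-16LE file first so it is self-contained; the temporary file and the sample text are assumptions made for illustration, not part of the original problem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway UTF-16LE file so the example does not depend on real data.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode($tmp, ':raw:encoding(UTF-16LE)');
print {$tmp} "Like this sentence.\n";
close($tmp) or die "Cannot close $file: $!";

# Read it back, decoding UTF-16LE into Perl characters.
# The leading :raw keeps any platform CRLF layer from touching the raw bytes.
open(my $in, '<:raw:encoding(UTF-16LE)', $file)
    or die "Cannot open $file: $!";
my $line = <$in>;
close($in);

# Encode on the way out so wide characters print cleanly.
binmode(STDOUT, ':encoding(UTF-8)');
print $line;
```

Note that UTF-16LE does not strip a byte-order mark. If the file starts with one, it arrives as a leading U+FEFF character; `:encoding(UTF-16)` reads the mark to pick the byte order and removes it, but fails on files that have none.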

Re: How to read a Unicode file?
by Aragorn (Curate) on Jan 23, 2004 at 23:20 UTC
    perluniintro is a gentler introduction to Perl and Unicode. I'm by no means an expert, but I think that, depending on the encoding, you can use open(my $fh, "<:utf8", "file") or something like open(my $fh, "<:encoding(ucs2)", "file").
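
Since the right layer depends on the encoding, one rough way to choose is to look for a byte-order mark at the start of the file. The sketch below is only that: `guess_encoding` is a hypothetical helper, the fallback to UTF-8 is an assumption, and UTF-32 is ignored (its little-endian mark begins with the same two bytes as UTF-16LE).

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding name from the first raw bytes of a file.
# Falls back to UTF-8 when no byte-order mark is found (an assumption, not a rule).
sub guess_encoding {
    my ($head) = @_;
    return 'UTF-8'    if $head =~ /^\xEF\xBB\xBF/;   # UTF-8 BOM
    return 'UTF-16LE' if $head =~ /^\xFF\xFE/;       # little-endian BOM
    return 'UTF-16BE' if $head =~ /^\xFE\xFF/;       # big-endian BOM
    return 'UTF-8';
}

print guess_encoding("\xFF\xFEL\x00"), "\n";   # prints: UTF-16LE
print guess_encoding("\xFE\xFF\x00L"), "\n";   # prints: UTF-16BE
print guess_encoding("Like this"),     "\n";   # prints: UTF-8
```

The returned name can then go into the open mode, for example open(my $fh, "<:encoding($enc)", $file), after reading the first few bytes from a handle opened with "<:raw".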

    Arjen

for more detail ...
by g00n (Hermit) on Jan 25, 2004 at 03:54 UTC