How to read a Unicode file?

ibanix has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I've got a simple question. I have a file in Unicode that I want to read as input. Before I knew it was Unicode, I tried to read it in, but when I printed it back, I got spaces between the letters:

L i k e t h i s s e n t e n c e .

I looked at perlunicode, but my head is swimming with locales, encodings, and character sets. Can anyone help?

Thanks!

ibanix

$ echo '$0 & $0 &' > foo; chmod a+x foo; foo;

Re: How to read a Unicode file? by Zaxo (Archbishop) on Jan 23, 2004 at 23:14 UTC
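Chances are that if ASCII is turning up with interposed zero bytes, your file is in UTF-16; your terminal shows each zero byte as a blank, which is likely where the "spaces" come from. Little-endian UTF-16 is the native Unicode encoding on Windows, and it is what Notepad writes when you save as "Unicode". Perl 5.8 is pretty smart about Unicode and can decode UTF-16 as it reads the file. After Compline, Zaxo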
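To check that diagnosis before choosing an encoding, you can look at the first bytes of the file. This is a minimal heuristic sketch, and the helper names `sniff_encoding` and `sniff_file` are hypothetical: a byte-order mark (BOM) settles the question, and otherwise many zero bytes at odd offsets ("L\0i\0k\0e\0") suggest UTF-16LE, at even offsets UTF-16BE.

```perl
use strict;
use warnings;

# Heuristic guess at a byte string's encoding (a sketch, not a full detector).
# Note: a UTF-32LE BOM (FF FE 00 00) would also match the UTF-16LE test.
sub sniff_encoding {
    my ($bytes) = @_;    # raw bytes, e.g. the first 4 KB of a file

    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;

    # No BOM: count zero bytes at even and at odd offsets.
    my ($even, $odd) = (0, 0);
    for my $i (0 .. length($bytes) - 1) {
        next unless substr($bytes, $i, 1) eq "\0";
        if ($i % 2) { $odd++ } else { $even++ }
    }
    # More than half of the odd (or even) positions are zero bytes.
    return 'UTF-16LE' if $odd  > length($bytes) / 4;    # "L\0i\0k\0e\0"
    return 'UTF-16BE' if $even > length($bytes) / 4;    # "\0L\0i\0k\0e"
    return 'unknown (8-bit)';
}

# Read the start of a file without any decoding and sniff it.
sub sniff_file {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $head = '';
    defined read($fh, $head, 4096) or die "Cannot read $path: $!";
    close($fh);
    return sniff_encoding($head);
}

print sniff_file($ARGV[0]), "\n" if @ARGV;
```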
Re: Re: How to read a Unicode file? by ibanix (Hermit) on Jan 24, 2004 at 00:05 UTC
Thanks for the UTF-16 tip. I found that `open(FILE, "<:encoding(UTF-16LE)", $file)` did the trick for me.
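Building on that one-liner, here is a minimal sketch of reading such a file line by line. It assumes the file came from a Windows tool, so lines may end in CR LF, and the helper name `read_utf16_lines` is hypothetical. Two details matter: plain `UTF-16` requires a BOM, uses it to choose the byte order, and strips it, while with `UTF-16LE` a leading BOM comes through as the character U+FEFF; and on Windows the default `:crlf` layer sits below the decoder and can mangle UTF-16 bytes, so the sketch opens with `:raw` first and strips line endings itself.

```perl
use strict;
use warnings;

# Read a UTF-16 text file and return its lines as Perl character strings.
# $enc defaults to 'UTF-16', which needs a BOM; pass 'UTF-16LE' or
# 'UTF-16BE' for a file without one.
sub read_utf16_lines {
    my ($path, $enc) = @_;
    $enc = 'UTF-16' unless defined $enc;

    # :raw first, so no default :crlf layer (as on Windows) touches
    # the bytes before they reach the decoder.
    open(my $fh, "<:raw:encoding($enc)", $path)
        or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close($fh);

    # With an explicit -LE/-BE name, a BOM survives as U+FEFF: drop it.
    $lines[0] =~ s/\A\x{FEFF}// if @lines;
    # Strip the line ending ("\r\n" from Windows tools, or plain "\n").
    s/\r?\n\z// for @lines;
    return @lines;
}

# Usage: perl script.pl FILE [ENCODING]
if (@ARGV) {
    binmode(STDOUT, ':encoding(UTF-8)');    # encode characters on output
    print "$_\n" for read_utf16_lines(@ARGV);
}
```

Setting an output encoding on STDOUT also avoids the "Wide character in print" warning when the text contains characters above U+00FF.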
Re: How to read a Unicode file? by Aragorn (Curate) on Jan 23, 2004 at 23:20 UTC
perluniintro is a gentler introduction to Perl and Unicode. I'm by no means an expert, but I think that, depending on the encoding, you can use `open(my $fh, "<:utf8", "file")` or something like `open(my $fh, "<:encoding(ucs2)", "file")`. Arjen
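An `:encoding(...)` layer decodes bytes as they are read; the core Encode module does the same thing explicitly, which helps when the bytes are already in a variable. Below is a minimal sketch, with a hypothetical helper name `slurp_decoded`. Two notes on the layers mentioned above: `:utf8` marks input as UTF-8 without validating it, whereas `:encoding(UTF-8)` checks it; and Encode resolves the name `ucs2` to big-endian UCS-2BE, so a little-endian Windows file needs `UTF-16LE` (or `UTF-16` when it starts with a BOM).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Read a whole file as raw bytes and decode it explicitly with Encode;
# this is the by-hand equivalent of opening it with :encoding($enc).
sub slurp_decoded {
    my ($path, $enc) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };    # undef $/ = read everything
    close($fh);
    # FB_CROAK: die on malformed input instead of substituting U+FFFD.
    return decode($enc, $bytes, Encode::FB_CROAK());
}

# Usage: perl script.pl FILE ENCODING   (re-encodes FILE as UTF-8 on STDOUT)
if (@ARGV == 2) {
    my $text = slurp_decoded(@ARGV);
    binmode(STDOUT);                       # write the UTF-8 bytes unchanged
    print encode('UTF-8', $text);
}
```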
for more detail ... by g00n (Hermit) on Jan 25, 2004 at 03:54 UTC
There was a node on Unicode (Guess between UTF8 and Latin1/ISO-8859-1) not long ago that may give more information about reading Unicode.
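One common heuristic for that UTF-8 versus Latin-1 question, sketched here under the assumption that the input is one or the other (the helper name `decode_utf8_or_latin1` is hypothetical): decode strictly as UTF-8 and fall back to ISO-8859-1 if that fails. It works because Latin-1 text with accented letters is rarely valid UTF-8 by accident, while every byte string is valid Latin-1.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode bytes that are either UTF-8 or Latin-1 (ISO-8859-1).
# Returns the decoded text and the name of the encoding that was used.
sub decode_utf8_or_latin1 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    if (defined $text) {
        return ($text, 'UTF-8');
    }
    # Not valid UTF-8: every byte maps to one Latin-1 character.
    return (decode('ISO-8859-1', $bytes), 'ISO-8859-1');
}
```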