
how to read unicode filename

by uva (Sexton)
on Mar 13, 2006 at 11:55 UTC ( [id://536223]=perlquestion )

uva has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,
I'm trying to figure out whether I can handle Unicode filenames on Windows using Perl 5.8.7 (built for MSWin32-x86-multi-thread), and if so, how.

I'm running on Windows 2000 (English language setup), and I have a directory full of files with all sorts of characters in their names. Windows Explorer displays them all very nicely.
This link shows a screenshot of the files as displayed in Windows Explorer:

http://www.mhonarc.org/archive/html/perl-unicode/2004-06/pngH6uiVrjVZ0.png

But when I use readdir() to list them, I find that each of the Chinese characters gets replaced with a "?", so then, of course, I can't do anything with the filenames returned (like open them).
The output of my program is shown in this link:

http://www.mhonarc.org/archive/html/perl-unicode/2004-06/pngxn9h5I2BLD.png

So my question is: How can I deal with these files?
I've tried using Perl scalars containing UTF-8, UTF-16LE and UTF-16BE encodings of the filenames, but none of them work either. Indeed, if I try to write a new file with a name constructed in those ways, then the name of the file actually created is simply the sequence of bytes that make up those encodings.
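
To make that last point concrete, here is a minimal sketch of what those three encodings look like as raw bytes. The sample name (the two characters U+4E2D U+6587 plus ".txt") and the helper name_bytes are made up for illustration; they do not come from the original post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample name: two Chinese characters plus an ASCII extension.
my $name = "\x{4E2D}\x{6587}.txt";

# Hypothetical helper: hex dump of $string after encoding it as $encoding.
sub name_bytes {
    my ($encoding, $string) = @_;
    return unpack 'H*', encode($encoding, $string);
}

# Print one line per encoding so the byte sequences can be compared.
for my $encoding (qw(UTF-8 UTF-16LE UTF-16BE)) {
    printf "%-8s %s\n", $encoding, name_bytes($encoding, $name);
}
```

Each result is just a string of bytes. A file API that works on bytes cannot tell that they were meant to be characters, which would be consistent with files being created under the raw byte sequence as their name.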

Replies are listed 'Best First'.
Re: how to read unicode filename
by converter (Priest) on Mar 13, 2006 at 12:30 UTC

    The console in which you're running your code is probably using a font that includes only Latin-1 (ISO-8859-1) glyphs.

    When you wrote "of course, I can't do anything with the filenames returned (like open them)" were you basing this on the results obtained from testing actual code, or on the assumption that because you were seeing the wrong characters in the console, they must be wrong in the data?
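
One way to tell those two cases apart is to dump the bytes of each name instead of printing the name itself. The sketch below is only an illustration: hex_dump_names is a made-up helper, and '.' stands in for whichever directory is being checked. A literal 3f in the dump where a Chinese character should be would mean the "?" is in the data returned by readdir, not just in the console display.

```perl
use strict;
use warnings;

# Hypothetical helper: map each directory entry to a hex dump of its bytes.
sub hex_dump_names {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my %dump;
    for my $entry (readdir $dh) {
        $dump{$entry} = unpack 'H*', $entry;
    }
    closedir $dh;
    return \%dump;
}

# '.' is a placeholder for the directory under test.
my $dump = hex_dump_names('.');
printf "%-30s %s\n", $_, $dump->{$_} for sort keys %$dump;
```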

      Dear monks,
      Consider a directory named D:\\directory1\\.

      It contains a list of subdirectories; some have names in English letters and some have names in Chinese characters.
      If I use the following program to list the subdirectories, it does not list the directories with Chinese names.
      open OUTPUT, ">:utf8", "D:\\directory1\\output.doc"
          or die "Couldn't open output file: $!";
      opendir DIR, "D:/directory1"
          or die "Could not open the directory: $!";
      print OUTPUT "\nreading the list from the directory\n";
      while (defined($list = readdir DIR)) {
          # Test the full path; a bare name is resolved against the current directory.
          if (-d "D:/directory1/$list") {
              print OUTPUT "$list\n";
          }
      }
      It does not even recognise the Chinese directories. The output is:
      .
      ..
      directory1
      directory2
      Both of these directories have names containing only English letters. The directories with Chinese names are not shown in the output file.
Re: how to read unicode filename
by Anonymous Monk on Mar 13, 2006 at 14:46 UTC
      Dear monks,
      I saw the link above, which is useful, but I have trouble with the unpack operation.
      Is it possible to pack the Chinese characters like this?
      pack("A4",$chinese);
      The output is ASCII junk.
      I tried the above, but it does not give me the right answer. Can anyone tell me why that happens?
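
A possible explanation, sketched under assumptions: the "A4" template does no character-set conversion. It copies the first four units of the string and pads with spaces if the string is shorter. If $chinese holds UTF-8 bytes, four bytes is one three-byte character plus a fragment of the next. The sample value below (U+4E2D U+6587 encoded as UTF-8) is hypothetical, since the original contents of $chinese are not shown.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for $chinese: two characters, six bytes in UTF-8.
my $chinese = encode('UTF-8', "\x{4E2D}\x{6587}");

# "A4" keeps only the first four bytes, cutting the second character apart.
my $packed = pack 'A4', $chinese;
print unpack('H*', $packed), "\n";

# Decoding first and slicing by character keeps whole characters.
my $first_char = substr decode('UTF-8', $chinese), 0, 1;
printf "U+%04X\n", ord $first_char;
```

The packed value ends in a lone lead byte, so anything that tries to display it as text will show garbage for that part.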

        In Windows 7, this outputs a correct utf8 listing.

        open fList, '-|:encoding(UTF-16LE)', 'cmd /U /C dir /W';
        open fOut, '>', 'out.txt';
        foreach (<fList>) {
            utf8::encode($_);
            print fOut $_;
        }
