http://qs321.pair.com?node_id=955183


in reply to Re^2: Windows NTFS UTF-16LE File-Operations
in thread Windows NTFS UTF-16LE File-Operations

Hmm, it's just that Win32::GetLongPathName() returns the Perl string I'd most expect in Win32-land. By "expect", I mean "jibes with what I see in Windows Explorer".

Using Explorer, I created a file "snowman ☃" in a new folder "my_dir". I made it by renaming an empty text file to "snowman " first, and then pasting in the snowman character. Then I ran the following:
#!perl
use Win32;
use File::Find;
use Devel::Peek;

my @paths;
find sub {
    push @paths, (
        $_,
        Win32::GetLongPathName($_),
    ) if /snow/i;
}, "my_dir";

Dump $paths[0];
Dump $paths[1];
__END__
SV = PV(0x469c14) at 0x4f82b4
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2621844 "SNOWMA~1.TXT"\0
  CUR = 12
  LEN = 16
SV = PV(0x469c2c) at 0x4f82f4
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x262181c "snowman \342\230\203.txt"\0 [UTF8 "snowman \x{2603}.txt"]
  CUR = 15
  LEN = 16

The results tell me that the string returned by Win32::GetLongPathName() is fit for Unicode semantics in Perl: it comes back with the UTF8 flag on and holds characters. Never mind the underlying filesystem encoding of NTFS (UTF-16LE? I don't know); I can treat the path as characters from then on.

Sure, long path names are the opposite of short path names. What I'm saying is that Win32::GetLongPathName() is a handy way to get at the characters instead of the octets given by File::Find.
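
To make the characters-versus-octets distinction concrete, here is a minimal sketch that runs on any platform. It assumes only the core Encode module; the byte string is copied from the PV field of the second Dump above, and the variable names are purely illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The raw octets shown in the second Dump: "snowman " + E2 98 83 + ".txt"
my $octets = "snowman \xE2\x98\x83.txt";

# Decoding turns the UTF-8 octets into a Perl character string
my $chars = decode('UTF-8', $octets);

# length() counts octets in the first string and characters in the second
printf "octets: %d, characters: %d\n", length($octets), length($chars);

# Character semantics: the snowman is one code point, U+2603
print "found the snowman\n" if $chars =~ /\x{2603}/;
```

The octet count matches the CUR = 15 in the dump, while the decoded string is 13 characters long.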

Re^4: Windows NTFS UTF-16LE File-Operations
by BrowserUk (Patriarch) on Feb 20, 2012 at 23:51 UTC

    Hm. Doesn't seem to work for me:

    C:\test\junk>dir
    12/11/2010 05:58 7 acentó.txt
    20/11/2010 09:46 <DIR> ελληνικά
    1 File(s) 7 bytes
    3 Dir(s) 236,893,585,408 bytes free

    C:\test\junk>perl -E"say for glob '*'"
    acent¾.txt
    DC44~1

    C:\test\junk>perl -E"say Win32::GetLongPathName( $_ ) for glob '*'"
    acent¾.txt
    Wide character in print at -e line 1.
    ╬Á╬╗╬╗╬À╬¢╬╣╬║╬¼

    (Code tags deliberately omitted to ensure that you can see exactly what I see on my console.)

    Conversely, Win32::FindFile does work for me:


    C:\test\junk>perl -E"say for glob '*'"
    acent�.txt
    DC44~1

    C:\test\junk>perl -E"say Win32::GetLongPathName( $_ ) for glob '*'"
    acent�.txt
    Wide character in print at -e line 1.
    ελληνικά
    ικά


    C:\test\junk>perl -C0 -MWin32::FindFile -E"say for FindFile( '*' )"
    .
    ..
    acentó.txt

    ελληνικά
    ικά

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      You need to encode the return value of Win32::GetLongPathName() back to octets before printing it, because say writes octets to the console. Win32::FindFile is most likely doing that for you already.

      Which encoding you choose should depend on what your console expects (to be able to redisplay the characters).

      Try:
      perl -MEncode -E "say encode( 'UTF-8', Win32::GetLongPathName( $_ ) ) for glob '*'"

      If 'UTF-8' doesn't work out, choose another encoding that works well with your console. Bear in mind that the UTF-8 encoding may be fine as-is, and that the console just needs better fonts (which is not your case, because I infer that you can see the characters displayed properly using Win32::FindFile).
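
      As a rough sketch of the two usual ways to do that encoding step, the script below uses a literal string as a stand-in for what Win32::GetLongPathName() returns, so it does not need Windows to run. The variable names are made up for illustration, and UTF-8 is only an assumption about what the console wants.

      ```perl
      use strict;
      use warnings;
      use Encode qw(encode);

      # Stand-in for a character string such as Win32::GetLongPathName() returns
      my $name = "snowman \x{2603}.txt";

      # Option 1: encode explicitly, then print the resulting octets
      my $octets = encode('UTF-8', $name);
      print $octets, "\n";

      # Option 2: put an encoding layer on the handle and print characters directly
      binmode STDOUT, ':encoding(UTF-8)';
      print $name, "\n";
      ```

      Both print statements emit the same bytes and neither triggers the "Wide character in print" warning. From the command line, the -CO switch marks STDOUT as UTF-8 in a similar way to option 2.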

        Yes. That seems to work. I thought that -C0 would take care of that for me...I'm just very glad that I don't have to deal with this crap!


        \test\junk>perl -C0 -E"say Win32::GetLongPathName( $_ ) for glob '*'"
        acent�.txt
        Wide character in print at -e line 1.
        ελληνικά
        ικά


        C:\test\junk>perl -MEncode -E "say encode('UTF-8', Win32::GetLongPathName( $_ ) ) for glob '*'"
        acentó.txt

        ελληνικά
        ικά


        C:\test\junk>dir
        12/11/2010 05:58 7 acentó.txt
        20/11/2010 09:46 <DIR> ελληνικά


      I'd love to try Win32::FindFile. Any word on how many years before it is fixed to build correctly for Perl 5.20?

        Any word on how many years before it is fixed to build correctly for Perl 5.20?

        I wasn't aware that it didn't? But then again, I'm not the author and I haven't had cause to look at it in three years so I wouldn't be.

        I suggest you try the more usual support channels, rather than just some guy who happened to mention it once.

