
Re: print all files is soo slow! Why? (stat, ntfs, links)

by tye (Sage)
on Jul 27, 2016 at 12:10 UTC (#1168638)


in reply to print all files is soo slow! Why?

I suspect that the slowness is largely because the Perl code can't resist doing a stat on each file found, and Perl's emulation of stat(2) on Windows does extra work to ask for the count of "links" that exist to that file. Unfortunately, NTFS supports hard links in a way that does not keep the link count efficiently cached as a Unix inode does, so the code that looks up the link count sometimes takes significantly longer than FindNextFile alone would. See p5git://win32/win32.c.:

    if (!w32_sloppystat) {
        /* We must open & close the file once; otherwise file attribute changes  */
        /* might not yet have propagated to "other" hard links of the same file. */
        /* This also gives us an opportunity to determine the number of links.   */
        HANDLE handle = CreateFileA(path, 0, 0, NULL, OPEN_EXISTING, 0, NULL);
        if (handle != INVALID_HANDLE_VALUE) {
            BY_HANDLE_FILE_INFORMATION bhi;
            if (GetFileInformationByHandle(handle, &bhi))
                nlink = bhi.nNumberOfLinks;
            CloseHandle(handle);
        }

It is my experience that the time taken by that code is usually fairly short but is sometimes pronounced (it seems to at least nearly lock up much of Windows, so it feels like some kind of interlock that also involves networking calls), though I have yet to find technical details about what is going on.
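To check whether stat really is where the time goes on a given machine, one can time a bare readdir pass against a readdir-plus-stat pass over the same directory. The following is only a rough sketch: the helper names list_names and timed are made up for illustration, and a single run will be skewed by file system caching, so repeat it a few times before drawing conclusions.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: list the entries of one directory using readdir only,
# without touching the files themselves.
sub list_names {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}

# Hypothetical helper: run a code ref and return (elapsed seconds, its result).
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

my $dir = @ARGV ? $ARGV[0] : '.';

# Pass 1: names only.
my ($t_names, $n_names) = timed(sub {
    my @n = list_names($dir);
    return scalar @n;
});

# Pass 2: names plus one stat per entry. On Win32 this is the call that
# may open each file in order to fetch the link count.
my ($t_stat, $n_stat) = timed(sub {
    my $seen = 0;
    for my $name (list_names($dir)) {
        my @st = stat("$dir/$name");
        $seen++ if @st;
    }
    return $seen;
});

printf "readdir only:   %d names in %.4fs\n", $n_names, $t_names;
printf "readdir + stat: %d stats in %.4fs\n", $n_stat,  $t_stat;
```

Run it as perl script.pl C:/some/dir (the script name is arbitrary). If the two timings are close, stat is probably not the culprit for that directory.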

It is too bad that one can't easily arrange for w32_sloppystat to be true for the many cases when one would like stat to be fast at the expense of things that very often won't matter much to Win32 uses of Perl code.

    #ifdef PERL_IS_MINIPERL
        w32_sloppystat = TRUE;
    #else
        w32_sloppystat = FALSE;
    #endif

It would be quite nice if that unconditional FALSE were instead a lookup of some environment variable, like PERL_WIN32_SLOPPY_STAT. (Update: Or does ${^WIN32_SLOPPY_STAT} = 1; still work for that?)
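If ${^WIN32_SLOPPY_STAT} does still work, using it might look something like the sketch below. Whether a particular perl build honors the variable is an assumption here, not something I have verified; on non-Windows builds setting it should simply have no effect. The helper names with_sloppy_stat and file_sizes are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with sloppy stat requested. "local" restores
# the previous setting afterwards, even if $code dies.
# Assumption: this perl still honors ${^WIN32_SLOPPY_STAT}.
sub with_sloppy_stat {
    my ($code) = @_;
    local ${^WIN32_SLOPPY_STAT} = 1;
    return $code->();
}

# Hypothetical helper: return a hash ref of name => size for the plain
# files directly inside $dir.
sub file_sizes {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my %size;
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;          # the stat that may be slow on Win32
        $size{$name} = (stat _)[7];    # reuse the cached stat buffer
    }
    closedir($dh);
    return \%size;
}

my $dir   = @ARGV ? $ARGV[0] : '.';
my $sizes = with_sloppy_stat(sub { file_sizes($dir) });
printf "%-40s %d bytes\n", $_, $sizes->{$_} for sort keys %$sizes;
```

Note that with sloppy stat in effect the link count reported by stat is not looked up, so code that depends on nlink should not use this.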

Though, it is possible to get Perl to quickly iterate over file names in Win32 by avoiding readdir and instead calling FindFirstFile and FindNextFile more directly. There is even such code hidden deep in the archives of this very website. I'll probably eventually succeed in finding it at which point I'll post a pointer to such.

Update: Re: Threads slurping a directory and processing before conclusion looks useful (or at least interesting). It hints that one can get sloppy stat via some special Perl variable. I have not yet looked into whether that is still true. Re: Quickest way to get a list of all folders in a directory says similar things and fills in one more detail. Re^3: Win32api::File and Directories offers some code that might be another good route.

- tye        
