Thanks vr ! It has really helped me a lot!
… I've re-written your code completely ...
OK, Good!
1) I rewrote my FS_sweep based on your proposal
my $d = Win32::LongPath-> new;
…
$d-> readdirL;
Doing stress tests using C:, I got a number of problems. The script sometimes works. But often it stucks, loops and was difficult to kill. Had to use the
ctrl-alt-del/activity handler to stop it. I think that the memory is overwrite by
$d-> readdirL;. The largest directory I have found returned by
$d-> readdirL; has 38252 entries. MS File Explorer says: 38250 objekt
2) I rewrote my FS_sweep using while ( my $name = $dir->readdirL() )
Below follows a script which can be used to test this approach.
This works much better. But there are still problems. The script is sometimes stuck (no cpu time is used) or looping (cpu time is used, but nothing is happening). When logging the found file pathes to a file, the frequency of the problem seem to be lower. There is probably some type of timing problem in readdirL.
By accessing C:/Users the problem is rather frequent. Sweeping less complicated file structure as C: seem to be OK!
File path like <C:/Users/bo/Application Data/À> are sometimes returned by readdirL()!?
Here are some results
dir: C: #dirs: 79923 #files: 305816 #nodes: 385739 1/s: 5749
dir: D: #dirs: 7776 #files: 115255 #nodes: 123031 1/s: 14499
dir: Q: #dirs: 907 #files: 16095 #nodes: 17002 1/s: 12374
dir: C: #dirs: 67183 #depth: 13 #files: 259099 #nodes: 326282 1/s:
+5558 (skipping 'C:/Users')
The in C:/Windows found number of files are 193 less than shown by the MS File Explorer and for directories 36 less.
In the documentation of Win32::LongPath there is in one of the examples
# recurse if dir
if (($file ne '.') && (($stat->{attribs}
& (FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_REPARSE_POINT))
== FILE_ATTRIBUTE_DIRECTORY)) {
search_tree ($name);
next;
}
What does
$stat->{mode} & S_IFDIR correspond to?
Here is my test script:
use strict;
use warnings;
use 5.010;
use Path::Tiny qw( path );
use Data::Dump qw(dump dd ddx);
use Win32::LongPath;
use File::stat;
use Fcntl ':mode';
use Benchmark qw(:all);
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
my @dir_skip = (
'$RECYCLE.BIN', 'System Volume Information', 'Config.Msi', 'C:/Use
+rs'
#, 'C:/AMD', 'C:/hp'
);
my $dir_skip = join '|', map { quotemeta } @dir_skip;
my $dir_skip_regexp = qr {$dir_skip$};
sub do_dir {
my $dir_path = shift;
my $sub_ref = shift; # callback
my $dir = Win32::LongPath->new;
unless ( $dir->opendirL($dir_path) ) {
warn "!! unable to open $dir_path ($^E)";
return;
}
my @dir_name;
while ( my $name = $dir->readdirL() ) {
if ( $name =~ m{ ^[.]{1,2}$ }x ) { next; }
my $path = "$dir_path/$name";
my $stat = lstatL($path);
if ( !defined $stat ) {
next if $^E =~ /Åtkomst nekad/;
warn "!! SKIP $^E <$path>";
next;
}
if ( $stat->{mode} & S_IFREG ) { # normal file
$sub_ref->( $path, $stat ); # call callback
}
elsif ( $stat->{mode} & S_IFDIR ) { # dir
push @dir_name, $name;
}
else {
warn "!! ? $name";
}
}
return \@dir_name;
}
{
my @to_do;
my $max_depth;
sub td_to_txt { dump @to_do; }
sub td_clear { @to_do = (); $max_depth = 0; }
sub td_down {
push @to_do, [];
my $depth = @to_do;
$max_depth = $depth if $depth > $max_depth;
}
sub max_depth { return $max_depth }
sub td_add {
my $name = shift;
@to_do = [] unless (@to_do);
push @{ $to_do[-1] }, $name;
}
sub td_add_aref {
my $dir_aref = shift;
push @{ $to_do[-1] }, @$dir_aref;
}
sub td_path_next {
return join '/', map { $_->[0] } @to_do;
}
sub td_remove_dir {
my $aref = $to_do[-1];
my $removed = shift @$aref; # remove dir
return if @$aref; # more dirs
while ( $aref = $to_do[-1] ) {
if ( !@$aref ) {
$removed = pop @to_do; # remove level
next;
}
$removed = shift @{ $to_do[-1] }; # remove dir
return if @$aref; # more dirs
}
}
}
sub FS_sweep {
my $dir_path = shift;
my $sub_ref = shift;
td_clear;
td_down;
td_add($dir_path);
my $dir_cnt = 0;
my $t0 = Benchmark->new;
while ( my $dir_path = td_path_next ) {
if ( $dir_path =~ m{$dir_skip_regexp} ) {
warn "SKIPING DIR $dir_path";
td_remove_dir;
next;
}
$dir_cnt++;
my $dir_name_aref = do_dir( $dir_path, $sub_ref );
my $sub_dir_nof = @$dir_name_aref;
if ( $sub_dir_nof > 1000 ) {
warn "!! MANY SUBDIR $sub_dir_nof in $dir_path";
}
if (@$dir_name_aref) { # subdir
td_down;
td_add_aref($dir_name_aref);
}
else {
warn '!! ! defined $dir_name_aref' if !defined $dir_name_a
+ref;
td_remove_dir;
}
}
my $td = timediff( Benchmark->new, $t0 );
return $dir_path, $dir_cnt, max_depth, $td;
}
my @output;
my $file_cnt = 0;
sub file_log {
my $file_path = shift;
$file_cnt++;
warn "!# $file_cnt $file_path\n" if not $file_cnt % 10000;
}
sub summary {
my $dir_path = shift;
my $dir_cnt = shift;
my $max_depth = shift;
my $td = shift;
my $node_cnt = $dir_cnt + $file_cnt;
my $node_per_second = $td->cpu_p > 0 ? $node_cnt / $td->cpu_p : -1
+;
my $txt =
sprintf
"\n\n!! FS_sweep summary dir: %s\n #dirs: %d #depth: %d #files: %d #
+nodes: %d 1/s: %d\n",
$dir_path, $dir_cnt, $max_depth, $file_cnt, $node_cnt, $node_per
+_second;
$file_cnt = 0;
return $txt;
}
my $ls_log = 1; # activate listing of files in 'ls_log.txt'
my $log_fh;
$log_fh = path('ls_log.txt')->openw_utf8 if $ls_log;
sub FS_file_big {
my $file_path = shift;
my $stat_hash_ref = shift;
file_log($file_path);
say {$log_fh} $file_path if $ls_log;
my $size = $stat_hash_ref->{size};
push @output, "BIG $file_path size: $size\n" if $size > 100000000;
}
sub output {
if (@output) {
say "Output:";
say map { "$_\n" }
grep { defined } @output[ 0 .. 100, 1000 .. 1010, 2000 .. 20
+10 ];
say "END Output\n";
@output = ();
}
STDOUT->flush;
}
say summary( FS_sweep( 'C:/Windows', \&FS_file_big ) );
output;
say summary( FS_sweep( 'C:', \&FS_file_big ) );
output;
foreach my $dev (qw{ }) { # add C D ...
warn "!! START $dev: =======================================\n";
say summary( FS_sweep( "$dev:", \&FS_file_big ) );
output;
}
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.