http://qs321.pair.com?node_id=11118300


in reply to Re: Function to sweep a file tree
in thread Function to sweep a file tree

Thanks vr ! It has really helped me a lot!

… I've re-written your code completely ...

OK, Good!

1) I rewrote my FS_sweep based on your proposal

my $d = Win32::LongPath-> new; … $d-> readdirL;
Doing stress tests using C:, I got a number of problems. The script sometimes works. But often it stucks, loops and was difficult to kill. Had to use the ctrl-alt-del/activity handler to stop it. I think that the memory is overwrite by  $d-> readdirL;. The largest directory I have found returned by  $d-> readdirL; has 38252 entries. MS File Explorer says: 38250 objekt

2) I rewrote my FS_sweep using while ( my $name = $dir->readdirL() )

Below follows a script which can be used to test this approach.

This works much better. But there are still problems. The script is sometimes stuck (no cpu time is used) or looping (cpu time is used, but nothing is happening). When logging the found file pathes to a file, the frequency of the problem seem to be lower. There is probably some type of timing problem in  readdirL. By accessing C:/Users the problem is rather frequent. Sweeping less complicated file structure as C: seem to be OK!

File path like <C:/Users/bo/Application Data/À> are sometimes returned by readdirL()!?

Here are some results

dir: C: #dirs: 79923 #files: 305816 #nodes: 385739 1/s: 5749 dir: D: #dirs: 7776 #files: 115255 #nodes: 123031 1/s: 14499 dir: Q: #dirs: 907 #files: 16095 #nodes: 17002 1/s: 12374 dir: C: #dirs: 67183 #depth: 13 #files: 259099 #nodes: 326282 1/s: +5558 (skipping 'C:/Users')

The in C:/Windows found number of files are 193 less than shown by the MS File Explorer and for directories 36 less.

In the documentation of Win32::LongPath there is in one of the examples

# recurse if dir if (($file ne '.') && (($stat->{attribs} & (FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_REPARSE_POINT)) == FILE_ATTRIBUTE_DIRECTORY)) { search_tree ($name); next; }
What does $stat->{mode} & S_IFDIR correspond to?

Here is my test script:

use strict; use warnings; use 5.010; use Path::Tiny qw( path ); use Data::Dump qw(dump dd ddx); use Win32::LongPath; use File::stat; use Fcntl ':mode'; use Benchmark qw(:all); binmode STDOUT, ":utf8"; binmode STDERR, ":utf8"; my @dir_skip = ( '$RECYCLE.BIN', 'System Volume Information', 'Config.Msi', 'C:/Use +rs' #, 'C:/AMD', 'C:/hp' ); my $dir_skip = join '|', map { quotemeta } @dir_skip; my $dir_skip_regexp = qr {$dir_skip$}; sub do_dir { my $dir_path = shift; my $sub_ref = shift; # callback my $dir = Win32::LongPath->new; unless ( $dir->opendirL($dir_path) ) { warn "!! unable to open $dir_path ($^E)"; return; } my @dir_name; while ( my $name = $dir->readdirL() ) { if ( $name =~ m{ ^[.]{1,2}$ }x ) { next; } my $path = "$dir_path/$name"; my $stat = lstatL($path); if ( !defined $stat ) { next if $^E =~ /Åtkomst nekad/; warn "!! SKIP $^E <$path>"; next; } if ( $stat->{mode} & S_IFREG ) { # normal file $sub_ref->( $path, $stat ); # call callback } elsif ( $stat->{mode} & S_IFDIR ) { # dir push @dir_name, $name; } else { warn "!! ? $name"; } } return \@dir_name; } { my @to_do; my $max_depth; sub td_to_txt { dump @to_do; } sub td_clear { @to_do = (); $max_depth = 0; } sub td_down { push @to_do, []; my $depth = @to_do; $max_depth = $depth if $depth > $max_depth; } sub max_depth { return $max_depth } sub td_add { my $name = shift; @to_do = [] unless (@to_do); push @{ $to_do[-1] }, $name; } sub td_add_aref { my $dir_aref = shift; push @{ $to_do[-1] }, @$dir_aref; } sub td_path_next { return join '/', map { $_->[0] } @to_do; } sub td_remove_dir { my $aref = $to_do[-1]; my $removed = shift @$aref; # remove dir return if @$aref; # more dirs while ( $aref = $to_do[-1] ) { if ( !@$aref ) { $removed = pop @to_do; # remove level next; } $removed = shift @{ $to_do[-1] }; # remove dir return if @$aref; # more dirs } } } sub FS_sweep { my $dir_path = shift; my $sub_ref = shift; td_clear; td_down; td_add($dir_path); my $dir_cnt = 0; my $t0 = Benchmark->new; while ( my $dir_path = td_path_next ) { if ( $dir_path =~ m{$dir_skip_regexp} ) { warn "SKIPING DIR $dir_path"; td_remove_dir; next; } $dir_cnt++; my $dir_name_aref = do_dir( $dir_path, $sub_ref ); my $sub_dir_nof = @$dir_name_aref; if ( $sub_dir_nof > 1000 ) { warn "!! MANY SUBDIR $sub_dir_nof in $dir_path"; } if (@$dir_name_aref) { # subdir td_down; td_add_aref($dir_name_aref); } else { warn '!! ! defined $dir_name_aref' if !defined $dir_name_a +ref; td_remove_dir; } } my $td = timediff( Benchmark->new, $t0 ); return $dir_path, $dir_cnt, max_depth, $td; } my @output; my $file_cnt = 0; sub file_log { my $file_path = shift; $file_cnt++; warn "!# $file_cnt $file_path\n" if not $file_cnt % 10000; } sub summary { my $dir_path = shift; my $dir_cnt = shift; my $max_depth = shift; my $td = shift; my $node_cnt = $dir_cnt + $file_cnt; my $node_per_second = $td->cpu_p > 0 ? $node_cnt / $td->cpu_p : -1 +; my $txt = sprintf "\n\n!! FS_sweep summary dir: %s\n #dirs: %d #depth: %d #files: %d # +nodes: %d 1/s: %d\n", $dir_path, $dir_cnt, $max_depth, $file_cnt, $node_cnt, $node_per +_second; $file_cnt = 0; return $txt; } my $ls_log = 1; # activate listing of files in 'ls_log.txt' my $log_fh; $log_fh = path('ls_log.txt')->openw_utf8 if $ls_log; sub FS_file_big { my $file_path = shift; my $stat_hash_ref = shift; file_log($file_path); say {$log_fh} $file_path if $ls_log; my $size = $stat_hash_ref->{size}; push @output, "BIG $file_path size: $size\n" if $size > 100000000; } sub output { if (@output) { say "Output:"; say map { "$_\n" } grep { defined } @output[ 0 .. 100, 1000 .. 1010, 2000 .. 20 +10 ]; say "END Output\n"; @output = (); } STDOUT->flush; } say summary( FS_sweep( 'C:/Windows', \&FS_file_big ) ); output; say summary( FS_sweep( 'C:', \&FS_file_big ) ); output; foreach my $dev (qw{ }) { # add C D ... warn "!! START $dev: =======================================\n"; say summary( FS_sweep( "$dev:", \&FS_file_big ) ); output; }