 
PerlMonks  

Re: Sort directory by file size

by choroba (Cardinal)
on May 18, 2016 at 16:05 UTC


in reply to Sort directory by file size

That's probably the common trap of readdir: it returns the file names, not file paths.

Fix:

my @sDir = sort { -s "$dir/$a" <=> -s "$dir/$b" } readdir $D1;
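For context, here is a minimal self-contained sketch of the same fix; the names $dir and $D1 are assumptions carried over from the snippet above, not from the original post:

#!/usr/bin/perl
# Minimal sketch of the fix: prepend the directory before calling -s,
# because readdir returns bare names relative to $dir, not paths.
use strict;
use warnings;

my $dir = shift // '.';
opendir my $D1, $dir or die "Cannot open $dir: $!";
my @sDir = sort { -s "$dir/$a" <=> -s "$dir/$b" } readdir $D1;
closedir $D1;

print "$_\n" for @sDir;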

If the number of files is high, asking for each file's size several times (the sort block re-stats the files on every comparison) might slow the program down significantly. A Schwartzian transform should help.

my @sDir = map $_->[0],
           sort { $a->[1] <=> $b->[1] }
           map [ $_, -s "$dir/$_" ],
           readdir $D1;
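A possible variation (my sketch, not part of choroba's reply): if only plain files should be sorted, the inner map can filter with -f and reuse its stat buffer through the special _ filehandle, so each directory entry is statted just once:

# Hypothetical variant: keep only plain files and cache the size in the
# same pass; -s _ reuses the stat buffer filled by the -f test.
my @sDir = map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           map  { -f "$dir/$_" ? [ $_, -s _ ] : () }
           readdir $D1;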

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Re^2: Sort directory by file size
by nnigam1 (Novice) on May 18, 2016 at 20:20 UTC

    Thanks, o Wise Monks. I will try the suggestions. Here is my whole script so you can see how I am using the command; I should have put it here earlier.
    More than just writing a script to find duplicates, I want to refresh my Perl skills, which I have lost touch with over the last few years.

    use strict;
    use IO::File;
    use Digest::MD5 qw(md5);

    my ($aLen, $i, $j, $tFile, $sFile);
    my (@sDir, @tDir);
    my ($cFile, $ex1, $ex2);
    my ($cFile2, $chk1, $chk2);
    my ($fs1, $fs2);
    my ($par1, $par2, $par3, $par4) = ($ARGV[0], $ARGV[1], $ARGV[2], $ARGV[3]);

    # Expects a slash at the end if directory
    # par1 is the directory to Check
    # par2 is directory to check against
    # Exact dups are placed in ncn_cmp.bat to delete from first folder
    # Differences in ncn_diff.txt as either missing or different

    $par1 = $par1 || ".\\";
    $ex1  = $par2 || ".err";
    $ex2  = $par3 || ".fmx";
    $par4 = $par4 || "NCN";
    $chk1 = "Apple";
    open (OUT, ">ncn_cmp.bat");
    open (DF, ">ncn_diff.txt");
    open (SM, ">ncn_same.txt");
    $tFile = "XXX";
    if (-d $par1) {
        opendir D1, $par1;
        #@tDir = sort (readdir D1);
        #@sDir = sort { -s $a <=> -s $b } @tDir;
        @sDir = sort { -s $a <=> -s $b } (readdir D1);
        $aLen = @sDir;
        for ($j = 0; $j < $aLen; $j++) {
            next if !(-f $par1 . "\\" . $sDir[$j]);
            next if $sDir[$j] eq "ncn_cmp.bat";
            next if $sDir[$j] eq "ncn_diff.txt";
            next if $sDir[$j] eq "ncn_same.txt";
            if ($par1 =~ s/\\$//g) {
                $sFile = $par1 . $sDir[$j];
            }
            else {
                $sFile = $par1 . "\\" . $sDir[$j];
            }
            #$sFile = $par1 . "\\" . $sDir[$j];
            if ($tFile eq "XXX") {
                $tFile = $sFile;
                next;
            }
            $fs1 = -s $sFile;
            $fs2 = -s $tFile;
            if ($fs1 eq $fs2) {
                open(TST, "<", $tFile);
                $chk2 = md5(<TST>);
                close(TST);
                open(TST, "<", $sFile);
                $chk1 = md5(<TST>);
                close(TST);
            }
            else {
                # print $sFile . " size " . $fs1 . "\n";
                # print $tFile . " size " . $fs2 . "\n";
                $chk2 = "DIF";
            }
            if ($chk1 eq $chk2) {
                print OUT "del \"" . $sFile . "\"\n";
                print SM "echo N | comp " . $tFile . " " . $sFile . "\n";
            }
            else {
                if ($chk2 eq "NCN") {
                    print DF $tFile . " Not Found\n";
                }
                else {
                    print DF $tFile . " and " . $sFile . " different\n";
                }
            }
            $chk1 = "ABC";
            $chk2 = "DEF";
            $tFile = $sFile;
        }
    }
    print OUT "del ncn_diff.txt\n";
    print OUT "del ncn_same.txt\n";
    print OUT "del ncn_cmp.bat\n";
    close(OUT);
    close(DF);
    close(SM);
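    A minimal sketch (mine, not from the thread) of how choroba's fix maps onto this script's own variables; $base is a hypothetical helper, everything else comes from the code above:

    # Hypothetical drop-in replacement for the opendir/sort lines above:
    # build the full path from $par1 so -s can actually find each file.
    opendir D1, $par1 or die "Cannot open $par1: $!";
    (my $base = $par1) =~ s/\\$//;                 # strip a trailing backslash, if any
    @sDir = sort { -s "$base\\$a" <=> -s "$base\\$b" } readdir D1;
    closedir D1;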
      (1) When you want to post a chunk of code (or data) at the Monastery, start by typing these two lines into the composition box:

      <c>

      </c>

      Then paste your code (or data) into the space between those two tags; you won't need to muck with anything else in order to get the code (or data) to show up correctly when posted. (Don't forget to put your paragraphs of explanation outside the code tags.)

      (2) Since you want to use file size to determine when to do md5 checksums, I think it would make more sense to build a hash of arrays keyed by byte count: for each distinct byte count, the hash key is the size and the hash value is an array holding the files of that size. Then loop over the hash and do md5s for each set of two or more files with a given size. You don't really need to do any sorting - just keep track of the different sizes. Here's how I would do it (on a unix/linux system):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Digest::MD5;

      die "Usage: $0 dir1 dir2\n"
        unless ( @ARGV == 2 and -d $ARGV[0] and -d $ARGV[1] );

      my %fsize;
      for my $dir ( @ARGV ) {
          opendir DIR, $dir or die "$dir: $!\n";
          while ( my $fn = readdir DIR ) {
              next unless -f "$dir/$fn";
              push @{$fsize{ -s "$dir/$fn" }}, "$dir/$fn";
          }
      }

      my %fmd5;
      my $digest = Digest::MD5->new;
      for my $bc ( keys %fsize ) {
          next if scalar @{$fsize{$bc}} == 1;
          for my $fn ( @{$fsize{$bc}} ) {
              if ( open( my $fh, "<", $fn )) {
                  $digest->new;
                  $digest->addfile( $fh );
                  push @{$fmd5{ $digest->b64digest }}, $fn;
              }
          }
      }

      for my $md ( keys %fmd5 ) {
          print join( " == ", @{$fmd5{$md}} )."\n" if ( scalar @{$fmd5{$md}} > 1 );
      }
      (That just lists sets of files that have identical content; you can tweak it to do other things, as you see fit.)
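      For example (my tweak, not part of the reply above), the final loop could be bent toward the original goal of emitting Windows del commands for every duplicate after the first in each identical set:

      # Hypothetical tweak of the last loop: keep the first file of each
      # identical set and print a "del" line for the rest (cf. ncn_cmp.bat).
      for my $md ( keys %fmd5 ) {
          my @same = @{ $fmd5{$md} };
          next unless @same > 1;
          shift @same;                       # the first copy survives
          print qq{del "$_"\n} for @same;
      }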
        Thank you o Wise Ones.
        Fantastic suggestions. I am incorporating them.
        This is a way I had not thought of. It should definitely improve performance.
Re^2: Sort directory by file size
by nnigam1 (Novice) on May 19, 2016 at 19:25 UTC
    Thank you o Wise Ones.

    This worked perfectly.
