Interesting. I wrote my own that does pretty much the same
thing, but in a different way. I only use one hash, so I
suspect it will use less memory (but see the response below
for the final word).
#! /usr/bin/perl -w
use strict;
use File::Find;
use Digest::MD5;

my %digest;           # MD5 digest => first file seen with that content
my $total_bytes = 0;  # bytes taken up by duplicate copies
my $dups = 0;         # number of duplicate files found

sub wanted {
    return unless -f $_;
    my $bytes = -s _;
    return unless $bytes;   # skip empty files
    my $in;
    if( !open $in, '<', $_ ) {
        print "Cannot open $File::Find::name for input: $!\n";
        return;
    }
    binmode $in;   # hash the raw bytes, whatever the platform
    my $d = Digest::MD5->new->addfile( $in )->digest;
    close $in;
    if( exists $digest{$d} ) {
        # size, first file seen, duplicate (tab-separated)
        print "$bytes\t$digest{$d}\t$File::Find::name\n";
        $total_bytes += $bytes;
        ++$dups;
    }
    else {
        $digest{$d} = $File::Find::name;
    }
}

foreach my $d ( @ARGV ) {
    print "=== directory $d\n";
    find \&wanted, $d;
}

printf "Statistics:
  Duplicates: %12d
  Bytes:      %12d
  KBytes:     %12d
  MBytes:     %12d
  GBytes:     %12d\n",
    $dups,
    $total_bytes,
    $total_bytes / (1024**1),
    $total_bytes / (1024**2),
    $total_bytes / (1024**3);
The output is very verbose, but that's because I pipe it
into something that can be handed off to users as a
spreadsheet, so that they can do their own housekeeping
(2 GB of duplicates in 45 GB of files...).
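For what it's worth, the downstream step isn't shown here, so the following is only a rough sketch of what such a filter might look like. It assumes the tab-separated "bytes, first file, duplicate" lines printed by the script above; the names dup_line_to_csv and csv_field and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Quote one field CSV-style: wrap it in double quotes and
# double any quotes embedded in the value.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Turn one report line into a CSV row. Anything that is not
# "bytes<TAB>first<TAB>duplicate" (directory headers, error
# messages, the statistics block) yields undef in scalar context.
# Assumption: file names themselves contain no tabs.
sub dup_line_to_csv {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;
    return unless @fields == 3 && $fields[0] =~ /^\d+$/;
    return join ',', $fields[0], csv_field( $fields[1] ), csv_field( $fields[2] );
}

# Made-up sample of the report, just to show the conversion.
my @sample = (
    "=== directory /tmp\n",
    "1024\t/tmp/a.txt\t/tmp/copy of a.txt\n",
);

print "bytes,first_seen,duplicate\n";
foreach my $line (@sample) {
    my $row = dup_line_to_csv($line);
    print "$row\n" if defined $row;
}
```

In real use the loop would read the report from STDIN instead of @sample, and something like Text::CSV from CPAN would be the more robust choice for the quoting.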
BTW, you can also save a smidgen of memory by using
the digest() method rather than the hexdigest() method,
since the value is not intended for human consumption:
digest() returns 16 raw bytes, hexdigest() a 32-character
string.
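To see the difference in size, here is a small sketch using the functional interface of Digest::MD5 (md5 and md5_hex return the same values as the digest() and hexdigest() methods). The input string is just a stand-in for real file contents.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

my $data = "some file contents\n";   # stand-in for real file data

my $raw = md5($data);       # binary form, as digest() returns
my $hex = md5_hex($data);   # hex form, as hexdigest() returns

# prints "raw: 16 bytes, hex: 32 bytes"
printf "raw: %d bytes, hex: %d bytes\n", length($raw), length($hex);

# Both forms carry the same information: hex-encoding the raw
# digest gives back the hexdigest string.
print "same digest\n" if unpack( 'H*', $raw ) eq $hex;
```

So each hash key is half the length with digest(), which adds up over many files; the keys are just not printable.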