PerlMonks  

1. When you want to post a chunk of code (or data) at the Monastery, start by typing these two lines into the composition box:

<c>

</c>

Then paste your code (or data) into the space between those two tags; you won't need to muck with anything else in order to get the code (or data) to show up correctly when posted. (Don't forget to put your paragraphs of explanation outside the code tags.)

2. Since you want to use file size to determine when to do md5 checksums, I think it would make more sense to build a hash of arrays keyed by byte count: for each distinct byte count, the hash key is the size and the hash value is an array holding the files of that size. Then loop over the hash and do md5s for each set of two or more files with a given size. You don't really need to do any sorting; just keep track of the different sizes. Here's how I would do it (on a unix/linux system):

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

die "Usage: $0 dir1 dir2\n"
    unless ( @ARGV == 2 and -d $ARGV[0] and -d $ARGV[1] );

# Group the plain files in both directories by size in bytes.
my %fsize;
for my $dir ( @ARGV ) {
    opendir DIR, $dir or die "$dir: $!\n";
    while ( defined( my $fn = readdir DIR )) {
        next unless -f "$dir/$fn";
        push @{$fsize{ -s "$dir/$fn" }}, "$dir/$fn";
    }
    closedir DIR;
}

# Checksum only the files that share their size with at least one other file.
my %fmd5;
my $digest = Digest::MD5->new;
for my $bc ( keys %fsize ) {
    next if scalar @{$fsize{$bc}} == 1;
    for my $fn ( @{$fsize{$bc}} ) {
        if ( open( my $fh, "<", $fn )) {
            binmode $fh;
            $digest->new;    # called on an existing object, this just resets its state
            $digest->addfile( $fh );
            push @{$fmd5{ $digest->b64digest }}, $fn;
        }
    }
}

# Report each set of files whose checksums match.
for my $md ( keys %fmd5 ) {
    print join( " == ", @{$fmd5{$md}} )."\n" if ( scalar @{$fmd5{$md}} > 1 );
}
(That just lists sets of files that have identical content; you can tweak it to do other things, as you see fit.)
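As one possible tweak, the same size-then-checksum idea can be wrapped in a sub that takes any list of file paths instead of exactly two directories. This is only a sketch: the name find_duplicates and its return format (a list of array refs, one per set of identical files) are my own assumptions for illustration, and it uses hex digests rather than base64 purely as a matter of taste.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper (the name and return format are assumptions):
# takes a list of file paths and returns a list of array refs,
# one per set of files with identical content.
sub find_duplicates {
    my @files = @_;

    # Step 1: hash of arrays keyed by byte count.
    my %by_size;
    push @{ $by_size{ -s $_ } }, $_ for grep { -f $_ } @files;

    # Step 2: checksum only files that share a size with another file.
    my %by_md5;
    for my $group ( grep { @$_ > 1 } values %by_size ) {
        for my $fn (@$group) {
            open( my $fh, '<', $fn ) or next;    # skip unreadable files
            binmode $fh;
            my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_md5{$sum} }, $fn;
        }
    }

    # Step 3: keep only the checksums seen more than once.
    return grep { @$_ > 1 } values %by_md5;
}

# Usage: list duplicate sets among the files named on the command line.
print join( " == ", sort @$_ ), "\n" for find_duplicates(@ARGV);
```

Run it with something like "dir1/* dir2/*" as arguments; files with a unique size are never opened, which is the whole point of grouping by size first.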

In reply to Re^3: Sort directory by file size by graff
in thread Sort directory by file size by nnigam1
