PerlMonks  

Re: Finding old copies of files

by parv (Vicar)
on Feb 27, 2021 at 05:14 UTC (#11128871=note)


in reply to Finding old copies of files

It looks like you are counting a file as a duplicate if there is already a file with that name. Please correct me if I have that wrong.

Personally, I would have compared MD5 (or SHA-256) checksums first to remove exact duplicates. Then I would have created a separate version control repository to collect the files with the same names, so that they could be compared later.
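A minimal sketch of that first step, assuming the core modules File::Find and Digest::SHA are available; `find_exact_duplicates` is a hypothetical name, not something from the original post. It groups every regular file under the given directories by its SHA-256 digest, so any group with more than one path holds byte-identical copies.

```perl
use strict;
use warnings;
use File::Find;
use Digest::SHA;

# Hypothetical helper: group regular files under @dirs by SHA-256 digest.
# Returns a hash ref: digest => [ paths ], keeping only digests shared by 2+ files.
sub find_exact_duplicates {
    my @dirs = @_;
    my %by_digest;
    find(
        {
            no_chdir => 1,    # $File::Find::name is then usable as a path directly
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;
                # 'b' reads the file in binary mode; addfile returns the object.
                my $digest = Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
                push @{ $by_digest{$digest} }, $path;
            },
        },
        @dirs
    );

    # Only digests seen more than once are exact duplicates.
    my %dups;
    for my $digest (keys %by_digest) {
        $dups{$digest} = $by_digest{$digest} if @{ $by_digest{$digest} } > 1;
    }
    return \%dups;
}
```

Calling `find_exact_duplicates('.')` returns a hash reference from digest to an array of paths; what to do with each group (delete, move, or keep one copy for the repository) is left open here.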

Re^2: Finding old copies of files
by Leitz (Scribe) on Feb 27, 2021 at 14:44 UTC

    It matches on the same name and size, in:

    $known_files{$file}{$size} = 1;

    An MD5 or SHA-256 sum would catch different files of the same size. However, they are computationally intense and in this use case overkill. I'm making one authoritative version and then revising after things are cleaned up. Thus any changed files are likely to get changed a few more times.

    Chronicler: The Domici War (domiciwar.net)

    General Ne'er-do-well (github.com/LeamHall)

      An MD5 or SHA-256 sum would catch different files of the same size. However, they are computationally intense and in this use case overkill

      Calculating the MD5 of a file should not be significantly slower than copying the file on a modern computer.

      You could even optimize by delaying the MD5 calculation until you find a second file with the same size and base name.
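One way to read that optimization, sketched in Perl under the assumption that the candidate paths have already been collected (for example with File::Find); `find_duplicates_lazily` is a hypothetical helper. The first pass uses only stat() to bucket files by base name and size; the second pass computes MD5 digests only for buckets with two or more members, so a file with a unique name and size is never read.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use Digest::MD5;

# Hypothetical sketch: returns a list of array refs, each holding paths
# that share base name, size and MD5 digest.
sub find_duplicates_lazily {
    my @paths = @_;

    # Pass 1: cheap metadata only, no file contents are read.
    my %bucket;
    for my $path (@paths) {
        next unless -f $path;
        my $size = (stat _)[7];    # reuse the stat buffer filled by -f
        push @{ $bucket{ basename($path) . "\0" . $size } }, $path;
    }

    # Pass 2: hash only files whose name and size collide with another file.
    my @groups;
    for my $candidates (values %bucket) {
        next if @$candidates < 2;
        my %by_digest;
        for my $path (@$candidates) {
            open my $fh, '<:raw', $path or die "Cannot open $path: $!";
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_digest{$digest} }, $path;
        }
        push @groups, grep { @$_ > 1 } values %by_digest;
    }
    return @groups;
}
```

Doing this in two passes, rather than hashing at the moment the second file turns up, yields the same groups and avoids tracking which earlier file still needs its digest.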

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      $known_files{$file}{$size} = 1;

      Change that to ...

      $known_files{$file}{$size}++;

      ... and you know how many (possible) duplicates you have found.
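A small sketch of how those counts could then be reported, assuming `$file` holds the base name and `$size` the size in bytes, as in the snippet above; `count_by_name_and_size` and `possible_duplicates` are hypothetical helper names. Any (name, size) pair with a count above 1 is a possible duplicate, and still only "possible" because matching name and size do not guarantee matching content.

```perl
use strict;
use warnings;

# Hypothetical helper: each input element is [ $file, $size ].
# Using ++ instead of = 1 counts how often each (name, size) pair occurs.
sub count_by_name_and_size {
    my @files = @_;
    my %known_files;
    $known_files{ $_->[0] }{ $_->[1] }++ for @files;
    return \%known_files;
}

# Hypothetical helper: one report line per (name, size) pair seen more than once.
sub possible_duplicates {
    my ($known_files) = @_;
    my @report;
    for my $file (sort keys %$known_files) {
        for my $size (sort { $a <=> $b } keys %{ $known_files->{$file} }) {
            my $count = $known_files->{$file}{$size};
            push @report, "$file ($size bytes): $count copies" if $count > 1;
        }
    }
    return @report;
}
```

For example, two entries of `['notes.txt', 120]` plus single entries for other name/size pairs produce the single line `notes.txt (120 bytes): 2 copies`.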

      Alexander


      Ah, right; I had missed that. Appreciate the correction.
