PerlMonks  

Re^2: scalable duplicate file remover

by spx2 (Deacon)
on Mar 03, 2008 at 08:53 UTC


in reply to Re: scalable duplicate file remover
in thread scalable duplicate file remover

First of all, thank you very much for the critique; it is very welcome.
I will use it to improve the program.
1) Why do you think the current method of opening the files does not yield correct results?
(I compared my SHA1s against the output of the Unix sha1sum utility and they matched, which is
why I'm asking.)
2) You are right, I will do this.
3) OK, I understand. Where could I read more about this?
4) From reading the documentation, and thinking that a number in base 10 should always have at least
as many digits as its representation in base 16, I don't understand how it could be shorter in base 10.
I don't get why they say I will get a shorter string in a lower base.


They also talk about creating a single SHA1 object and reusing it, since the reset() method
clears out the old data from it.
Do you think this will speed things up?

Replies are listed 'Best First'.
Re^3: scalable duplicate file remover
by jwkrahn (Abbot) on Mar 03, 2008 at 18:18 UTC
    1. From the documentation for Digest::SHA1:

          $sha1->addfile($io_handle)
      [ SNIP ]
              In most cases you want to make sure that the $io_handle is in "binmode" before you pass it as argument to the addfile() method.

    2. OK.    :-)

    3. See these sections of the Perl documentation:
      Typeglobs and Filehandles (perldata)
      How do I pass filehandles between subroutines? (perlfaq5)
      How can I use a filehandle indirectly? (perlfaq5)

    4. $sha1->digest returns a digest in binary form while $sha1->hexdigest is in hexadecimal form. For example:

      $ perl -le'
      my $digest     = "\x02\x07\xFA\x78";
      my $hex_digest = "0207FA78";
      print for length( $digest ), length( $hex_digest );
      '
      4
      8

    5. Update: reset() may or may not speed things up. You would have to compare both methods with Benchmark to be sure.
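
Points 1, 4 and 5 above can be sketched together roughly as follows. This is only an illustration under stated assumptions, not the code from the thread: it uses the core Digest::SHA module as a stand-in for Digest::SHA1 (both provide addfile, digest, hexdigest and reset), and sha1_of_file and make_temp are hypothetical helper names invented for the example.

```perl
use strict;
use warnings;
use Digest::SHA;                  # assumption: stand-in for Digest::SHA1
use File::Temp qw(tempfile);

# Hypothetical helper: hash one file, reusing a caller-supplied digest object.
sub sha1_of_file {
    my ($sha, $path) = @_;
    $sha->reset;                  # clear any data left over from a previous file
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    binmode $fh;                  # read raw bytes, as the documentation advises
    $sha->addfile($fh);
    close $fh;
    return $sha->digest;          # binary form of the digest
}

# Hypothetical helper: write $content to a temporary file and return its name.
sub make_temp {
    my ($content) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $content;
    close $fh or die "Cannot close '$name': $!";
    return $name;
}

my $sha   = Digest::SHA->new(1);  # one SHA-1 object, reused for every file
my @files = (
    make_temp("same bytes\r\n"),
    make_temp("same bytes\r\n"),
    make_temp("other bytes\r\n"),
);
my @digests = map { sha1_of_file($sha, $_) } @files;

# A SHA-1 digest is 20 bytes in binary form and 40 characters in hex.
print length($digests[0]), "\n";                 # 20
print length(unpack 'H*', $digests[0]), "\n";    # 40

# Binary digests compare with eq just like hex strings do.
print $digests[0] eq $digests[1] ? "duplicate\n" : "different\n";
print $digests[0] eq $digests[2] ? "duplicate\n" : "different\n";
```

The binary digests work as hash keys for duplicate detection and take half the memory of the hex strings. Whether reusing one object is actually faster than creating a new one per file is not shown here; that would still need measuring with Benchmark.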
