SHA1 sum archival methods

by r1n0 (Beadle)
on Nov 05, 2009 at 13:32 UTC ( [id://805261]=perlquestion )

r1n0 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,
I want to extend a big thank you for all the helpful information over the course of time. :-)
This discussion involves thoughts about creating a SHA1sum archive of file content. I want a portable archive that can be shared with others (emailing them the archive or allowing them to download it). I don't want to use a database because I want this to be as portable as possible. I would prefer to use files and directory trees if necessary, so when it is shared, others won't need to set up a database to read it. I want to use just Perl modules, too.

Thanks to other monks, I have recently learned how to tie a hash to a file with DB_File. Is there a way to use files to store a hash of hashes? If there is, this is a method I was considering. Alternatively, one big text log file could be created, but after millions of files have been summed, lookups might take some time.

The end result of this will allow someone to create SHA1sums of all files on their system. When completed, a master archive (what I want to build) will be used to compare the SHA1sums against a SHA1sum archive of whitelist files and/or copyright material and/or blacklist. I want to create a process that is fast, efficient, and doesn't require a DB solution. Fast is relative to the solution, I know. I'm not asking for any code, I am more interested in thought processes for how others think I should go about doing this. Once completed, I will post the code to the Monastery.

Thank you in advance for your thoughts.
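For readers following along, the scan-and-compare step described in the question can be sketched with core modules only. This is a minimal illustration, not the poster's code: the sub names `sha1_of_file`, `scan_tree`, and `matches` are hypothetical, and the in-memory hash stands in for whatever on-disk store is eventually chosen.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA;
use File::Find;

# Return the SHA-1 hex digest of one file, or undef if it cannot be read.
sub sha1_of_file {
    my ($path) = @_;
    # eval guards against unreadable files; 'b' reads in binary mode.
    my $hex = eval { Digest::SHA->new(1)->addfile($path, 'b')->hexdigest };
    return $hex;
}

# Walk a directory tree and return a hashref of path => digest.
sub scan_tree {
    my ($root) = @_;
    my %sums;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -f $File::Find::name;
                my $hex = sha1_of_file($File::Find::name);
                $sums{$File::Find::name} = $hex if defined $hex;
            },
        },
        $root
    );
    return \%sums;
}

# Return the scanned paths whose digest appears in a set of known digests
# (a hashref keyed by digest, e.g. a whitelist or blacklist).
sub matches {
    my ($sums, $known) = @_;
    my @hits = grep { exists $known->{ $sums->{$_} } } keys %$sums;
    return sort @hits;
}

# Optional command-line use: print "digest  path" for a directory.
if (@ARGV) {
    my $sums = scan_tree($ARGV[0]);
    print "$sums->{$_}  $_\n" for sort keys %$sums;
}
```

Keying the known-digest set by digest makes each comparison a single hash lookup, which is the property any of the disk-backed stores discussed in the replies would need to preserve.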

Replies are listed 'Best First'.
Re: SHA1 sum archival methods
by wfsp (Abbot) on Nov 05, 2009 at 14:52 UTC
    Is there a way to use files to store hash of hashes? ... looking up things...
    DBM::Deep may be worth a look. It is a disk-based hash and, imo, ideal for heavy-duty lookups. The DBM::Deep::Cookbook "contains useful tips and tricks, plus some examples of how to do common tasks."

    update: Oh, and it is very fast indeed. :-)
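A minimal sketch of how a hash of hashes might be kept on disk with DBM::Deep, assuming the module is installed from CPAN. The file name and the record fields (`path`, `list`, `seen`) are hypothetical choices for illustration, not something specified in the thread.

```perl
use strict;
use warnings;
use DBM::Deep;

# Hypothetical archive file; DBM::Deep creates it if it does not exist.
my $file = 'sha1_archive.db';
my $db   = DBM::Deep->new($file);

# Key each record by digest; the value is itself a hash.
my $digest = 'a9993e364706816aba3e25717850c26c9cd0d89d';
$db->{$digest} = {
    path => '/example/abc.txt',   # illustrative path only
    list => 'whitelist',
    seen => 0,
};

# Nested values can be read and updated in place on disk.
$db->{$digest}{seen}++;

if (exists $db->{$digest}) {
    print "$digest is on the $db->{$digest}{list}\n";
}
```

The resulting single file can be copied to another machine and opened the same way, which fits the portability goal in the question, provided the recipient also has DBM::Deep available.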

Re: SHA1 sum archival methods
by redgreen (Priest) on Nov 05, 2009 at 15:32 UTC

    It sounds like overall you want to write your own database. If that is your goal, then do it.

    Myself, I would use SQLite via DBD::SQLite. It gives you a database with indexes, but without the overhead of setting up a database server. It is well tested, and fast.

    It sounds like you are still in the design phase. Why not design the data storage section so it can be changed around? Start now with DBI, using SQLite, and then change it later on if needed. You would then have something to compare against, and have many options.
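One way to keep the storage section swappable is to hide it behind a small class. The sketch below assumes DBI and DBD::SQLite are installed; the package name `Sha1Store`, its methods, the table layout, and the file name are all hypothetical, chosen only to illustrate the idea.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical storage wrapper: callers only see add() and lookup(),
# so the backend could later be replaced without touching them.
package Sha1Store;

sub new {
    my ($class, $dbfile) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", '', '',
        { RaiseError => 1, AutoCommit => 1 });
    # The primary key gives an index on the digest column.
    $dbh->do('CREATE TABLE IF NOT EXISTS sums (
                  sha1  TEXT PRIMARY KEY,
                  label TEXT NOT NULL
              )');
    return bless { dbh => $dbh }, $class;
}

# Store (or overwrite) the label for a digest.
sub add {
    my ($self, $sha1, $label) = @_;
    $self->{dbh}->do(
        'INSERT OR REPLACE INTO sums (sha1, label) VALUES (?, ?)',
        undef, $sha1, $label);
    return;
}

# Return the label for a digest, or undef if it is not stored.
sub lookup {
    my ($self, $sha1) = @_;
    my ($label) = $self->{dbh}->selectrow_array(
        'SELECT label FROM sums WHERE sha1 = ?', undef, $sha1);
    return $label;
}

package main;

my $store = Sha1Store->new('sha1_archive.sqlite');   # hypothetical file name
$store->add('a9993e364706816aba3e25717850c26c9cd0d89d', 'whitelist');
my $label = $store->lookup('a9993e364706816aba3e25717850c26c9cd0d89d');
print "label: $label\n";
```

Because the rest of the program would only call `add` and `lookup`, a later move to DB_File, DBM::Deep, or a full database server would mean rewriting this one package.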

    You say you want to share your final program. Watch out! End users will want to do more with your program than you can dream of. Just because your file archive is only 100,000 files and works great with DB_File, doesn't mean the next user with 100,000,000 files will be happy with your design. They might not mind setting up a full database for storage. Keep your options open.

      I completely agree on leaving the design flexible so the storage can be changed later. And I agree about sharing code. I want to thank fellow monks for the ideas. I was wondering how MLDBM compares in speed to these others? Has anyone run a speed comparison of these various modules in the past?

      Modules:
      DB_File
      DBD::SQLite
      MLDBM
      DBM::Deep

      I am interested in speed and ability to house a large number of records. I understand this is probably not possible, and for a large volume of records (>100,000,000) I should probably move to a full DB.

      Thank you in advance for your help.
        MLDBM is a serialization layer on top of a DBM module such as DB_File, so benchmarking it against DB_File doesn't make much sense.
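To illustrate the layering, here is a minimal sketch of MLDBM configured to sit on DB_File with Storable as the serializer, assuming both MLDBM and DB_File are installed. The file name and record fields are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;                          # for O_CREAT and O_RDWR
use MLDBM qw(DB_File Storable);     # DB_File stores, Storable serializes

my $file = 'sha1_mldbm.db';         # hypothetical file name
tie my %sums, 'MLDBM', $file, O_CREAT | O_RDWR, 0640
    or die "Cannot tie $file: $!";

my $digest = 'a9993e364706816aba3e25717850c26c9cd0d89d';
$sums{$digest} = { path => '/example/abc.txt', list => 'whitelist' };

# Nested values cannot be changed in place through the tie:
# fetch the record, modify the copy, then store it back.
my $rec = $sums{$digest};
$rec->{list} = 'blacklist';
$sums{$digest} = $rec;

print "$digest is on the $sums{$digest}{list}\n";
```

Every fetch or store goes through the underlying DB_File operation plus a serialization step, so the underlying DBM module still does the actual storage and lookup work.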
