PerlMonks  

Re^3: (OT) Redundant Backup

by fundflow (Chaplain)
on Mar 27, 2006 at 20:18 UTC [id://539518]


in reply to Re^2: (OT) Redundant Backup
in thread (OT) Redundant Backup

Thanks again for the long reply, but I feel my question wasn't answered: what exactly are you refreshing? Couldn't you end up making more copies of corrupt data? That can happen if the backup DVDs or hard disks already had a problem; on a hard disk, it could also be caused by a virus or an accidental overwrite.

I want to be absolutely sure that the data is fine, and if it is not, I want to be able to repair it. Making two or more copies alone does not solve that problem, but luckily there are error-correcting codes (ECC).

I will post a script when it's ready, as it will surely be useful for many people.

Replies are listed 'Best First'.
Re^4: (OT) Redundant Backup
by jhourcle (Prior) on Mar 27, 2006 at 21:38 UTC

    In the situation I described, you refresh the archival copy (i.e., you read it in and write it back out). That wasn't directly addressing your problem, though; it was a response to the suggestion to use DLT because it supposedly has a longer lifetime.

    ECC does not help you in the case of complete media failure, physical loss, etc. Offsite backups do, though yes, there is a risk of the backups themselves becoming lost or corrupted.

    Normally when archiving, you maintain some sort of checksum to verify that the archive hasn't become corrupted. (Checksums can't verify data integrity against malicious tampering, but the odds of a checksum collision occurring purely by chance as the media degrades are very slight.)
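    The checksum-manifest idea above can be sketched in a few lines. This is only an illustration, not the script fundflow promised; all paths and names here are made up for the example.

```python
# Sketch: record checksums of an archive at backup time, verify later.
# All paths/names here are hypothetical.
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(archive: Path, manifest: Path) -> None:
    """Record 'digest  name' lines for every file in the archive dir."""
    lines = [f"{sha256(p)}  {p.name}"
             for p in sorted(archive.iterdir()) if p.is_file()]
    manifest.write_text("\n".join(lines) + "\n")


def verify(archive: Path, manifest: Path) -> list[str]:
    """Return names of files that are missing or no longer match."""
    bad = []
    for line in manifest.read_text().splitlines():
        digest, name = line.split("  ", 1)
        p = archive / name
        if not p.exists() or sha256(p) != digest:
            bad.append(name)
    return bad
```

    As jhourcle notes, the manifest should live somewhere other than the archive disk itself, or a single failure takes out both the data and the means of checking it.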

    For what you're asking, with ECC, I wouldn't want to keep that information on the same disk. For the type of scenario I'm describing, it's much more efficient overall to just make multiple copies and store them separately. If the main archive is found to be corrupt, you go to the first backup, then the second, and so on. After a while there are diminishing returns, but you have to weigh the cost of the backups against the risk and cost of a given loss.

    From what you've described, the data is kept online, so you could use something like tripwire to tell you when something's gone wrong, and in that case restore from the backup. I was trying to provide what I believe to be a better alternative to what you were attempting. Of course, I don't know how often your data changes (if you're only keeping 40GB but it changes daily, my recommendations aren't useful), but if it's just a matter of keeping some pictures in an online repository (i.e., they get added to but not modified), then they should apply.

    Anyway, as I've gotten off on a tangent again: ECC is for bit-level corruption, not catastrophic failure or even accidental file deletion. Full backups protect against more types of potential loss. If you're really paranoid about your data, you could combine the two, but you'd have to see whether the overall cost is justified for your particular situation.

      Thanks again. I really appreciate your feedback.

      My plan is indeed to have several backups, but also to enhance them with ECC.
      That way each backup has better standalone value, in the sense that small errors may be recoverable from the copy itself. In my experience, disks tend to lose random sectors, corrupting one image (1-4 MB) somewhere in the middle. If a whole image fails, there will hopefully be another backup that still has it; this can be automated with a merging script driven by checksums. The same goes for a complete disk failure.
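      The merging idea could look something like the sketch below: given a manifest recorded when the archive was known-good, restore each file from whichever backup copy still matches its checksum. Directory and file names are assumptions for the example, not fundflow's actual layout.

```python
# Sketch of checksum-driven merging across backup copies.
# Manifest format: 'digest  name' per line, recorded when data was good.
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def merge_restore(manifest: Path, copies: list[Path], out: Path) -> list[str]:
    """Restore each file from the first copy whose checksum matches.

    Returns the names that no copy could supply intact."""
    out.mkdir(exist_ok=True)
    unrecoverable = []
    for line in manifest.read_text().splitlines():
        digest, name = line.split("  ", 1)
        for copy in copies:
            candidate = copy / name
            if candidate.exists() and sha256(candidate) == digest:
                (out / name).write_bytes(candidate.read_bytes())
                break  # first intact copy wins
        else:
            unrecoverable.append(name)
    return unrecoverable
```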

      The cost of this approach is a 50% increase in data size, which is reasonable here. To my understanding, while disks (both magnetic and optical) already use ECC internally, they come nowhere near 50% redundancy. For me (and surely many other photographers) the data is worth this price and more, especially with the low cost of storage nowadays.

      Thanks as well for the link.
