PerlMonks  

Re: Alternatives to DB for comparable lists

by peterrowse (Acolyte)
on Jun 01, 2018 at 03:28 UTC (id://1215593)


in reply to Alternatives to DB for comparable lists

Thanks for all the advice. I have gone the DB route. The Storable option was, I guess, the 'other way' I was hazily thinking about (I believe I used it long ago), but once I had a bit of a nudge in the DB direction, the simplicity and ease of future tweaks won me over. With this type of thing, which you're often modifying while you use it, a DB makes it easy to change things on a whim. I don't quite know what challenges might arise as I work with the data, or when I might realise 'ah, forgot I might want to do that', but if I can avoid recalculating all those MD5s (or SHA-256s; I will probably change to that) and just update things in the easiest way, it's worth paying the price of slightly reduced performance, if there is one. And I didn't know (or had possibly forgotten, since I've used DBD a lot in the past, though maybe only with external DBs) how simple it was to set up DBD::SQLite, even on an overstressed laptop.
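To illustrate the "don't recalculate all those hashes" idea, here is a minimal sketch. Pete's script is Perl with DBD::SQLite. This sketch instead uses Python's standard `sqlite3` and `hashlib` modules. The `files(path, size, mtime, hash)` table is a hypothetical schema, not his. The assumption is that a file whose size and modification time are unchanged since the last run still has the same contents, so its stored hash can be reused.

```python
import hashlib
import os
import sqlite3


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()  # swap in hashlib.md5() to keep using MD5
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def open_db(db_path):
    """Open (or create) the per-machine DB with a hypothetical files table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path  TEXT PRIMARY KEY,
               size  INTEGER NOT NULL,
               mtime REAL NOT NULL,
               hash  TEXT NOT NULL
           )"""
    )
    return conn


def index_tree(conn, root):
    """Walk root, hashing only files that are new or whose size/mtime changed.

    Returns the number of files actually hashed on this run."""
    hashed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, sockets, etc.
            st = os.stat(path)
            row = conn.execute(
                "SELECT size, mtime FROM files WHERE path = ?", (path,)
            ).fetchone()
            if row == (st.st_size, st.st_mtime):
                continue  # unchanged since the last run: keep the stored hash
            conn.execute(
                "INSERT OR REPLACE INTO files (path, size, mtime, hash) "
                "VALUES (?, ?, ?, ?)",
                (path, st.st_size, st.st_mtime, sha256_of(path)),
            )
            hashed += 1
    conn.commit()
    return hashed
```

A second `index_tree` run over an unchanged tree hashes nothing. Rows for files that have since been deleted are not pruned in this sketch.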

I just took a 12-hour plane trip and coded most of the project during it. Two birds with one stone: get a job done and make a plane journey go a bit faster; I find getting into a bit of code makes time fly. I didn't go the SHA-256 route yet because I didn't have the module installed, but I will install it for the return trip and hope there's enough work left in the job to keep me occupied on the ride back. Collisions are a concern: even if I can handle them with a last-resort diff, they will slow things down because of the low bandwidth between servers.
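A "last-resort diff" can be layered as cheapest check first. This minimal sketch uses Python's `hashlib` and `filecmp` rather than the Perl modules Pete uses, and the function names are hypothetical. It compares sizes first, then digests, and does a full byte-by-byte comparison only when the hashes agree. It assumes both files are readable from the same machine. Across servers, that last step means copying one file over, which is exactly the bandwidth cost mentioned above.

```python
import filecmp
import hashlib
import os


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hex digest of a file using any algorithm name hashlib.new() accepts."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def confirmed_duplicate(path_a, path_b, algorithm="sha256"):
    """True only if the files match by size, by digest and byte for byte."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different sizes can never be duplicates
    if file_digest(path_a, algorithm) != file_digest(path_b, algorithm):
        return False
    # Digests agree; rule out a collision by comparing the actual contents.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

With SHA-256 the final comparison should essentially never return False after a digest match. It is cheap insurance when both files are local.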

Because access between servers is not consistent, I am going to run the code locally on each server, without needing the network, and then, once it has finished or updated, transfer the DB files to the processing machine. Some of the machines are quite slow Atom types, so it's best to nice the process and let them do it in their own time; there's no need for up-to-the-minute results. Then, if there's work to do, like deleting, making local links (for local duplicates), or whatever else I don't know about yet, I'll either do that live from the processing script on the central machine or automatically generate local processing scripts. Really, though, at the moment I am thinking of just consolidating all this data into a single file system that can be kept organised automatically from this point on, ideally with ZFS, although I don't know whether it will play nicely with the reliability and speed of the links; I have only used it on single machines so far.
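Once the per-server DB files have been copied to the central machine, SQLite's `ATTACH DATABASE` lets one connection read them all. This sketch uses Python's `sqlite3`. It assumes each DB has the hypothetical `files(path, size, mtime, hash)` table, and that the server labels and `all_files` table name are illustrative. It pools the rows, then lists content seen on more than one server, plus content repeated within one server (candidates for local links). On the scanning side, Python's `os.nice(10)` (Unix only) lowers a process's priority much as launching it under `nice` would.

```python
import sqlite3


def merge_server_dbs(server_dbs):
    """server_dbs maps a server label to the path of that server's DB file.

    Returns an in-memory connection holding a combined all_files table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE all_files (server TEXT, path TEXT, size INTEGER, hash TEXT)"
    )
    for server, db_path in server_dbs.items():
        conn.execute("ATTACH DATABASE ? AS src", (db_path,))
        conn.execute(
            "INSERT INTO all_files SELECT ?, path, size, hash FROM src.files",
            (server,),
        )
        conn.commit()  # ATTACH/DETACH are not allowed inside a transaction
        conn.execute("DETACH DATABASE src")
    return conn


def cross_server_duplicates(conn):
    """(hash, server_count, copy_count) for content present on 2+ servers."""
    return conn.execute(
        """SELECT hash, COUNT(DISTINCT server), COUNT(*) FROM all_files
           GROUP BY hash HAVING COUNT(DISTINCT server) > 1
           ORDER BY hash"""
    ).fetchall()


def local_duplicates(conn):
    """(server, hash, copy_count) for content repeated within one server."""
    return conn.execute(
        """SELECT server, hash, COUNT(*) FROM all_files
           GROUP BY server, hash HAVING COUNT(*) > 1
           ORDER BY server, hash"""
    ).fetchall()
```

Copying everything into one in-memory table keeps the queries simple. For very large indexes, writing `all_files` to a file-backed DB instead of `:memory:` would avoid holding it all in RAM.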

Anyways, just wanted to say thanks for the help and ideas.

Best, Pete