Saving big blessed hashes to disk

b888 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Saving big blessed hashes to disk by rinceWind (Monsignor) on Jul 20, 2005 at 11:00 UTC
I think that you've discovered that Storable provides a persistence mechanism that enables you to dump and load data structures. You are also looking for something scalable: you were worried about the time to store the data, and having the whole data structure in memory is going to eat into your process and operating system resources if it gets large. I can suggest two related approaches: objects and tying. An object can be a placeholder for some point in your arbitrarily large data. If your application keeps to the OO encapsulation rules, you are only holding the data in memory that you actually need for manipulation; the other parts can still be accessed via method calls. One approach is to hold the data in a database, and use Class::DBI as an object persistence layer to turn rows into objects. Tying provides an alternative interface to your data, and can make it look like an array or a hash (or a filehandle, etc.). When the application performs operations on the tied variable, method calls happen behind the scenes to do the sleight of hand and get your data. The application does not need to know that this is happening; it just sees an array (or hash). See perldoc perltie for more on this. Hope this helps -- Oh Lord, won’t you burn me a Knoppix CD ? My friends all rate Windows, I must disagree. Your powers of persuasion will set them all free, So oh Lord, won’t you burn me a Knoppix CD ? (Misquoting Janis Joplin)
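The tying idea above can be sketched roughly as follows. This is only an illustration: `LazyHash` is a made-up package name (not a CPAN module), and the loader callback stands in for whatever really fetches a record (a database row, a file, ...). Only `FETCH` and `STORE` are implemented; a complete tied hash would also need `EXISTS`, `DELETE`, `FIRSTKEY`, `NEXTKEY` and so on, as described in perltie.

```perl
use strict;
use warnings;

package LazyHash;   # hypothetical name, for illustration only

# tie %h, 'LazyHash', $loader
# $loader is a code ref that fetches one record from wherever the
# data really lives.
sub TIEHASH {
    my ($class, $loader) = @_;
    return bless { loader => $loader, cache => {} }, $class;
}

# Called for every $h{$key} read; consults the loader on first access only.
sub FETCH {
    my ($self, $key) = @_;
    $self->{cache}{$key} = $self->{loader}->($key)
        unless exists $self->{cache}{$key};
    return $self->{cache}{$key};
}

# Called for every $h{$key} = $value assignment.
sub STORE {
    my ($self, $key, $value) = @_;
    $self->{cache}{$key} = $value;
}

package main;

my $loads = 0;   # counts how often the (pretend) backing store is hit
tie my %user, 'LazyHash', sub { $loads++; return "record for $_[0]" };

print $user{alice}, "\n";        # first read calls the loader
print $user{alice}, "\n";        # second read is served from the cache
print "loader calls: $loads\n";
```

The calling code only ever sees `%user`; whether a value was already in memory or had to be fetched is hidden behind `FETCH`.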
Re: Saving big blessed hashes to disk by rev_1318 (Chaplain) on Jul 20, 2005 at 10:14 UTC
If your data is that big and that important, I'd use a database to store and retrieve it... Paul
Re: Saving big blessed hashes to disk by adrianh (Chancellor) on Jul 20, 2005 at 10:36 UTC
"Now I'm using store method to save my data. But since some time this operation takes 2-3 seconds, and I'm afraid will grow even more. Data can't be split to parts to save separately." Two ideas spring to mind: (1) Change your design so that you can split stuff up and save/load it incrementally (e.g. have the definitive version of your structure on disk and only load what you need to change). (2) Instead of storing the whole data structure, store the changes you make to it. Then you can recover it by replaying the changes. Without knowing more details of your particular application it's hard to come up with specific advice :-)
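A minimal sketch of the "store the changes and replay them" idea, under some loud assumptions: the journal format (one tab-separated `op`, `key`, `value` line per change) and the helper names `record_change` and `replay` are invented for this example, and keys and values are assumed to contain no tabs or newlines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary journal file for the demo; a real application would use
# a fixed path of its own choosing.
my ($tmp_fh, $journal) = tempfile(UNLINK => 1);
close $tmp_fh;

# Append one change to the journal instead of rewriting the whole structure.
sub record_change {
    my ($file, $op, $key, $value) = @_;
    open my $out, '>>', $file or die "Cannot append to $file: $!";
    print {$out} join("\t", $op, $key, defined $value ? $value : ''), "\n";
    close $out or die "Cannot close $file: $!";
}

# Rebuild the in-memory state by replaying every recorded change in order.
sub replay {
    my ($file) = @_;
    my %state;
    open my $in, '<', $file or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($op, $key, $value) = split /\t/, $line, 3;
        if    ($op eq 'set')    { $state{$key} = $value }
        elsif ($op eq 'delete') { delete $state{$key} }
    }
    close $in;
    return \%state;
}

record_change($journal, 'set',    'room:1',  'lobby');
record_change($journal, 'set',    'user:42', 'b888');
record_change($journal, 'delete', 'room:1');

my $state = replay($journal);
print join(', ', map { "$_=$state->{$_}" } sort keys %$state), "\n";
```

Each save then costs one short append rather than a full dump. The journal grows without bound, though, so in practice it would probably be combined with an occasional full snapshot.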
Re: Saving big blessed hashes to disk by Tanalis (Curate) on Jul 20, 2005 at 09:29 UTC
I'd probably think about using threads to allow the recovery data to be saved in the background. See threads for more information. -- Foxcub `#include www.liquidfusion.org.uk`
Re^2: Saving big blessed hashes to disk by dave_the_m (Monsignor) on Jul 20, 2005 at 11:08 UTC
"I'd probably think about using threads to allow the recovery data to be saved in the background." Er, don't do that. Shared data in perl ithreads is slow and expensive and difficult. Dave.
Re: Saving big blessed hashes to disk by Your Mother (Archbishop) on Jul 20, 2005 at 17:40 UTC
I like DB_File and BerkeleyDB for this kind of thing, assuming the approach matches your needs and you can work out a locking strategy if you need one. It can slow down an application a bit because the in-memory data is now on disk, but they're both quite fast and, depending on what you're doing, you might not even notice any slowdown. Make sure to provide exception/death handling to clean up tied dbs; they can be corrupted mysteriously on some platforms when they're not closed properly.
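A rough sketch of a DB_File-tied hash with the kind of clean-up described above. The file name and the stored value are made up for the example, and the signal handling shown is just one possible way of making sure the hash gets untied. Note that DB_File stores flat strings, so nested or blessed structures would have to be serialized first.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sessions.db";   # hypothetical file name

my %session;
tie %session, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $file: $!";

# Untie on normal exit and after die(), so the file is flushed and closed.
END { untie %session }

# Turn common signals into a normal exit so the END block still runs.
$SIG{INT} = $SIG{TERM} = sub { exit 1 };

$session{alice} = 'room=lobby;state=active';
print "alice: $session{alice}\n";
```

Reads and writes on `%session` go to the file on disk; the rest of the program just uses it like an ordinary hash.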
Re^2: Saving big blessed hashes to disk by b888 (Beadle) on Jul 21, 2005 at 06:59 UTC
The main reason why I do not want to use something that is not in memory is the number of requests per second. Simply imagine something like a chat structure: a lot of users, rooms, private rooms... On every user request the application has to get info about the session, the user, the user's state, the user's room, and so on. That's something like what I'm doing. The only reasonable approach in this case is to keep all data in memory (or shared memory at least), as there are 20-40 requests per second from users. And if each of these requests makes 10-30 requests to some db, then even if it works, I suppose it will not be very stable or scalable. p.s. Thanks for all answers
Re^3: Saving big blessed hashes to disk by adrianh (Chancellor) on Jul 21, 2005 at 11:08 UTC
"The only reasonable approach in this case - keep all data in memory (or shared memory at least), as there are 20-40 requests per second from users. And if each of these requests will make 10-30 requests to some db - even if this will work, i suppose it will not be very stable or scalable." I think you'll be surprised. Databases are very good at this sort of thing. Benchmark, don't guess ;-)
Re^3: Saving big blessed hashes to disk by Your Mother (Archbishop) on Jul 21, 2005 at 17:46 UTC
I wrote a simple chat this way, much like the Chatterbox: no rooms, but some autoformatting, sessioning, cookies and such. The trick to making it really zoom for me was making it into a daemonized server so that it was always running and always had its tied handle on the DB_File beneath. It was quite a bit faster than the relational DB that drove the page around/above it (even using some of the same DB code to handle users/sessions). The chat was iframed so it could reload independently of the regular site. The trick then is the daemonizing code and handling the HTTP and such. Even for a queue of lines of chat it turns into a lot of code (there is also POE, and more already done out there for mini-applications). I've recommended this book before: "Network Programming with Perl", ISBN 0201615711. I'd never done that type of code before but managed to write a chat-daemon server that never needed restarting in 2 years and ran far faster than I expected.
Re: Saving big blessed hashes to disk by lwknet (Initiate) on Jul 24, 2005 at 12:56 UTC
DB is more than able to handle your load (around 20-30 requests/s). I've tested my own multi-threaded in-memory storage caching / recursive / authoritative name server (10% finished as of now) with shared variables; it can retrieve a 512-byte variable more than 1,000,000 times/s running in a vps. Together with the overhead of seeking the right memory pointer to access, recv() and send(), it's still enough to saturate a 10mbit line (consider i'm in a vps). In my layer 5 dns packets load balancer the figure is doubled.
In my benchmark, accessing/writing shared variables is 20% slower than private ones; you will only start to notice the difference after something like 500,000 accesses. The key to successfully using an in-memory db is an efficient data structure that minimizes the overhead of accessing and writing to memory, plus building indexes to help seek the desired data. The worst case is to write to memory in exactly the format stored on your disk. It took me a couple of days just to figure out the best data structure (that i know of) for my app.
Also, my humble memory usage benchmark shows that a multi-dimensional array saves ~5% memory over a single-dimensional one. Instead of `$array[0]='xxx'; $array[1]='xxx'; ...` i prefer to write it `$array->[0][0]='xxx'; $array->[0][1]='xxx'; $array->[1][0]='abc'; ...`
The above is still not the best practice (at least as it appears in perl). If you have tons of short strings like `'xxx' 'abc'` to store, grouping them together in a scalar `'xxxabc'` and making use of substr() to access your range of bytes helps reduce memory consumption by 90%. The above example is still not what would be considered a production-level memory storage solution; the most optimistic result i got was from storing a set of data every 500+ bytes (not 1024, 2048, 4096, etc. in perl), which makes it `'abc123xxxabc123xxx..........'`. I managed to take up only 65MB for storing 50MB of data from disk, and the access speed is not affected by the size of your in-memory DB at all. The ability to handle some structured integers as bits also helps a lot.
Using an in-memory db for just 20-30 requests/s is simply overkill and a waste of time; you probably want mod_perl or a custom-built server daemon instead. mysql running on an average system should handle 10 times your load :) 20050724 Edit by ysth: p, code tags
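The substr() packing described above might look something like this. It is only a sketch: the fixed 3-byte record width and the helper names `put_record` and `get_record` are assumptions made for the example, and the memory savings quoted in the post are the poster's own measurements, not something this snippet demonstrates.

```perl
use strict;
use warnings;

my $WIDTH  = 3;    # assumed fixed record width, in bytes
my $packed = '';   # every record lives inside this one scalar

# Overwrite (or append) the record at position $index.
sub put_record {
    my ($index, $value) = @_;
    die "record must be exactly $WIDTH bytes\n"
        unless length($value) == $WIDTH;
    my $offset = $index * $WIDTH;
    # Pad with spaces when writing past the current end of the string.
    $packed .= ' ' x ($offset - length($packed))
        if $offset > length($packed);
    substr($packed, $offset, $WIDTH, $value);   # 4-argument substr replaces in place
    return;
}

# Read the record at position $index back out of the packed scalar.
sub get_record {
    my ($index) = @_;
    return substr($packed, $index * $WIDTH, $WIDTH);
}

put_record(0, 'xxx');
put_record(1, 'abc');
put_record(2, '123');

print get_record(1), "\n";
print "$packed\n";
```

The trade-off is that every record must have the same width (or carry its own index of offsets), in exchange for having one scalar instead of one scalar per string.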
Re^2: Saving big blessed hashes to disk by b888 (Beadle) on Jul 25, 2005 at 07:50 UTC
"...20-40 requests per second from users, and each of these requests will make 10-30 requests to some db..." That is 200-1200 db requests per second for now, and it will grow even more in the near future. Sure, mysql is a good thing. But it just can't resolve my situation (tested already). Why keep data in a DB/files if it's a standalone application? The question I'm seeking an answer to is "how to get this data from the application and store it to disk/db/other memory". Thanks anyway. I got to know some new information after reading :)
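For reference, the baseline being discussed (dumping a blessed hash to disk with Storable and reading it back) can be sketched like this. `ChatState`, `save_state` and the file path are hypothetical stand-ins for the real application; writing to a temporary file and renaming it is an extra precaution added here, not something from the thread.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

package ChatState;   # hypothetical class standing in for the real blessed hash
sub new { my ($class) = @_; return bless { users => {}, rooms => {} }, $class }

package main;

# Write to a temporary file first, then rename it over the old snapshot,
# so a crash in the middle of a save never leaves a half-written file.
sub save_state {
    my ($state, $file) = @_;
    my $tmp = "$file.tmp";
    nstore($state, $tmp) or die "Cannot store to $tmp: $!";
    rename $tmp, $file   or die "Cannot rename $tmp to $file: $!";
    return 1;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/state.sto";   # hypothetical path

my $state = ChatState->new;
$state->{users}{b888} = { room => 'lobby' };
save_state($state, $file);

my $copy = retrieve($file);    # the object comes back blessed into ChatState
print ref($copy), " ", $copy->{users}{b888}{room}, "\n";
```

This does nothing about the 2-3 seconds a full dump takes; it only shows that the blessing survives the round trip and that a snapshot can be replaced safely.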
Re^3: Saving big blessed hashes to disk by lwknet (Initiate) on Jul 25, 2005 at 20:23 UTC
You said it's a standalone app. If it is your own written server daemon or any kind of daemonized app, it is easier to "get away" with mysql, which you consider slow, than with CGI/mod_perl relying on apache. Sharing variables in perl ithreads is pretty easy; look for the "threads" and "threads::shared" modules from cpan. Remember that memory used by perl variables is generally not given back to the operating system even after undef/delete, so use hashes with care.

