PerlMonks  

Saving big blessed hashes to disk

by b888 (Beadle)
on Jul 20, 2005 at 09:25 UTC

b888 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings.

While writing a stand-alone application I ran into the problem of keeping a backup copy of in-memory data on disk (to recover if something goes wrong). After searching I found the Storable library, and I now use its store method to save my data. But for some time now this operation has taken 2-3 seconds, and I'm afraid it will grow even more. The data can't be split into parts to be saved separately.

The only way I can see now is to write some kind of replication daemon that receives small "what-to-change" messages and applies them to its own copy of the data structure.

Can someone suggest what to do? :)
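
For readers who want to reproduce the setup, here is a minimal sketch of a Storable snapshot of a blessed hash. The class name My::AppState, the hash contents and the file name are assumptions made for illustration. The write-then-rename step is an addition not mentioned in the question: it only prevents a half-written snapshot and does not make store any faster.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for the application's big blessed hash.
my $state = bless {
    users => { alice => { room => 'lobby' } },
    rooms => { lobby => ['alice'] },
}, 'My::AppState';

my $dir      = tempdir(CLEANUP => 1);
my $snapshot = File::Spec->catfile($dir, 'state.sto');

# Write to a temporary name first, then rename, so a crash in the middle
# of store() never leaves a half-written snapshot under the real name.
sub save_snapshot {
    my ($data, $path) = @_;
    my $tmp = "$path.tmp";
    store($data, $tmp) or die "store failed for $tmp";
    rename $tmp, $path or die "rename failed: $!";
    return 1;
}

save_snapshot($state, $snapshot);

# retrieve() restores the structure, including its blessing.
my $restored = retrieve($snapshot);
print ref($restored), "\n";
print $restored->{users}{alice}{room}, "\n";
```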

Replies are listed 'Best First'.
Re: Saving big blessed hashes to disk
by rinceWind (Monsignor) on Jul 20, 2005 at 11:00 UTC

    I think that you've discovered that Storable provides a persistence mechanism that enables you to dump and load data structures. You are also looking for something scalable - you were worried about the time to store the data; having the whole data structure in memory is going to eat into your process and operating system resources if it gets large.

    I can suggest two related approaches: objects and tying. An object can be a placeholder for some point in your arbitrarily large data. If your application keeps to the OO encapsulation rules, you are only holding the data in memory that you actually need for manipulation; the other parts can still be accessed via method calls. One approach is to hold the data in a database, and use Class::DBI as an object persistence layer to turn rows into objects.

    Tying provides an alternative interface to your data, and can make it look like an array or a hash (or a filehandle, etc.). When the application performs operations on the tied variable, method calls happen behind the scenes to do the sleight of hand and get your data. The application does not need to know that this is happening - it just sees an array (or hash). See perldoc perltie for more on this.
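
To make the tying idea concrete, here is a minimal sketch of a tied hash that fetches values on demand. The class name LazyHash and the load_from_backing_store routine are hypothetical; the routine merely stands in for a real disk or database read.

```perl
use strict;
use warnings;

package LazyHash;
use Tie::Hash;                          # provides Tie::StdHash
use parent -norequire, 'Tie::StdHash';

my $loads = 0;   # counts how often the backing store was hit

# Hypothetical stand-in for a disk or database read.
sub load_from_backing_store {
    my ($key) = @_;
    $loads++;
    return "record for $key";
}

# FETCH runs whenever the application reads $hash{$key}.
# A missing key is loaded once and then kept in the underlying hash.
sub FETCH {
    my ($self, $key) = @_;
    $self->{$key} = load_from_backing_store($key)
        unless exists $self->{$key};
    return $self->{$key};
}

package main;

tie my %users, 'LazyHash';
print $users{alice}, "\n";              # triggers a load
print $users{alice}, "\n";              # served from memory
print "backing store reads: $loads\n";
```

The calling code only ever sees %users as an ordinary hash; all the loading logic stays inside FETCH.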

    Hope this helps

    --

    Oh Lord, won’t you burn me a Knoppix CD ?
    My friends all rate Windows, I must disagree.
    Your powers of persuasion will set them all free,
    So oh Lord, won’t you burn me a Knoppix CD ?
    (Misquoting Janis Joplin)

Re: Saving big blessed hashes to disk
by rev_1318 (Chaplain) on Jul 20, 2005 at 10:14 UTC
    If your data is that big and that important, I'd use a database to store and retrieve it...

    Paul

Re: Saving big blessed hashes to disk
by adrianh (Chancellor) on Jul 20, 2005 at 10:36 UTC
    I now use its store method to save my data. But for some time now this operation has taken 2-3 seconds, and I'm afraid it will grow even more. The data can't be split into parts to be saved separately.

    Two ideas spring to mind:

    • Change your design so that you can split stuff up so you can save/load stuff incrementally (e.g. have the definitive version of your structure on disk and only load what you need to change).
    • Instead of storing the whole data structure, store the changes you make to it. Then you can recover it by replaying the changes.

    Without knowing more details of your particular application it's hard to come up with specific advice :-)
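
The second idea (store the changes, then replay them) could look roughly like the sketch below. The journal format (tab-separated op, key, value), the two operations and the function names are all assumptions chosen to keep the example short; values containing tabs or newlines would need escaping.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir     = tempdir(CLEANUP => 1);
my $journal = File::Spec->catfile($dir, 'changes.log');

# Apply one change to the in-memory structure. Only two hypothetical
# operations are supported here: "set" and "delete" on a flat hash.
sub apply_change {
    my ($state, $op, $key, $value) = @_;
    if    ($op eq 'set')    { $state->{$key} = $value }
    elsif ($op eq 'delete') { delete $state->{$key} }
    else                    { die "unknown op '$op'" }
    return;
}

# Append the change to the journal first, then apply it in memory.
sub record_change {
    my ($state, $op, $key, $value) = @_;
    open my $fh, '>>', $journal or die "cannot append to $journal: $!";
    print {$fh} join("\t", $op, $key, $value // ''), "\n";
    close $fh or die "cannot close $journal: $!";
    apply_change($state, $op, $key, $value);
    return;
}

# Rebuild the structure after a crash by replaying the journal in order.
sub replay {
    my %state;
    open my $fh, '<', $journal or die "cannot read $journal: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($op, $key, $value) = split /\t/, $line, 3;
        apply_change(\%state, $op, $key, $value);
    }
    close $fh;
    return \%state;
}

my %live;
record_change(\%live, 'set',    'alice', 'lobby');
record_change(\%live, 'set',    'bob',   'lobby');
record_change(\%live, 'delete', 'bob');

my $recovered = replay();
print join(',', sort keys %$recovered), "\n";
```

Each write costs one small append instead of a full dump. In practice the journal would be combined with an occasional full snapshot so that it does not grow without bound.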

Re: Saving big blessed hashes to disk
by Tanalis (Curate) on Jul 20, 2005 at 09:29 UTC
      I'd probably think about using threads to allow the recovery data to be saved in the background
      Er, don't do that. Shared data in perl ithreads is slow and expensive and difficult.

      Dave.

Re: Saving big blessed hashes to disk
by Your Mother (Archbishop) on Jul 20, 2005 at 17:40 UTC

    I like DB_File and BerkeleyDB for this kind of thing, assuming the approach matches your needs and you can work out a locking strategy if you need one. It can slow down an application a bit because the in-memory data is now on disk, but both are quite fast and, depending on what you're doing, you might not even notice any slowdown. Make sure to provide exception/death handling to clean up tied DBs; they can be corrupted mysteriously on some platforms when they're not closed properly.
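
A minimal sketch of a tied DB_File hash with cleanup on exit, assuming DB_File is installed. The file name, the close_db helper and the choice of signals to trap are illustrative assumptions, and no locking is shown.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'sessions.db');   # hypothetical name

my %sessions;
tie %sessions, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "cannot tie $file: $!";

# Untie on normal exit and on die(), so the file is flushed and closed
# cleanly. The signal handlers turn INT/TERM into exit(), which also
# runs the END block.
sub close_db { untie %sessions if tied %sessions }
END { close_db() }
$SIG{INT} = $SIG{TERM} = sub { exit 1 };

$sessions{abc123} = 'alice';
print "session abc123 belongs to $sessions{abc123}\n";

# Close and reopen to show that the data is on disk.
close_db();
tie %sessions, 'DB_File', $file, O_RDWR, 0644, $DB_HASH
    or die "cannot reopen $file: $!";
print "after reopen: $sessions{abc123}\n";
```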

      The main reason I do not want to use anything that is not in memory is the number of requests per second.

      Well, simply imagine something like a chat structure: a lot of users, rooms, private rooms... On every user request the application has to get info about

      • session
      • user
      • user state
      • user room
      • ...

      That's something like what I'm doing.

      The only reasonable approach in this case is to keep all data in memory (or shared memory at least), as there are 20-40 requests per second from users. And if each of these requests makes 10-30 requests to some DB, then even if this works, I suppose it will not be very stable or scalable.

      p.s. Thanks for all answers

        The only reasonable approach in this case is to keep all data in memory (or shared memory at least), as there are 20-40 requests per second from users. And if each of these requests makes 10-30 requests to some DB, then even if this works, I suppose it will not be very stable or scalable.

        I think you'll be surprised. Databases are very good at this sort of thing. Benchmark, don't guess ;-)
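
One way to follow the "benchmark, don't guess" advice is the core Benchmark module. The sketch below compares lookups in a plain hash with lookups in a disk-backed tied hash. SDBM_File is used only as a stand-in because it ships with Perl; the key names, data size and round count are arbitrary assumptions, and the real store should be substituted before drawing conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use SDBM_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'bench');

my %memory;
tie my %disk, 'SDBM_File', $base, O_RDWR | O_CREAT, 0644
    or die "cannot tie SDBM file: $!";

# Same hypothetical data set in both stores.
my @keys = map { "user$_" } 1 .. 1000;
for my $k (@keys) {
    $memory{$k} = "state of $k";
    $disk{$k}   = "state of $k";
}

# Read every key once from the given store.
sub read_all {
    my ($store) = @_;
    for my $k (@keys) { my $v = $store->{$k} }
    return;
}

my $rounds = 20;   # 20 rounds x 1000 keys = 20,000 lookups per store
my $t_mem  = timeit($rounds, sub { read_all(\%memory) });
my $t_disk = timeit($rounds, sub { read_all(\%disk) });

print "in-memory hash: ", timestr($t_mem),  "\n";
print "tied SDBM file: ", timestr($t_disk), "\n";

untie %disk;
```

The absolute numbers depend on the machine; what matters is whether the slower store still clears the required request rate with room to spare.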

        I wrote a simple chat this way, much like the Chatterbox, no rooms, but some autoformatting, sessioning, cookies and such. The trick to making it really zoom for me was making it into a daemonized server so that it was always running and always had its tied handle on the DB_File beneath. It was quite a bit faster than the relational DB that drove the page around/above it (even using some of the same DB code to handle users/sessions). The chat was iframed so it could reload independently of the regular site.

        The trick then is the daemonizing code and handling the HTTP and such. Even for a queue of lines of chat it turns into a lot of code (there is also POE and more already done out there for mini-applications). I've recommended this book before: "Network Programming with Perl" ISBN 0201615711. I'd never done that type of code before but managed to write a chat-daemon server that never needed restarting in 2 years and ran far faster than I expected.

Re: Saving big blessed hashes to disk
by lwknet (Initiate) on Jul 24, 2005 at 12:56 UTC
    A DB is more than able to handle your load (around 20-30 requests/s).

    I've tested my own multi-threaded caching / recursive / authoritative name server with in-memory storage (10% finished as of now). With shared variables it can retrieve a 512-byte variable more than 1,000,000 times/s while running in a VPS. Even with the overhead of seeking the right memory pointer to access, recv() and send(), that is still enough to saturate a 10 Mbit line (consider that I'm in a VPS). In my layer 5 DNS packet load balancer the figure is doubled.

    In my benchmark, accessing/writing shared variables is 20% slower than private ones; you will only start to notice the difference after about 500,000 accesses.

    The key to successfully using an in-memory DB is an efficient data structure that minimizes the overhead of accessing and writing to memory, plus indexes that help seek the desired data. The worst case is to write to memory in exactly the format stored on your disk. It took me a couple of days just to figure out the best data structure (that I know of) for my app.

    Also, my humble memory usage benchmark shows that a multi-dimensional array saves ~5% memory over a single-dimensional one. Instead of

        $array[0] = 'xxx';
        $array[1] = 'xxx';
        ...

    I prefer to write

        $array->[0][0] = 'xxx';
        $array->[0][1] = 'xxx';
        $array->[1][0] = 'abc';
        ...

    The above is still not the best practice (at least as far as Perl is concerned). If you have tons of short strings like

        'xxx' 'abc'

    to store, group them together in one scalar

        'xxxabc'

    and use substr() to access your range of bytes; this helps reduce memory consumption by 90%. That example is still not what I would consider a production-level memory storage solution. The best result I got was from storing a set of data every 500+ bytes (not 1024, 2048, 4096, etc. in Perl), which makes it

        'abc123xxxabc123xxx..........'

    I managed to take up only 65MB for storing 50MB of data from disk, and the access speed is not affected by the size of your in-memory DB at all. The ability to handle some structured integers as bits also helps a lot.
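
The substr() packing described above can be sketched as follows, assuming every record has the same fixed width. The width of 3, the sample records and the helper names are illustrative only, and the memory savings quoted in the post are not measured here.

```perl
use strict;
use warnings;

my $WIDTH   = 3;                      # every record is exactly 3 bytes (assumption)
my @records = ('xxx', 'abc', '123');

# One scalar instead of one scalar per record.
my $packed = join '', @records;

# Read record number $i (0-based).
sub get_record {
    my ($i) = @_;
    return substr($packed, $i * $WIDTH, $WIDTH);
}

# Overwrite record number $i in place using 4-argument substr.
sub set_record {
    my ($i, $value) = @_;
    die "record must be $WIDTH bytes" unless length($value) == $WIDTH;
    substr($packed, $i * $WIDTH, $WIDTH, $value);
    return;
}

print get_record(1), "\n";   # prints "abc"
set_record(1, 'xyz');
print $packed, "\n";         # prints "xxxxyz123"
```

The trade-off is that the offset arithmetic replaces Perl's own bookkeeping, so variable-length data needs a separate index of offsets and lengths.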

    Using an in-memory DB for just 20-30 requests/s is simply overkill and a waste of time; you probably want mod_perl or a custom-built server daemon instead. MySQL running on an average system should handle 10 times your load :)

    20050724 Edit by ysth: p, code tags

      ...20-40 requests per second from users, and each of these requests will make 10-30 requests to some db..
      That is 200-1200 DB requests per second for now, and it will grow even more in the near future.

      Sure, MySQL is a good thing. But it just can't handle my situation (tested already).

      Why keep the data in a DB/files if it's a stand-alone application? The question I'm seeking an answer to is "how to get this data out of the application and store it to disk/DB/other memory".

      Thanks anyway. I learned some new things from reading this :)

        You said it's a stand-alone app. If it is your own server daemon or any kind of daemonized app, it is easier to "get away" with MySQL, which you consider slow, than with CGI/mod_perl relying on Apache. Sharing variables in Perl ithreads is pretty easy; look for the "threads" and "threads::shared" modules on CPAN. Remember that memory used by variables in Perl is generally not returned to the operating system even after undef/delete, so use hashes with care.
