PerlMonks  

Re^2: Memory overhead of blessed hashes

by LanX (Saint)
on Feb 10, 2021 at 17:58 UTC


in reply to Re: Memory overhead of blessed hashes
in thread Memory overhead of blessed hashes

> I know it's not your code,

I don't know what they did, and I want to avoid another "told you so" situation.²

I'm just fighting off FUD claims that bless has a memory impact, and trying to educate myself.

> Is it really necessary to hold 100k+ in memory at once?

From my understanding, they are building complicated trees (well, multi-trees°) within a short time window.

> If so, if the primary attribute of each object were just a path where the serialization of the remainder of the object's guts can be found, you might save space.

That's a good idea.

Though in my experience, Perl and the OS are pretty efficient at swapping out unused hashes, as long as they are small enough.

Of course the performance depends on the frequency you need to access those, but the same applies to your serialization idea.
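The "path as primary attribute" idea above could be sketched like this: keep only a file path in the live object and spill the heavy guts to disk with the core Storable module. The class name and field names here are illustrative, not taken from the actual code under discussion.

```perl
#!/usr/bin/env perl
# Sketch: a lean in-memory object holding only a path, with the
# heavy payload serialized to disk via the core Storable module.
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

my (undef, $path) = tempfile();              # stand-in for a real path
my %guts = ( big => [ 1 .. 1000 ] );         # the heavy part

store \%guts, $path;                         # spill to disk
my $obj = bless { path => $path }, 'Node';   # small object stays in RAM

my $loaded = retrieve $obj->{path};          # re-inflate on demand
print scalar @{ $loaded->{big} }, "\n";      # prints 1000
```

Whether this wins depends, as noted, on how often the guts are accessed: each access pays a deserialization cost.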

Hmm ...

Actually, this is a good counter-argument to inside-out objects, because class variables holding the data for all objects can't be swapped out.

So it's sometimes better to keep rarely used "guts" data inside small hashes at lower nesting levels.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

°) elements can have multiple parents (aggregation semantics)

²) see also "Chuck Norris"-ing code


Re^3: Memory overhead of blessed hashes
by jcb (Parson) on Feb 11, 2021 at 02:40 UTC
    Just fighting off FUD theories that bless had a memory impact and trying to educate myself.

    To be fair, bless does have a small memory impact: packages used for object classes have slightly greater overhead (per-package) and blessed scalars must be upgraded to carry magic (which also adds the STASH pointer), but the per-object overhead for blessed aggregates is zero — AV and HV structures are large enough that they always have the slot for the STASH pointer.
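The zero per-object overhead for blessed aggregates can be checked empirically, for instance with the non-core Devel::Size module (an assumption here: that it is installed):

```perl
#!/usr/bin/env perl
# Sketch: compare the size of a plain hash and an identical blessed
# hash. Requires the (non-core) Devel::Size module.
use strict;
use warnings;
use Devel::Size qw(total_size);

my %plain   = ( x => 1, y => 2 );
my %blessed = ( x => 1, y => 2 );
bless \%blessed, 'My::Node';

printf "plain:   %d bytes\n", total_size(\%plain);
printf "blessed: %d bytes\n", total_size(\%blessed);
# On typical builds the two sizes match: the HV structure already
# has the slot used for the STASH pointer, so blessing a hash
# costs nothing per object.
```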

    Actually, this is a good counter-argument to inside-out objects, because class variables holding the data for all objects can't be swapped out.

    Virtual memory does not know about that — swapping occurs at page granularity regardless of larger structures. If the hash table is large enough, and accesses do not result in scanning the entire table, portions of the hash table can be swapped out by the OS, even if other parts of the table are held in memory due to frequent access. If one SV on a page is frequently accessed, everything else on that page is also kept in memory.

    So it's sometimes better to keep rarely used "guts" data inside small hashes at lower nested levels.

    Your problem here seems to be the fixed per-hash HV overhead, which is a consequence of the existence of many small hashes in your program, whether blessed or plain.

    If you have a relatively small tree node and search/index keys object with a relatively large and generally opaque "data payload" segment, you could use inside-out objects to reduce the hash overhead for the search/index keys and DBI/SQLite to store the payloads, possibly in an in-memory database. But once you have eliminated the per-object HV overhead, simply serializing the payloads and storing them in one more hash will probably be comparable to an in-memory SQLite database, for much lower overhead. Unless, of course, you can actually move your entire data tree into SQLite and use SQL to access it, or the payloads really are a large part of the problem and SQLite allows you to move them out to disk while keeping the tree structure in the inside-out objects.
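    For reference, the inside-out pattern mentioned here keeps per-object data in class-level hashes keyed by the object's address, so each object is just a nearly empty blessed scalar. A minimal sketch (class and field names are illustrative):

```perl
#!/usr/bin/env perl
# Sketch of inside-out objects: per-object data lives in class-level
# hashes keyed by refaddr, not in the object itself.
use strict;
use warnings;
use Scalar::Util qw(refaddr);

package Node {
    my ( %key, %payload );            # class-level storage for ALL objects

    sub new {
        my ( $class, %args ) = @_;
        my $self = bless \my $scalar, $class;   # object = empty scalar ref
        $key{ refaddr $self }     = $args{key};
        $payload{ refaddr $self } = $args{payload};
        return $self;
    }
    sub key     { $key{ refaddr $_[0] } }
    sub payload { $payload{ refaddr $_[0] } }
    sub DESTROY {                     # clean up, or the class hashes leak
        my $addr = refaddr $_[0];
        delete $key{$addr};
        delete $payload{$addr};
    }
}

my $n = Node->new( key => 'a', payload => 'big blob' );
print $n->key, "\n";                  # prints "a"
```

    These class-level hashes are exactly the structures LanX points out cannot be swapped out independently per object.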

      > If the hash table is large enough, and accesses do not result in scanning the entire table, portions of the hash table can be swapped out by the OS, even if other parts of the table are held in memory due to frequent access.

      Well, in theory, but when it comes to hashes that's pretty unlikely.

      If one part of a hash is much more frequently accessed than another one, then the hashing function can't be very good.

      Or it's always only the same key. :)

      Anyway, I once had stunning results after transforming a giant hash into a two-tier HoH and letting the algorithm concentrate on only a very small group of second-tier hashes at a time.
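      The transformation described is roughly this (the grouping rule — the first two characters of the key — is an illustrative assumption; the real code presumably grouped by whatever made access patterns cluster):

```perl
#!/usr/bin/env perl
# Sketch: split a flat hash into a two-tier hash-of-hashes so that
# hot loops touch only one small inner hash at a time.
use strict;
use warnings;

my %flat = ( ab_1 => 10, ab_2 => 20, cd_1 => 30 );

my %tiered;
for my $k ( keys %flat ) {
    my $group = substr $k, 0, 2;      # illustrative grouping rule
    $tiered{$group}{$k} = $flat{$k};
}

# The algorithm then concentrates on one small second-tier hash:
my $inner = $tiered{ab};
print "$_ => $inner->{$_}\n" for sort keys %$inner;
```

      The inner hashes are small, so pages holding the rarely used groups can be swapped out while the hot group stays resident.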

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery
