PerlMonks  

threads::shared seems to kill performance

by Jacobs (Novice)
on Jul 17, 2013 at 21:09 UTC ( [id://1044903] )

Jacobs has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise monks, I'm trying to increase performance of my program (which uses a huge 3-dimensional hash - around 240MB in RAM) by moving it to a threaded model.

I thought I'd explicitly share the big hash and then access the data from each thread (the operations I perform are read-only, so I'm not worried about individual threads conflicting).

However the process of creating the big shared hash takes ages for some reason - I've tried these two simplified versions for comparison:

    my %data;
    foreach my $x (1..5000) {
        $data{$x} = {} unless $data{$x};
        foreach my $y (1..1000) {
            $data{$x}{$y} = {} unless $data{$x}{$y};
        }
    }

    real    0m4.075s
    user    0m3.767s
    sys     0m0.289s
vs the shared one:
    use threads;
    use threads::shared;

    my %data :shared;
    foreach my $x (1..5000) {
        $data{$x} = &share( {} ) unless $data{$x};
        foreach my $y (1..1000) {
            $data{$x}{$y} = &share( {} ) unless $data{$x}{$y};
        }
    }

    real    1m4.984s   # that's ~16x slower than the non-shared case!
    user    1m4.211s
    sys     0m0.540s

Is there something wrong with my code or is this performance decrease simply an inevitable cost of sharing?

Replies are listed 'Best First'.
Re: threads::shared seems to kill performance
by dave_the_m (Monsignor) on Jul 17, 2013 at 22:42 UTC
    threads::shared variables are *very* slow; you should share as little as possible, and access what you do share sparingly.

    The implementation essentially does something similar to tying (except that it's implemented in XS rather than Perl); so

        my %hash : shared;
        ...
        $x = $hash{foo};
    is a bit like
        sub threads::shared::FETCH {
            lock $Some::Global::lock_var;
            return $Some::Shared::Space::hash{ $_[0] };
        }

        my %hash;
        tie %hash, 'threads::shared';
        ...
    Note that each thread has its own copy of the 'tied' hash; accessing it causes a global lock to be set, then an entry from the 'real' hash is copied to that thread.
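    The per-access overhead described above is easy to see in a quick timing sketch (a minimal, self-contained comparison; the values are plain scalars, so no &share is needed for them, and the exact ratio will vary by perl build):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw(time);

# Fill a plain hash and a shared hash with the same 10,000 entries.
my %plain = map { $_ => $_ * 2 } 1 .. 10_000;
my %shared : shared;
%shared = map { $_ => $_ * 2 } 1 .. 10_000;

# Time reading every entry of the plain hash...
my $t0 = time;
my $sum_plain = 0;
$sum_plain += $plain{$_} for 1 .. 10_000;
my $t_plain = time - $t0;

# ...and of the shared hash (each read goes through the tie-like FETCH).
$t0 = time;
my $sum_shared = 0;
$sum_shared += $shared{$_} for 1 .. 10_000;
my $t_shared = time - $t0;

printf "plain: %.5fs  shared: %.5fs\n", $t_plain, $t_shared;
```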

    Dave.

Re: threads::shared seems to kill performance
by BrowserUk (Patriarch) on Jul 18, 2013 at 00:33 UTC

    Yes, shared aggregates are considerably slower than non-shared.

    But try it this way and it'll be about 2/3rds less slow:

    use threads; use threads::shared; my %hashOf1000SharedHashes = map{ $_ => &share({}) } 1 .. 1000; my %data:shared; foreach my $x (1..5000) { $data{$x} = shared_clone( \%hashOf1000SharedHashes ); } undef %hashOf1000SharedHashes;

    That said, building a 2D HoH of empty hashes (with consecutive numerical indices?) doesn't seem very useful.

    Presumably that structure will need to be populated at some point -- and with that amount of data it must be coming in from outside the program -- and once you add the IO needed to fetch the data into the mix, the cost of making the data shared will pale into insignificance.

    If, instead of building a huge, empty shared data structure and then populating it (which will take considerable further time), you shared and populated it in one pass, you'd save considerable time, and the sharing costs would almost disappear amongst the IO costs.
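    A minimal sketch of that one-pass approach (fetch_rows() here is a stand-in for whatever really supplies (owner, date, value) tuples -- a DB query, a file, etc. -- and the sample rows are invented):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %data : shared;

sub fetch_rows {
    # placeholder for the real data source
    return ( [ 'alice', '2013-07-17', 42 ],
             [ 'alice', '2013-07-18', 7  ],
             [ 'bob',   '2013-07-17', 99 ] );
}

for my $row ( fetch_rows() ) {
    my ( $owner, $date, $value ) = @$row;
    # create each inner level only when it is first seen,
    # rather than pre-allocating millions of empty shared hashes
    $data{$owner}        //= &share( {} );
    $data{$owner}{$date} //= &share( [] );
    push @{ $data{$owner}{$date} }, $value;
}
```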

    Tell us more about what goes in this monster, where that comes from; and how it is used and we'll probably be able to help you save a lot of time.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hello BrowserUK, threading master of masters from what I hear! Thank you for your response.

      I'm aware I'm probably breaking several laws and killing small kittens in the process by allocating a hash this big.

      Originally the data comes from a SQLite database. There's one huge table that's keyed by 2 levels of parameters - say: owner, date, some_data (with <owner,date> being unique and the set of owners relatively small) - and by loading this into those hashes, I'm trying to introduce some structure to the data so that I can later access it from my program in a way I can easily understand and work with ($data{user}{date}[]).

      Strangely, loading the data from the database doesn't have as big an impact on performance as the sharing does. In my real-life tests - where I do in fact initialize the hash and populate it in one pass, as you suggest - loading from the DB and populating the hash (with a significantly reduced set of data) took about 2s. Once I added the sharing (in a way similar to my example above), it took about 26s.

        Originally the data comes from a SQLite database....

        Then I very strongly advise against taking the data out of the db and putting it into a hash.

        Not only will doing so take considerable time and substantial space; although for read-only use you won't need any locking of your own, there is no way to turn off the locking Perl uses to protect its internals, and that will bring your application to a crawl.

        Instead, share the db handle and create statement handles for your queries. Whilst I haven't done this personally (yet), according to this, the default 'serialized' mode of operation means that you don't even need to do user locking as the DB will take care of that for you.

        If you create/clone your DB as an in-memory DB, after you've spawned your threads; then you will avoid the duplication of that DB and the performance should be on a par with, and potentially faster than a shared hash.

        When I get time, which may not be soon, I intend to test this scenario for myself as I think it might be a good solution to sharing large amounts of data between threads. Something Perl definitely needs.

        It may even be possible to wrap it in a tied hash to simplify the programmer's view of the DB without incurring the high overheads of threads::shared (that's very speculative!).

        In any case, as your data is already in a DB, don't take it out and put it in shared hashes. That just doesn't make sense. Just load it into memory after your threads are spawned; and then set the dbh into a shared variable where the threads can get access to it.

        At least, that is what I would (and will) try.
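        A conservative sketch of the DB-side approach (with one caveat: a DBI handle itself generally can't be stored in a threads::shared variable, so this version has each thread open its own connection to the same SQLite file instead; the file, table, and column names are invented for illustration, and DBD::SQLite is assumed to be installed):

```perl
use strict;
use warnings;
use threads;
use DBI;

# Set up a small demo database (stands in for the real SQLite file).
my $dbfile = 'demo_threads.db';
{
    my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", '', '',
                            { RaiseError => 1 } );
    $dbh->do('CREATE TABLE IF NOT EXISTS big_table'
           . ' (owner TEXT, date TEXT, some_data INTEGER)');
    $dbh->do('DELETE FROM big_table');
    $dbh->do(q{INSERT INTO big_table VALUES ('alice','2013-07-17',42)});
    $dbh->disconnect;
}

# Each worker thread opens its own connection and queries independently;
# read-only queries need no user-level locking.
my @workers = map {
    threads->create( sub {
        my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", '', '',
                                { RaiseError => 1 } );
        my ($value) = $dbh->selectrow_array(
            'SELECT some_data FROM big_table WHERE owner = ? AND date = ?',
            undef, 'alice', '2013-07-17' );
        $dbh->disconnect;
        return $value;
    } );
} 1 .. 4;

my @results = map { $_->join } @workers;
unlink $dbfile;
```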


Re: threads::shared seems to kill performance
by Preceptor (Deacon) on Jul 17, 2013 at 21:44 UTC

    Hmm, well, I'd sort of expect 5,000,000 &share calls to take a reasonable amount of time, yes. Hashes - particularly multidimensional ones - don't work well with threads::shared. What you've got is essentially a fudge that creates a lot of separate anonymous hashes and links them together.

    However, if - as you say - your data is read-only from your threads, you might not need to do that. If you initialise it prior to spawning your threads, each thread will take a copy of your global namespace anyway. You just won't be able to modify it within a thread (or technically you can, but the change won't propagate to other threads).
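    A minimal sketch of that suggestion: the structure is built before threads->create, so each thread gets its own private copy at creation time (the data here is just an invented placeholder):

```perl
use strict;
use warnings;
use threads;

# Build the structure BEFORE spawning any threads.
my %data;
for my $x ( 1 .. 10 ) {
    $data{$x}{$_} = $x * $_ for 1 .. 10;
}

# Each thread receives a snapshot of %data taken at creation time;
# reads work, but writes would not be seen by other threads.
my @threads = map {
    threads->create( sub {
        my $id = shift;
        return $data{$id}{5};
    }, $_ );
} 1 .. 3;

my @got = map { $_->join } @threads;
# @got is (5, 10, 15)
```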

      Thank you. I considered not sharing, but that would effectively mean each thread would be set up with a copy of the original 240MB structure, would it not?

      I was afraid this would quickly kill my memory, but thinking about it now, isn't there a chance this would be copy-on-write only? And thus even 1000 threads would (considering I only do reads) still only use 240MB of memory?

        Couldn't say myself without trying it. I know some modes of parallel processing use copy-on-write memory, and others don't. I'm pretty sure Unix 'fork' does, for example. I've never had occasion to check whether threads do too.

        It may not be viable, but depending on how frequently you read the hash, you might find you can have a 'handler' thread that services requests for data from the hash.
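        That 'handler' thread idea can be sketched with Thread::Queue: only the handler holds the (non-shared) hash, and other threads ask for values over queues. This is a minimal, invented example, not the poster's code:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $req = Thread::Queue->new;   # keys to look up
my $res = Thread::Queue->new;   # answers back

my $handler = threads->create( sub {
    # Only the handler thread holds the big hash; it never needs sharing.
    my %data = map { $_ => $_ * $_ } 1 .. 100;
    while ( defined( my $key = $req->dequeue ) ) {
        $res->enqueue( $data{$key} );
    }
} );

# A requester asks for two keys and reads the replies in order.
$req->enqueue($_) for 3, 7;
my $nine      = $res->dequeue;   # 9
my $fortynine = $res->dequeue;   # 49

$req->enqueue(undef);            # tell the handler to finish
$handler->join;
```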

        Otherwise - your code is all about initially creating the hash. How does it perform once that's finished? It may be worth the overhead.
