Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re^5: threads::shared seems to kill performance (Workaround).

by BrowserUk (Patriarch)
on Jul 19, 2013 at 08:24 UTC ( [id://1045292]=note: print w/replies, xml ) Need Help??


in reply to Re^4: threads::shared seems to kill performance
in thread threads::shared seems to kill performance

Okay. I found an obscure detail(Section entitled "In-memory Databases And Shared Cache" ) in the sqlite documentation that allows me to work around DBI's brokeness.

This demonstrates it:

#! perl -slw use strict; use threads; use threads::shared; use threads::Q; use Time::HiRes qw[ time ]; use DBI; use constant { CONNECT=> 'dbi:SQLite:dbname=file:memdb?mode=memory&cache=shared', CREATE => 'create table if not exists DB ( ID integer(8),' . join(',', map "F$_ text(15)", 1..9) . ')', INSERT => 'insert into DB ( ID , ' . join( ',', map "F$_", 1..9 ) . ') values (' . '?,' x 9 . '?) +', INDEX => 'create index if not exists I1 on DB ( ID )', QUERY => 'select * from DB where ID = ?', }; sub thread { my $tid = threads->tid; my( $Q ) = @_; $Q->dq; ## Wait for DB my $dbh = DBI->connect( CONNECT, '', '' ) or die DBI::errstr; my $sth = $dbh->prepare( QUERY ) or die DBI->errstr; while( my $id = $Q->dq ) { $sth->execute( $id ) or die DBI::errstr; my $r = $sth->fetch or warn( "No data for $id" ) and next; ## do something with record. printf "[$tid] %5u %s %s %s %s %s %s %s %s %s\n", @{ $r }; } $sth->finish; $dbh->disconnect; } my @chars = ( 'a'..'z' ); sub dummy { my $n = shift; join '', @chars[ map int( rand @chars ), 1 .. $n ]; } our $T //= 4; our $N //= 100; my $dbh = DBI->connect( CONNECT, '', '', { AutoCommit =>0 } ) or die D +BI::errstr; $dbh->do( 'PRAGMA synchronous = off' ); $dbh->do( 'PRAGMA cache_size = 800000' ); $dbh->do( CREATE ) or die DBI::errstr; my $ins = $dbh->prepare( INSERT ) or die DBI->errstr; for my $n ( 1 .. $N ) { my @fields = ( $n, map dummy( 15 ), 1 .. 9 ); $ins->execute( @fields )or die $ins->errstr; $n %100 or $dbh->commit } $ins->finish; $dbh->commit; #$dbh->do( INDEX ) or die DBI::errstr; my $sth = $dbh->prepare( QUERY ) or die DBI->errstr; for my $id ( 1 .. 5 ) { $sth->execute( $id ) or die DBI::errstr; my $r = $sth->fetch() or warn( "No data for $id" ) and next; ## do something with record. printf "[main] %5u %s %s %s %s %s %s %s %s %s\n", @{ $r }; } $sth->finish; print $dbh->selectrow_array( 'SELECT count(*) from DB' ); my $Q = threads::Q->new( 10 ); my @threads = map threads->create( \&thread, $Q, shared_clone( $dbh ) +), 1 .. $T; $Q->nq( ('Wakey wakey') x $T ); $Q->nq( $_ ) for 1 .. 20; ## $N; $Q->nq( (undef) x $T ); $_->join for @threads; $dbh->disconnect; unlink 'file'; ## without this, the memory db will persist in a appare +ntly empty file???? ## That might be turned into an advantage????

There is however some weirdness associated with this. (See the comments on the last two lines above).

Basically, the trick to making multiple DBI handles (in different threads) refer to the same in memory database is using a dbname of the form:

file:memdb?mode=memory&cache=shared

The weirdness is that although this is a "memory db", a file (in this example called 'file' in the local directory) is created. It will always be 0 bytes and running wc on it confirms it has no content.

But, run the above program a second time and it will add to the number of records in the table. IT WILL DOUBLE IN SIZE!

But if you delete that empty file, the records from the first run disappear!

Not sure what to do about that other than just delete the file.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^6: threads::shared seems to kill performance (Workaround).
by MidLifeXis (Monsignor) on Jul 19, 2013 at 17:22 UTC

      Great call!

      C:\test>dir file 19/07/2013 19:49 0 file C:\test>streams file Streams v1.56 - Enumerate alternate NTFS data streams Copyright (C) 1999-2007 Mark Russinovich Sysinternals - www.sysinternals.com C:\test\file: :memdb2?mode=memory&cache=shared:$DATA 347136

      Doesn't explain why its not being deleted as (I think) it should.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        I wonder (have not tested) if the ADS is being created, forcing creation of the "normal" file, and at the end, the ADS is being attempted to be deleted, but failing, leaving the data for the next run. ISTR needing to delete the main file (or perhaps some special tool besides unlink) to remove the ADS.

        --MidLifeXis

Re^6: threads::shared seems to kill performance (Workaround).
by Jacobs (Novice) on Jul 24, 2013 at 18:55 UTC

    So I finally got some time to play with this and I still can't wrap my head around it.

    If I understand it correctly, the trick is in the fact that you:

    1. load the whole DB as a memdb with shared cache
    2. then use the
      shared_clone( $dbh )
      to get this memory where the DB is loaded shared with all the threads
    3. and then inside the thread create a new DB handler, but thanks to the fact you're using the same memdb with shared cache, it reuses the same memory as the DB handler in the main thread

    Is that even remotely correct? :-) And if so:

    1. Why do the shared_clone( $dbh )? Isn't the shared memdb enough for the threads to have access to the same memory? I've been trying to somehow see the difference between the two scenarios (with and without passing the shared_clone( $dbh )) and I'm probably not measuring this right, but the memory usage (vm_stat) seems to be similar in both cases.
    2. What's the purpose of the
      $Q->nq( ('Wakey wakey') x $T ); ... $Q->dq; ## Wait for DB
      ? Why pass the string to the thread and then just dequeue it? And what's the relation to waiting for DB?

      If I understand it correctly ... then use the shared_clone( $dbh ) ... Is that even remotely correct?

      No. Look again and you'll see that I don't use the share_clone( $dbh ) at all within the threads. I never actually access the second parameter I pass:

      my @threads = map threads->create( \&thread, $Q, shared_clone( $dbh ) +), 1 .. $T; #............................................^1, ^^^^^^^^^ 2 ^^^^^^^^

      within the thread procedures;

      sub thread { my $tid = threads->tid; my( $Q ) = @_;

      I'm afraid that this is a 'left-over' -- ie. I should have deleted it before posting; sorry -- from earlier experiments that lead to my discovering the need for the workaround.

      The reason we need a workaround is because sqlite :memory: dbs are only visible to the connection that creates them.

      But if you share a $dbi, the contents of the dbi (the C/XS bits that the sqlite libraries use) gets mangled internally by DBI such that those libraries see the shared connection from different threads as being different connections; thus the threads see different :memory: dbs.

      The workaround is to use the uri-form of the connect string + the modifiers mode=memory&cache=shared to cause sqlite to connect all the (new; unshared) connects made in each of the threads to the same db.

      Why do the shared_clone( $dbh )? Isn't the shared memdb enough for the threads to have access to the same memory?

      Not applicable :)

      What's the purpose of the $Q->nq( ('Wakey wakey') x $T ); ... $Q->dq; ## Wait for DB ?

      Why pass the string to the thread and then just dequeue it? And what's the relation to waiting for DB?

      Again, this is a left-over from earlier experiments. Its purpose was to block the threads whilst I created the db in the main code as originally I was creating the threads before I created the db. The idea was to prevent the memory used by the DB being cloned when the threads are created.

      However, as I'm no longer using a shared connection handle, that precaution is no longer relevant. So, the nqing of the strings and the associated dqing of them can be safely removed.

      Here's the latest for of the code with a few tweaks and some timing code added:

      Finally, I've still to resolve the undeleted file problem. The fact that a file is being created at all is somewhat disconcerting as it mean that I'm not getting the (purely) memory db I was hoping for; and could mean that performance isn't what it should be as a result. I did try contacting the mailing list about this, but got no useful response. I haven't had time/motivation to pursue it further.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Ah, perfect, now I get it! Thank you very much :-)

        That file thing is a bit strange indeed, I hope there's some solution to that.

Re^6: threads::shared seems to kill performance (Workaround).
by Jacobs (Novice) on Jul 19, 2013 at 15:42 UTC

    Wow, this is quite something. I'm trying to absorb all that and I have probably a very stupid question at the beginning - what is threads::Q? I can't find that on CPAN or anywhere else.

      what is threads::Q? I can't find that on CPAN or anywhere else.

      Its a module of my own writing that I've never packaged.

      It is essentially my own version of Thread::Queue minus a lot of that modules junk, plus a little extra. Namely, you specify a size for the queue and it self limits to that size, blocking senders when it gets to full and receivers when empty.

      You can grab a copy of it here.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1045292]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (2)
As of 2024-04-26 03:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found