Re: How to share huge data structure between threads?
by diotalevi (Canon) on Jan 10, 2003 at 14:43 UTC
BerkeleyDB (http://www.sleepycat.com) is well suited to such an application. Unfortunately the Perl module is really light on documentation. I'll provide a very quick example here, but you'll really want to read the documentation on the database web site. Since the Perl module is a thin wrapper over the C API, you can read the C API documentation and, wherever it shows C code, pretend it's Perl code. The one caveat is that I never use the tie interface. All it does is call the object oriented interface anyway, so I save a method call and just use the database as it's designed to be used. I have an example of object oriented (though not using the CDB features) BerkeleyDB up at http://www.greentechnologist.org/tiger/unpack.pl and http://www.greentechnologist.org/tiger/graph.pl. The CDB features just "happen" if you enable them.
use strict;
use warnings;
use BerkeleyDB;

my $env = get_environment();
my $db  = BerkeleyDB::Btree->new(
    -Filename => 'my_file.db',
    -Flags    => DB_CREATE,
    -Env      => $env
) or die "Couldn't open database at my_file.db: $BerkeleyDB::Error";

# The database now supports concurrent access. You'd
# just open it in each thread and use it. See
# http://www.sleepycat.com/docs/ref/cam/intro.html
# for info on the concurrent system.
# You can also do nested transactions and logging. See
# http://www.sleepycat.com/docs/ref/transapp/intro.html
# or just read the docs from the table of contents.

sub get_environment {
    BerkeleyDB::Env->new(
        -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB
    ) or die "Couldn't initialize BerkeleyDB environment: $BerkeleyDB::Error";
}
Update I should add that the SleepyCat documentation explicitly notes that BerkeleyDB's concurrent access modes work correctly across threads. I posted a code example for multi-process access; a multi-threaded version should read similarly, though there's no real reason you should need threading given your stated requirements. Update I didn't know the Perl module BerkeleyDB wasn't thread safe. The underlying library is. So if you follow my suggestion, you probably want multiple processes.
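To make the multi-process suggestion concrete, here is a minimal sketch (the home directory, filename, and keys are made up, and error handling is abbreviated) of two forked writers sharing one CDB database. Each process opens its own environment and database handles after the fork:

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical sketch: two forked children write to one CDB database.
# The -Home directory is assumed to exist already.
for my $child (1 .. 2) {
    defined(my $pid = fork) or die "fork: $!";
    next if $pid;    # parent keeps looping; children fall through

    # Open the environment and database inside the child, after the fork.
    my $env = BerkeleyDB::Env->new(
        -Home  => '/tmp/bdb_demo',
        -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB
    ) or die "env: $BerkeleyDB::Error";

    my $db = BerkeleyDB::Btree->new(
        -Filename => 'my_file.db',
        -Flags    => DB_CREATE,
        -Env      => $env
    ) or die "db: $BerkeleyDB::Error";

    # CDB serializes writers for us; no explicit locking needed here.
    $db->db_put("child_$child" => time);
    exit 0;
}
wait for 1 .. 2;    # reap both children
```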
Fun Fun Fun in the Fluffy Chair
Re: How to share huge data structure between threads?
by djantzen (Priest) on Jan 10, 2003 at 15:04 UTC
Implicit sharing of nested structures is prohibited because it creates the potential for accidental sharing of private data. Since the ithreads model is predicated upon complete separation of all data by default, allowing references within shared parent structures to be shared implicitly would open the door to accidental corruption of data. From perlthrtut:
use threads;
use threads::shared;
my $var = 1;
my $svar : shared = 2;
my %hash : shared;
... create some threads ...
$hash{a} = 1;      # all threads see exists($hash{a}) and $hash{a} == 1
$hash{a} = $var;   # okay - copy-by-value: same effect as previous
$hash{a} = $svar;  # okay - copy-by-value: same effect as previous
$hash{a} = \$svar; # okay - a reference to a shared variable
$hash{a} = \$var;  # This will die
delete $hash{a};   # okay - all threads will see !exists($hash{a})
So the solution using threads is to take references to the things you wish to share at each level of a parent structure and to share them on a case by case basis. In other words, you must explicitly share not only the parent reference, but every reference contained therein.
Here's some example code of a basic object with shared members:
use strict;
use warnings;

package Foo;

sub new {
    my ($class, $arg) = @_;
    my $this = bless {}, $class;
    $this->{args} = undef;
    return $this;
}

sub set {
    my ($this, $arg) = @_;
    $this->{args}[0] = $arg; # setting an entry in a shared array reference
}

1;
# End of the module, and now a test script
use strict;
use warnings;
use Foo;
use threads;
use threads::shared;

my $foo           = Foo->new();
my $nested_array  = [];
my $nested_string = 'bar';

share($foo);
share($nested_array);
share($nested_string);

$foo->{args} = $nested_array; # set the shared array reference

# pass in a reference to the shared scalar
my $thr1 = threads->create(sub { $foo->set(\$nested_string) });
<Update>
# If in Foo::set we manually set the argument passed, say, to 'quux',
# the object will contain that string rather than 'bar',
# proof that we do indeed have a shared nested reference.
</Update>
$thr1->join();
print $foo->{args}[0];
It's a bother to do this, but it's better than accidental trampling of data. Hope this helps.
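As a footnote, newer releases of threads::shared provide a shared_clone() helper that deep-copies a structure and shares every level in one call. If your threads::shared exports it, the manual per-level share() dance can collapse to something like this sketch (the structure and key names are made up):

```perl
use strict;
use warnings;
use threads;
use threads::shared;   # needs a release that provides shared_clone()

# Hypothetical nested structure; shared_clone() shares every level of it.
my $config = shared_clone({
    hosts => [ 'alpha', 'beta' ],
    opts  => { retries => 3 },
});

my $thr = threads->create(sub {
    # The inner array is shared too, so this push is visible everywhere.
    push @{ $config->{hosts} }, 'gamma';
});
$thr->join();

print scalar @{ $config->{hosts} };   # 3
```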
Re: How to share huge data structure between threads?
by PodMaster (Abbot) on Jan 10, 2003 at 14:53 UTC
Yes and no.
DB_File.pm is not thread safe. Neither is BerkeleyDB.pm.
The strategy with DB_File is to call sync on the tied object ((tied %hash)->sync) after writing, and to retie before reading, to ensure you see the latest data.
This will work fine, but only if you use a newer version of Berkeley DB (anything above 2.5 should work with this technique).
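The sync-and-retie pattern can be sketched like this (the filename and key are made up, and this assumes DB_File is built against a sufficiently new Berkeley DB):

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_CREAT O_RDWR);

my %hash;
tie %hash, 'DB_File', 'shared.db', O_CREAT | O_RDWR, 0644, $DB_BTREE
    or die "tie: $!";

# Writer: update, then flush the cached pages out to the file.
$hash{counter} = 42;
(tied %hash)->sync;

# Reader: drop the stale handle and retie to pick up fresh pages.
untie %hash;
tie %hash, 'DB_File', 'shared.db', O_CREAT | O_RDWR, 0644, $DB_BTREE
    or die "retie: $!";
print $hash{counter}, "\n";
```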
If you want better transaction control, use BerkeleyDB.pm, which gives you access to the full API (just go buck wild).
Your other choice to consider is DBD::SQLite.
If any of this is too slow for you, you can always use Cache::Cache
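For completeness, the file-backed flavor of Cache::Cache looks roughly like this (the namespace and key are made up):

```perl
use strict;
use warnings;
use Cache::FileCache;

# Hypothetical namespace; entries expire after ten minutes by default.
my $cache = Cache::FileCache->new({
    namespace          => 'big_structure',
    default_expires_in => 600,
});

# Complex structures are serialized for you on the way in and out.
$cache->set('graph', { nodes => [1, 2, 3], edges => {} });
my $graph = $cache->get('graph');    # any process can read this back
```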
MJD says you
can't just make shit up and expect the computer to know what you mean, retardo!
** The Third rule of perl club is a statement of fact: pod is sexy.
Re: How to share huge data structure between threads?
by dragonchild (Archbishop) on Jan 10, 2003 at 15:12 UTC
Here are a few stupid questions:
- Why are you using threads instead of processes? Apache's children are processes, and it's extremely robust. Apache doesn't necessarily have to serve HTML, either. It's a CGI server which can serve anything you want. And Perl can be tightly integrated into it.
- Why not set up the shared data structure as a SOAP process and have your children communicate with it? That way you can even have your objects on another server and still be OK.
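The SOAP idea is a client/server split: one process owns the big structure and everyone else talks to it over HTTP. A rough sketch with SOAP::Lite (the port, package name, and methods are all made up for illustration):

```perl
use strict;
use warnings;
use SOAP::Transport::HTTP;

# --- server sketch: one process owns the data ---
package DataService;
my %store;    # the "huge" structure lives only in this process

sub get { my (undef, $key) = @_; return $store{$key} }
sub set { my (undef, $key, $val) = @_; $store{$key} = $val; return 1 }

package main;
SOAP::Transport::HTTP::Daemon
    ->new(LocalPort => 8080)
    ->dispatch_to('DataService')
    ->handle;

# --- a client would then look something like ---
#   use SOAP::Lite;
#   my $soap = SOAP::Lite
#       ->uri('http://localhost:8080/DataService')
#       ->proxy('http://localhost:8080/');
#   $soap->set(answer => 42);
#   print $soap->get('answer')->result;
```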
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.
Re: How to share huge data structure between threads?
by broquaint (Abbot) on Jan 10, 2003 at 15:30 UTC
My problem is that threads::shared can't share complex data structures and objects. How can this be solved?
I don't advocate this solution nor am I proud of it, but ...
use threads;
use threads::shared;
use Devel::Pointer;

{
    package foo;
    sub new  { bless [rand 100] }
    sub blah { print "in blah()\n" }
}

my $obj = foo->new();
$obj->blah();
print "$obj: @$obj\n";

my $o : shared = address_of($obj);

my $t = threads->new(sub {
    print "\tin thread\n\t";
    my $obj2 = deref($o);
    $obj2->blah();
    print "\t$obj2: @$obj2\n";
});
$t->join;
__output__
in blah()
foo=ARRAY(0x804beec): 43.5769482822256
in thread
in blah()
foo=ARRAY(0x804beec): 43.5769482822256
Now just look into the little memory-wiping stick ... *flash*.
HTH
_________ broquaint
Re: How to share huge data structure between threads?
by LogicalChaos (Beadle) on Jan 10, 2003 at 18:09 UTC
Well, you don't say how big huge is, so... have you looked into IPC::Shareable? You can tie your hash to a shared memory region, and then just make sure you lock it appropriately before reading/writing. I use this in multi-process (not threaded) programs and it works quite well for me. If the standard shared memory segment size is too small (32MB?), you can increase it at runtime by adjusting /proc/sys/kernel/shmmax, or by rebuilding the kernel.
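The tie-and-lock pattern looks roughly like this (the glue string, size, and key are made up, and this assumes SysV IPC is available):

```perl
use strict;
use warnings;
use IPC::Shareable;

# Hypothetical glue string; size is in bytes, so raise it well
# past the small default for anything big.
my %hash;
my $handle = tie %hash, 'IPC::Shareable', 'data', {
    create => 1,
    mode   => 0666,
    size   => 65536,
};

$handle->shlock;            # take an exclusive lock before writing
$hash{status} = 'ready';
$handle->shunlock;          # release it so readers can proceed
```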
Cheers,
Rob
Well, you don't say how big huge is, so...
The server I'm testing on has 3GB of RAM, which will probably be insufficient for the final application. The current size of the test data I want to share is about 600MB (after loading into Perl). I don't think IPC::Shareable can meet these requirements.
OUCH... Seems like a real DB is the only way to go? What sort of performance do you need? Do you have large quantities of keys, or large data associated with the keys?
Have you seen Tie::DBI? I've not used it, but it appears interesting and might be a quick fit for your application. Please post your eventual solution back to this thread, as I'm curious what you come up with.
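As I understand its documentation, Tie::DBI ties a hash to a database table keyed by a primary key, so access looks like plain hash lookups. A rough, untested sketch (the DSN, table, and column names are all made up):

```perl
use strict;
use warnings;
use Tie::DBI;

# Hypothetical connection: rows of table 'items', keyed by column 'id'.
my %record;
tie %record, 'Tie::DBI', {
    db    => 'mysql:testdb',
    table => 'items',
    key   => 'id',
};

# Each hash entry is a row; each row is a hash of column => value.
print $record{42}{name} if exists $record{42};
```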
Luck,
Chaos
As a sidenote, I just got burned by IPC::Shareable. Not badly; I just failed to RTFM, and discovered the rather hard way the 64K default size of shared memory "segments" or "partitions" or whatever. Read the manual, and look at the size option when tie-ing.