
Re^3: Multiple write locking for BerkeleyDB

by sgifford (Prior)
on Apr 23, 2008 at 21:43 UTC ( #682505=note )

in reply to Re^2: Multiple write locking for BerkeleyDB
in thread Multiple write locking for BerkeleyDB

Those are quite strong words for a post that includes no benchmarks. I put together three small test programs: one using MySQL, one using SysV semaphores and shared memory, and one using mmap and a gcc-specific atomic_add operation.

On the machine where I ran the code, MySQL could increment a counter in a MyISAM table about 2,200 times/second, or in a memory table about 3,500 times/second. Using SysV IPC, I could increment about 15,540 times/second, not quite 5 times faster than the memory table. Using mmap and system-specific atomic locking instructions from C++, I could increment about 9.6 million times/second, which is about 2,700 times faster than a MySQL memory table.

With 3 writers, MySQL and SysV take about 3 times as long for all 3 to finish, so they are serialized but not penalized too badly for the lock contention; the mmap+atomic_add version actually gets faster (12.2M times/second), because it can run on two processors at the same time. So there are definitely performance advantages to doing a bit of the work yourself.

Now, if the OP's question hadn't been about performance, that probably wouldn't matter; you're right that SysV IPC is rarely used, and the MySQL code is much easier to understand and maintain. But his question was in fact about performance, and in a later post dino states specifically that MySQL was not fast enough, so advising him to use it is particularly unhelpful. Also, there are certainly more modern forms of IPC, but none that have built-in support in Perl.

Here is the code I used. If you have anything faster, please post it along with benchmarks.

Update: Fixed some errors in benchmarks (there was no row in MySQL, so the UPDATE statements weren't doing anything). Added another test with mmap+atomic_add. Fixed a typo.

With SysV IPC:

#!/usr/bin/perl
use warnings;
use strict;
use IPC::SysV qw(IPC_CREAT ftok SEM_UNDO);
use IPC::Semaphore;

my $semtok = ftok($0, 1);
my $shmtok = ftok($0, 2);

my $sem = IPC::Semaphore->new($semtok, 1, 0700 | IPC_CREAT)
  or die "Couldn't create semaphore: $!\n";

# shmget returns undef on error; an id of 0 is valid, so test definedness
my $shmid = shmget($shmtok, 4, 0700 | IPC_CREAT);
defined $shmid
  or die "Couldn't create shm: $!\n";

if ($ENV{RESET}) {
  $sem->setval(0, 0);
  my $buf = pack("L", 0);
  shmwrite($shmid, $buf, 0, 4)
    or die "shmwrite failed: $!\n";
  if (!$ARGV[0]) {
    exit(0);
  }
}

foreach my $i (1 .. ($ARGV[0] || 100)) {
  my $r = add(1);
  if ($ENV{VERBOSE}) {
    print $r, "\n";
  }
}

sub add {
  my ($add) = @_;

  # Lock: wait for 0, then semaphore up
  $sem->op(0, 0, 0,
           0, 1, 0)
    or die "semaphore lock failed: $!\n";

  # Read counter
  my $buf;
  shmread($shmid, $buf, 0, 4)
    or die "shmread failed: $!\n";
  my $val = unpack("L", $buf);

  # Increment
  $val += $add;

  # Write it back
  $buf = pack("L", $val);
  shmwrite($shmid, $buf, 0, 4)
    or die "shmwrite failed: $!\n";

  # Now unlock; semaphore down
  $sem->op(0, -1, 0)
    or die "semaphore unlock failed: $!\n";

  return $val;
}

With MySQL. Use CREATE TABLE counter (count int); to create the table, then INSERT INTO counter VALUES (0); to put a value in it.

#!/usr/bin/perl
use warnings;
use strict;
use DBI;

our $dbh = DBI->connect('DBI:mysql:database=test', 'db_user', 'db_pass')
  or die "Couldn't connect to db: $DBI::errstr\n";
our $sth = $dbh->prepare("UPDATE counter SET count = count + ?")
  or die "Couldn't prepare statement: " . $dbh->errstr . "\n";

foreach my $i (1 .. ($ARGV[0] || 100)) {
  my $r = add(1);
}

sub add {
  # DBI reports failures through errstr, not $!
  $sth->execute($_[0])
    or die "Couldn't do SQL: " . $sth->errstr . "\n";
}

counter3.C, mmap and atomic_add in C++ (with a little work could be done in Perl/C++ hybrid with Inline::CPP)

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <iostream>
// libstdc++ header; newer GCC releases provide it as <ext/atomicity.h>
#include <bits/atomicity.h>

using __gnu_cxx::__atomic_add;
using __gnu_cxx::__exchange_and_add;
using namespace std;

void die(const char *msg)
{
  perror(msg);
  exit(1);
}

void add(volatile _Atomic_word *to, int amt)
{
  __atomic_add(to, amt);
}

int main(int argc, char *argv[])
{
  // Two arguments are required: the file to map and the iteration count
  if (argc < 3) {
    die("bad usage");
  }

  // The file must already exist and hold at least sizeof(_Atomic_word) bytes
  int mmap_fd = open(argv[1], O_RDWR);
  if (mmap_fd < 0)
    die("couldn't open mmap");

  // The counter is read as well as written, so map it readable too
  void *mem = mmap(NULL, sizeof(_Atomic_word), PROT_READ | PROT_WRITE,
                   MAP_SHARED, mmap_fd, 0);
  if (mem == MAP_FAILED)
    die("couldn't mmap file");

  volatile _Atomic_word *counter = (volatile _Atomic_word *)mem;
  int count = atoi(argv[2]);
  for (int i = 0; i < count; i++) {
    add(counter, 1);
  }
  cout << "counter is now " << *counter << endl;
}
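
For anyone trying this on a current toolchain: the atomicity header used in counter3.C is a libstdc++ detail that may not be present under that name. Below is a minimal sketch of the same mmap + atomic add idea using the `__atomic_add_fetch` builtin, assuming GCC 4.7 or later (or Clang). The helper name `map_counter` and the use of `int32_t` are my own choices for illustration, not part of the benchmark code, and I have not benchmarked this variant.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper (not from the benchmark): map the first 4 bytes of an
// existing file as a shared counter. The file must already be at least
// 4 bytes long. Returns nullptr on failure.
volatile int32_t *map_counter(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return nullptr;
    void *mem = mmap(nullptr, sizeof(int32_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED) return nullptr;
    return static_cast<volatile int32_t *>(mem);
}

// Atomically add amt to the counter and return the new value.
int32_t add(volatile int32_t *counter, int32_t amt) {
    assert(counter != nullptr);
    return __atomic_add_fetch(counter, amt, __ATOMIC_SEQ_CST);
}
```

Every process that calls `map_counter` on the same path ends up with a view of the same 4 bytes, so the writers do not need to be related by fork.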

Replies are listed 'Best First'.
Re^4: Multiple write locking for BerkeleyDB
by samtregar (Abbot) on Apr 23, 2008 at 23:00 UTC
    Holy cow, that's a lot of code for $counter++! I hate SysV IPC so, so much.

    Is it really so fast though? MySQL is storing that data on disk and it's only 4x slower! Is disk 4x slower than memory? No, it's much, much slower. SysV. IPC. Sucks.

    If you feel like running more benchmarks for me, change the MySQL one to use a MEMORY table. Then at least you won't have disk writes dragging MySQL down.


      It really isn't any more complex, it's just that DBI and mysql wrap most of that up for you. Any synchronized solution will have to take some kind of mutex, read and write the counter, then release the mutex, which is all my code does. If you were to use strace or gdb to step through the code involved for both cases (including the mysql server), you'd find that SysV is much simpler; it's just that you have to do more of the coding yourself, because it's not as widely used.
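
To make the "take a mutex, read, write, release" shape concrete outside of SysV, here is a rough sketch of the same four steps in C++ using `flock` on a plain file. This is not one of the benchmarked programs; `open_counter` and `locked_add` are hypothetical names, and using a file lock as the mutex is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical helper: open (or create) the file that holds the counter.
int open_counter(const char *path) {
    return open(path, O_RDWR | O_CREAT, 0600);
}

// Hypothetical helper: add amt to the 4-byte counter at offset 0 of fd.
// Stores the new value in *out and returns true on success.
bool locked_add(int fd, int32_t amt, int32_t *out) {
    assert(fd >= 0 && out != nullptr);

    // 1. Take the mutex: blocks until no other open of the file holds it
    if (flock(fd, LOCK_EX) != 0) return false;

    bool ok = false;
    int32_t val = 0;

    // 2. Read the counter; an empty file is treated as zero
    ssize_t n = pread(fd, &val, sizeof val, 0);
    if (n == 0 || n == static_cast<ssize_t>(sizeof val)) {
        // 3. Increment and write it back
        val += amt;
        if (pwrite(fd, &val, sizeof val, 0) ==
            static_cast<ssize_t>(sizeof val)) {
            *out = val;
            ok = true;
        }
    }

    // 4. Release the mutex
    flock(fd, LOCK_UN);
    return ok;
}
```

Each writer should open the file itself; descriptors inherited across fork share a single flock lock and would not exclude each other.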

      As for memory tables, I had an error in my benchmark: I was running it with no rows in the table, so no updates were happening at all. I have corrected it in the original post, and also tried a MySQL memory table; SysV is about 4.7 times faster than a memory table. I also added a benchmark using mmap and an atomic add, which is much, much faster than any of the other solutions.

      And as to the amount of work MySQL and SysV are doing behind the scenes, I don't see how it matters much unless the OP needs every update written immediately to disk. Otherwise MySQL is just doing extra work that the OP doesn't need.

        That's a lot of useful information, thanks. I'm a little puzzled, however, about how to get separate processes to use the same shared memory. With fork I can use some form of handle which is inherited, but how does it work when the writer processes have been started separately?
        I had a look at IPC::MM as it allows the creation of shared hashes (not using Storable) but it has the above problem (I think).
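
On the question of separately started processes: the SysV example earlier in the thread derives its keys with ftok from a file path, so any process that computes the same key can attach to the same segment without inheriting a handle. A minimal sketch of that idea in C++ follows; `attach_counter` is a hypothetical helper, and the path and project id are whatever the cooperating processes agree on.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/ipc.h>
#include <sys/shm.h>

// Hypothetical helper: attach to (creating if needed) a 4-byte SysV shared
// memory segment whose key is derived from an existing file path.
// Returns nullptr on failure.
int32_t *attach_counter(const char *path, int proj_id) {
    assert(path != nullptr);

    // The same path and proj_id give the same key in every process
    key_t key = ftok(path, proj_id);
    if (key == static_cast<key_t>(-1)) return nullptr;

    // Create the segment if it does not exist yet, otherwise look it up
    int shmid = shmget(key, sizeof(int32_t), 0600 | IPC_CREAT);
    if (shmid < 0) return nullptr;

    // Map the segment into this process's address space
    void *mem = shmat(shmid, nullptr, 0);
    if (mem == reinterpret_cast<void *>(-1)) return nullptr;
    return static_cast<int32_t *>(mem);
}
```

Two unrelated programs that both call `attach_counter` with the same path and project id see the same four bytes. Access still has to be synchronized, for example with a semaphore as in the Perl code.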
        "It really isn't any more complex, it's just that DBI and mysql wrap most of that up for you."

        That's an odd definition of complexity you've got there. What would you think of the equivalent solution in assembler? Or constructed using syscall() instead of the shm*() routines? Obviously these would do the same work for you, it's just that GCC "wraps most of that up for you."

        It's interesting to see that SysV semaphores perform pretty well. It doesn't match my experience with SysV shared memory, which was about as slow as just using disk, but I guess that's apples and oranges. And let's not even talk about the arbitrary kernel-level limits on this stuff (how many, how much storage, etc).... It might work great on Linux, but porting to BSD or Solaris where the defaults are much different is a guaranteed pain.

