http://qs321.pair.com?node_id=682443

dino has asked for the wisdom of the Perl Monks concerning the following question:

I'm having problems coding an application that updates a set of counters from multiple sources (tcpdump byte counts). Originally I used flock on separate files, but that was too slow. The next attempt was DB_File, which worked well, but I hear there is an issue with locking/corruption. I then tried DBI connecting to a MySQL database, but unfortunately that was also too slow. Finally I tried BerkeleyDB. I was able to use CDS, but it didn't work for multiple writers (it deadlocked). I've yet to determine how to manually lock an update in such an environment. Any suggestions? Has anyone seen a Perl example out there? I've had a look at the Oracle site, but I'm finding the C a bit tricky.

Update: Apr 27
There certainly have been many suggestions and thanks to you all. I've done some testing of Cache::FastMmap and it appears to meet most of my needs. I'll also have a look at the source for BerkeleyDB to get a better understanding of how it does locking.
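
For reference, a minimal sketch of the Cache::FastMmap approach (the share_file path and key are invented for illustration). Its get_and_set method does the read-modify-write atomically under the module's own page locking, so concurrent writers need no extra locking; note that, being a cache, it can evict entries if a page fills up, so size it generously when using it for counters:

    use strict;
    use warnings;
    use Cache::FastMmap;

    # All writer processes point at the same mmap'ed file.
    my $cache = Cache::FastMmap->new(
        share_file => '/tmp/byte_counters.cfm',   # hypothetical path
        init_file  => 0,                          # don't wipe existing data
    );

    # Atomic read-modify-write: the callback gets ($key, $current_value)
    # and returns the new value to store.
    sub add_bytes {
        my ( $ip, $bytes ) = @_;
        $cache->get_and_set( $ip, sub { ( $_[1] || 0 ) + $bytes } );
    }

    add_bytes( '10.0.0.1', 1500 );
    print $cache->get('10.0.0.1'), "\n";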

Replies are listed 'Best First'.
Re: Multiple write locking for BerkeleyDB
by perrin (Chancellor) on Apr 23, 2008 at 18:30 UTC
    DB_File works, but you do have to use flock and re-open the db each time to flush caches. MLDBM::Sync does this for you. I'm surprised MySQL didn't work for you. Did you run it on localhost and use localhost as the host name when connecting? That tells the client to use Unix domain sockets, which is much faster than TCP sockets.
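
    A rough sketch of the MLDBM::Sync route (the file name and key are illustrative); Lock and UnLock bracket the read-modify-write, and the module handles the flock/re-open dance underneath:

        use strict;
        use warnings;
        use MLDBM::Sync;
        use MLDBM qw(DB_File Storable);   # DB_File storage, Storable serialisation
        use Fcntl qw(:DEFAULT);

        tie my %counters, 'MLDBM::Sync', '/tmp/counters.dbm', O_CREAT | O_RDWR, 0640
            or die "tie: $!";

        sub add_bytes {
            my ( $ip, $bytes ) = @_;
            my $sync = tied %counters;
            $sync->Lock;                  # exclusive lock around the update
            $counters{$ip} = ( $counters{$ip} || 0 ) + $bytes;
            $sync->UnLock;                # writes out and releases the lock
        }

        add_bytes( '10.0.0.1', 1500 );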

    BerkeleyDB can definitely do what you want, but I would suggest you skip the multi-locking and just let it use a global lock. The performance you get from using it that way is good enough for nearly any Perl use, and it avoids deadlocking. There's some sample code using it this way in a benchmark script here.

      Thanks for the URL. With regard to BerkeleyDB, I was unable to find any read-modify-write locking code. It looks like CDS was used in an automatic fashion. I'm guessing I need to invoke lock_vec and its friends, but I'm not sure how.
        No, that's just it: BDB handles the locking for you. You don't need to do any explicit locking.
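
        For what it's worth, a hedged sketch of a read-modify-write in CDS mode with the BerkeleyDB module (the environment path and key are made up, and the environment directory must already exist). The module manages the lock tables; cds_lock just hands you the single CDS write lock around the get/put pair, so no lock_vec-style code is needed:

            use strict;
            use warnings;
            use BerkeleyDB;

            # One shared environment directory for all writer processes.
            my $env = BerkeleyDB::Env->new(
                -Home  => '/tmp/counters_env',
                -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,   # CDS mode
            ) or die "Env: $BerkeleyDB::Error";

            my $db = BerkeleyDB::Hash->new(
                -Filename => 'counters.db',
                -Flags    => DB_CREATE,
                -Env      => $env,
            ) or die "Db: $BerkeleyDB::Error";

            sub add_bytes {
                my ( $ip, $bytes ) = @_;
                my $lock = $db->cds_lock();    # global CDS write lock
                my $val  = 0;
                $val = 0 unless $db->db_get( $ip, $val ) == 0;  # new key => 0
                $db->db_put( $ip, $val + $bytes );
                $lock->cds_unlock();           # also released when $lock goes away
            }

            add_bytes( '10.0.0.1', 1500 );
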
Re: Multiple write locking for BerkeleyDB
by pc88mxer (Vicar) on Apr 23, 2008 at 18:25 UTC
    How many updates a second do you need? Do you need to read the counters for any other reason (other than to increment them)? Are there any other processes accessing this data?

    An alternative approach is to write a simple server which mediates access to the data store. Writers communicate with it via a pipe or socket. It would also allow you to implement complex atomic operations (like incrementing a value) without having to resort to locking.
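
    As a bare-bones illustration of that idea, a single-process counter server over a Unix domain socket (the socket path and wire format are invented here): only this process touches %count, so increments are atomic without any locking. A real version would multiplex clients with select or fork, which is left out for brevity:

        use strict;
        use warnings;
        use IO::Socket::UNIX;
        use Socket qw(SOCK_STREAM);

        my $path = '/tmp/counterd.sock';    # hypothetical socket path
        unlink $path;
        my $srv = IO::Socket::UNIX->new(
            Local  => $path,
            Type   => SOCK_STREAM,
            Listen => 5,
        ) or die "listen: $!";

        my %count;
        while ( my $client = $srv->accept ) {
            # Each line from a writer is "key bytes\n", e.g. "10.0.0.1 1500".
            while ( my $line = <$client> ) {
                my ( $key, $bytes ) = split ' ', $line;
                $count{$key} += $bytes if defined $bytes;
            }
        }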

      At the moment I'm getting about 30k updates/min from MySQL and from the file-based approach. DB_File gives around 200k and BerkeleyDB around 150k. On the reader side I'm using Cacti and a Perl script to total the counts and display different traffic flows with greater detail than SNMP queries provide. The read rate can be a lot lower than the write rate.
Re: Multiple write locking for BerkeleyDB
by sgifford (Prior) on Apr 23, 2008 at 19:25 UTC
    You could use SysV IPC to do this. It provides semaphores for locking and persistent shared memory for sharing data. I use it for one project and it is very fast, though I don't recall the benchmark numbers. See perlipc, shmget, and IPC::SysV.
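
    A minimal sketch of that approach (segment size and key handling are simplified): one SysV semaphore guards a 4-byte shared-memory counter, and each writer wraps its update in a P/V pair. IPC_PRIVATE only works for forked children; unrelated processes would agree on a fixed key (for example via ftok):

        use strict;
        use warnings;
        use IPC::SysV qw(IPC_PRIVATE IPC_CREAT S_IRUSR S_IWUSR SEM_UNDO);
        use IPC::Semaphore;

        # One semaphore, initialised to 1 (unlocked), and a 4-byte segment.
        my $sem = IPC::Semaphore->new( IPC_PRIVATE, 1,
            S_IRUSR | S_IWUSR | IPC_CREAT ) or die "semget: $!";
        $sem->setval( 0, 1 ) or die "setval: $!";

        defined( my $shm = shmget( IPC_PRIVATE, 4, S_IRUSR | S_IWUSR | IPC_CREAT ) )
            or die "shmget: $!";
        shmwrite( $shm, pack( 'N', 0 ), 0, 4 ) or die "shmwrite: $!";

        sub increment {
            $sem->op( 0, -1, SEM_UNDO );    # P: acquire the lock
            shmread( $shm, my $buf, 0, 4 ) or die "shmread: $!";
            shmwrite( $shm, pack( 'N', 1 + unpack( 'N', $buf ) ), 0, 4 )
                or die "shmwrite: $!";
            $sem->op( 0, 1, SEM_UNDO );     # V: release the lock
        }

        increment() for 1 .. 5;
        shmread( $shm, my $total, 0, 4 );
        print unpack( 'N', $total ), "\n";  # 5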

    You could also use Sys::Mmap, but you would have to figure out something else for locking. lockf should work fine; it would surprise me if this were very slow if done carefully.
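
    For example (a sketch only: the file name is invented, and flock stands in for lockf since Perl exposes flock directly), each writer maps a small counter file with Sys::Mmap and updates it in place under an exclusive lock:

        use strict;
        use warnings;
        use Sys::Mmap;                          # exports mmap, munmap, PROT_*, MAP_*
        use Fcntl qw(:DEFAULT :flock);

        my $file = '/tmp/counter.map';          # hypothetical counter file
        sysopen my $fh, $file, O_RDWR | O_CREAT, 0644 or die "open: $!";
        truncate $fh, 4 or die "truncate: $!";  # room for one 32-bit counter

        mmap( my $map, 4, PROT_READ | PROT_WRITE, MAP_SHARED, $fh )
            or die "mmap: $!";

        flock( $fh, LOCK_EX ) or die "flock: $!";
        my $n = unpack 'N', substr( $map, 0, 4 );
        substr( $map, 0, 4 ) = pack 'N', $n + 1;   # same-length write into the map
        flock( $fh, LOCK_UN );

        munmap( $map ) or die "munmap: $!";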

    Finally, you could write just the counter part in C, using mmap to store the counters and atomic operations for consistency. Intel's Threading Building Blocks library provides a convenient way to do this on Intel hardware, but most modern hardware has comparable atomic primitives. Using Inline::C, it's not too hard to mix Perl and C.
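
    As a toy illustration of the Inline::C route (assuming a GCC-compatible compiler for the __sync_add_and_fetch builtin): the counter below is per-process only; sharing it between processes would additionally require putting it in a mmap'ed region, which is left out here:

        use strict;
        use warnings;
        use Inline C => q{
            /* Atomic increment via a GCC builtin; safe across threads
               without any explicit lock. */
            static unsigned long counter = 0;

            unsigned long bump(void) {
                return __sync_add_and_fetch(&counter, 1);
            }

            unsigned long current(void) {
                return counter;
            }
        };

        bump() for 1 .. 10;
        print current(), "\n";    # prints 10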

    Good luck!

      Wow, that's a lot of bad advice for one message!

      I'm amazed to see anyone offering SysV IPC as a solution these days - it was a bad choice years ago when I last used it. So many limitations, so many traps, so much pain, so DAMN SLOW!

      I have no idea what mmap would gain you here aside from shared storage. This is clearly a database problem, so you might as well use one. Most likely BDB and MySQL are both using mmap for you.

      And for god's sake, don't write your own DB in C!

      -sam

        Those are quite strong words for a post that includes no benchmarks. I put together three small test programs: one using MySQL, one using SysV semaphores and shared memory, and one using mmap and a GCC-specific atomic_add operation.

        On the machine where I ran the code, MySQL could increment a counter in a MyISAM table about 2,200 times/second, or in a MEMORY table about 3,500 times/second. Using SysV IPC, I could increment about 15,540 times/second, not quite 5 times faster than the MEMORY table. Using mmap and system-specific atomic instructions from C++, I can increment about 9.6 million times/second, which is about 2,700 times faster than a MySQL MEMORY table.

        With 3 writers, MySQL and SysV take about 3 times as long for all 3 to finish, so they are serialized but not penalized too badly by the lock contention; the mmap+atomic_add version actually gets faster (12.2M times/second), because it can run on two processors at the same time. So there are definitely performance advantages to doing a bit of the work yourself.

        Now, if the OP's question hadn't been about performance, that probably wouldn't matter; you're right that SysV IPC is rarely used, and the MySQL code is much easier to understand and maintain. But his question was in fact about performance, and in a later post dino states specifically that MySQL's performance was not fast enough, so advising him to use it is particularly unhelpful. Also, there are certainly more modern forms of IPC, but none with built-in support in Perl.

        Here is the code I used. If you have anything faster, please post it along with benchmarks.

        Update: Fixed some errors in the benchmarks (there was no row in MySQL, so the UPDATE statements weren't doing anything). Added another test with mmap+atomic_add. Fixed a typo.

Re: Multiple write locking for BerkeleyDB
by samtregar (Abbot) on Apr 23, 2008 at 20:13 UTC
    How did you code your MySQL solution? What storage engine did you use? I bet this will perform great, and I'm pretty sure it's guaranteed atomic for MyISAM and InnoDB (I would have to check to be sure):

      UPDATE table SET counter = counter + 1 WHERE counter_id = ?;

    (Assuming you have an index on counter_id of course.)

    You also need to look out for the usual DB-usage traps - make sure you're caching your connection between calls, tune your configuration to make appropriate use of memory, etc.

    -sam
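
    Putting those tips together, a rough DBI sketch (database, table, and column names are invented); prepare_cached keeps the statement handle around between calls, connecting with host=localhost lets DBD::mysql use the local Unix socket, and the counter rows are assumed to exist already:

        use strict;
        use warnings;
        use DBI;

        # host=localhost => DBD::mysql talks over the local Unix socket, not TCP.
        my $dbh = DBI->connect(
            'DBI:mysql:database=stats;host=localhost',
            'statsuser', 'secret',
            { RaiseError => 1, AutoCommit => 1 },
        );

        sub add_bytes {
            my ( $ip, $bytes ) = @_;
            # Cached statement handle: prepared once, reused on every call.
            my $sth = $dbh->prepare_cached(
                'UPDATE counters SET counter = counter + ? WHERE counter_id = ?'
            );
            $sth->execute( $bytes, $ip );
        }

        add_bytes( '10.0.0.1', 1500 );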

      Well, I used DBI with syntax similar to the above, but as an INSERT ... ON DUPLICATE KEY UPDATE to deal with non-existent keys (in this case IPs).
      I'm a pretty inexperienced SQLer, so it's very possible that it's not optimal, but the rate was very poor (30k/min) compared to the Berkeley alternatives.
        Try pre-loading your counters with 0s and just using UPDATE. I bet it's faster than INSERT ... ON DUPLICATE KEY UPDATE. Assuming you have reasonable hardware, 30k/min (500/s) seems pretty poor. But I guess it depends on how many counters there are and how many concurrent connections are trying to write at once.

        -sam

        For simple things like this, using local sockets with MySQL makes a big difference. It will never be as fast as BerkeleyDB, but it should be very fast.
Re: Multiple write locking for BerkeleyDB
by BrowserUk (Patriarch) on Apr 24, 2008 at 06:05 UTC

    (*) Units updated per pc88mxer's post below.

    By serialising the counts through a threaded UDP server, batching them on input before transferring them to the background writing thread, I managed sustained throughput rates of close to 600k/minute on a single-core 2.6 GHz(*) machine running both producer and consumer, and with half the bandwidth taken up by a download going on simultaneously:

    Consumer run:

    c:\test>s-udp.pl -MAXBUF=500
    Total: 347176 throughput: 9777/sec [max: 9987]

    MAXBUF=500 seems to be about the sweet spot on my machine. I can get higher throughputs by upping the priority of the server process. I was just writing the counts out to a file in the background, but you could probably do batch updates to a db without slowing things much.

    Consumer:

    #! perl -slw
    use strict;
    use threads;
    use threads::shared;
    use Thread::Queue;
    use Time::HiRes qw[ time usleep ];
    use IO::Socket;
    use List::Util qw[ max ];

    $|++;

    our $PORT   ||= 9999;
    our $FILE   ||= 'theLog';
    our $LEN    ||= 6;
    our $MAXBUF ||= 1000;

    my $Q = new Thread::Queue;

    open my $log, '+> :raw', $FILE or die "$FILE: $!";

    my $start = time;
    my $n :shared = 0;
    my %log;
    my $total = 0;

    ## Background writer: drains batches from the queue, updates the counts,
    ## rewrites the log file and reports throughput.
    async {
        my $max  = 0;
        my $thru = 0;
        while( my $in = $Q->dequeue ) {
            ++$log{ $_ } for split chr(0), $in;
            sysseek $log, 0, 0;
            my $logRec = join "\n", map{ $_ . ':' . $log{ $_ } } sort keys %log;
            syswrite( $log, $logRec );
            $max = max( $max, $thru = int( $n / ( time() - $start ) ) );
            printf "\r\t Total: %8d throughput: %5d/sec [max:%5d]",
                $total, $thru, $max;
            $total += $n;
            $n = 0;
            $start = time();
            sleep 1;
        }
    };

    ## Foreground receiver: pulls UDP datagrams as fast as possible and
    ## batches them (up to MAXBUF bytes) before handing off to the writer.
    my $srv = IO::Socket::INET->new(
        LocalPort => $PORT,
        Proto     => 'udp',
    ) or die "Socket: $@ : [$^E]";

    my $buffer = '';
    my $msg;
    while( $srv->recv( $msg, $LEN ) ) {
        next unless length $msg;
        $n++;
        if( length( $buffer .= chr(0) . $msg ) > $MAXBUF ) {
            $Q->enqueue( $buffer );
            $buffer = '';
        }
    }

    Producer:

    #! perl -slw
    use strict;
    use IO::Socket;

    our $N     ||= 1000;
    our $PORT  ||= 9999;
    our $DELAY ||= 0.0001;

    my $sock = IO::Socket::INET->new(
        Proto    => 'udp',
        PeerPort => $PORT,
        PeerAddr => 'localhost',
    ) or die "$@ [$^E]";

    my $sent = 0;
    for ( 1 .. $N ) {
        ++$sent;
        $sock->send( int rand( 32767 ) );
        select undef, undef, undef, $DELAY;
    }
    print "Sent: $sent";

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I managed sustained throughput rates of close to 600k/minute on a single core 2.6 MHz machine
      Um, that's like one update every 260 clock cycles - truly impressive!
        one update every 260 clock cycles - truly impressive!

        And that would have to include an average of at least 3 context switches in that brief time. I wish :)

        Units corrected. Thanks.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      Neat, but beware UDP is a lossy protocol and your code doesn't appear to do anything to detect dropped packets!

      -sam

        UDP is a lossy protocol

        Yes. That is the nature of UDP. But for some applications, throughput and low latency are more important than absolute reliability, which is why UDP exists.

        For comms within the same machine, where the "transmission" consists entirely of transfers between memory buffers, the scope for non-delivery is relatively low. Even on a local subnet with modern high-speed links, non-delivery is practically unheard of unless the segment is running close to its maximum bandwidth. The main source of dropped packets is the listener failing to service them in a timely manner, which is why I used two threads and buffering: to ensure the recv loop could run as tightly as possible.

        For the purposes of my testing, the mechanism for "detecting dropped packets" consisted of printing out how many were sent and how many were received. Good enough for a quick test. I had to make my machine work very hard indeed before it would drop any packets at all. By kicking out my firewall for a test, I got close to 1e6/minute using 5 producers before it started dropping packets.

        However, whether the OP's statistics gathering requires a 100% guarantee of accuracy, or just an ongoing indication of current trends where random dropouts are likely to affect all statistics equally and so not affect their legitimacy--think random sampling--only he will know for sure.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Multiple write locking for BerkeleyDB
by jfroebe (Parson) on Apr 23, 2008 at 18:55 UTC
      SQLite is slower than MySQL, especially when you have multiple writers.