PerlMonks  

Re: Multiple write locking for BerkeleyDB

by BrowserUk (Patriarch)
on Apr 24, 2008 at 06:05 UTC ( [id://682561]=note )


in reply to Multiple write locking for BerkeleyDB

(*) Updated units per pc88mxer's post below.

By serialising the counts through a threaded UDP server, batching them on input before transferring them to the background writing thread, I managed sustained throughput rates of close to 600k/minute on a single-core 2.6 GHz(*) machine running both producer and consumer, with half the bandwidth taken up by a simultaneous download:

Consumer run:

c:\test>s-udp.pl -MAXBUF=500
         Total:   347176 throughput:  9777/sec [max: 9987]

MAXBUF=500 seems to be about the sweet spot on my machine. I can get higher throughputs by upping the priority of the server process. I was just writing the counts out to a file in the background, but you could probably do batch updates to a db without slowing things much.
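As a rough illustration of the batch-update idea, here is a minimal sketch that folds one batched buffer into per-value increments and then applies them to a store with one write per distinct key. The names batch_deltas and flush_batch are hypothetical, and the store is a plain hash; swapping in a hash tied to a database is an assumption that was not tested here.

```perl
use strict;
use warnings;

# Fold one batched buffer (values separated by chr(0), as in the consumer)
# into a hash of per-value increments. Empty fields are skipped.
sub batch_deltas {
    my( $buffer ) = @_;
    my %delta;
    ++$delta{ $_ } for grep length, split chr(0), $buffer;
    return \%delta;
}

# Apply the increments to $store: one read-modify-write per distinct key.
# $store is a plain hash ref here; a tied database hash is an assumption.
sub flush_batch {
    my( $store, $delta ) = @_;
    $store->{ $_ } = ( $store->{ $_ } || 0 ) + $delta->{ $_ } for keys %$delta;
    return scalar keys %$delta;    # number of store writes performed
}

my %store  = ( 42 => 10 );
my $buffer = join( '', map { chr(0) . $_ } qw( 42 7 42 42 ) );
my $writes = flush_batch( \%store, batch_deltas( $buffer ) );
print "writes=$writes 42=$store{42} 7=$store{7}\n";
```

The point of the design is that four incoming counts cost only two store writes, so the number of writes per batch is bounded by the number of distinct values rather than by the number of datagrams.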

Consumer:

#! perl -slw
## -s lets switches such as -MAXBUF=500 set the 'our' variables below
use strict;
use threads;
use threads::shared;
use Thread::Queue;
use Time::HiRes qw[ time usleep ];
use IO::Socket;
use List::Util qw[ max ];
$|++;

our $PORT   ||= 9999;
our $FILE   ||= 'theLog';
our $LEN    ||= 6;       ## max bytes read per datagram
our $MAXBUF ||= 1000;    ## batch size (bytes) before handing off to the writer

my $Q = new Thread::Queue;
open my $log, '+> :raw', $FILE or die "$FILE: $!";

my $start = time;
my $n :shared = 0;       ## datagrams received since the last report
my %log;
my $total = 0;

## Background writer: folds each batch into %log and rewrites the file
async {
    my $max = 0;
    my $thru = 0;
    while( my $in = $Q->dequeue ) {
        ## Each batch starts with chr(0), so skip the empty leading field
        ++$log{ $_ } for grep length, split chr(0), $in;
        sysseek $log, 0, 0;
        my $logRec = join "\n", map{ $_ . ':' . $log{ $_ } } sort keys %log;
        syswrite( $log, $logRec );
        $max = max( $max, $thru = int( $n / ( time() - $start ) ) );
        printf "\r\t Total: %8d throughput: %5d/sec [max:%5d]",
            $total, $thru, $max;
        $total += $n;
        $n = 0;
        $start = time();
        sleep 1;
    }
};

my $srv = IO::Socket::INET->new(
    LocalPort => $PORT,
    Proto     => 'udp',
) or die "Socket: $@ : [$^E]";

## Receive loop: kept as tight as possible; only appends and enqueues
my $buffer = '';
my $msg;
while( $srv->recv( $msg, $LEN ) ) {
    next unless length $msg;
    $n++;
    if( length( $buffer .= chr(0) . $msg ) > $MAXBUF ) {
        $Q->enqueue( $buffer );
        $buffer = '';
    }
}

Producer:

#! perl -slw
use strict;
use IO::Socket;

our $N     ||= 1000;
our $PORT  ||= 9999;
our $DELAY ||= 0.0001;

my $sock = IO::Socket::INET->new(
    Proto    => 'udp',
    PeerPort => $PORT,
    PeerAddr => 'localhost'
) or die "$@ [$^E]";

my $sent = 0;
for ( 1 .. $N ) {
    ++$sent;
    $sock->send( int rand( 32767 ) );
    select undef, undef, undef, $DELAY;    ## sub-second pause between sends
}
print "Sent: $sent";

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Multiple write locking for BerkeleyDB
by pc88mxer (Vicar) on Apr 24, 2008 at 16:06 UTC
    I managed sustained throughput rates of close to 600k/minute on a single core 2.6 MHz machine
    Um, that's like one update every 260 clock cycles - truly impressive!
      one update every 260 clock cycles - truly impressive!

      And that would have to include an average of at least 3 context switches in that brief time. I wish :)

      Units corrected. Thanks.
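
For anyone checking the arithmetic behind the exchange above, a small sketch: 600k/minute is 10,000 updates per second, which works out to 260 clock cycles per update at 2.6 MHz and 260,000 at 2.6 GHz. The helper name cycles_per_update is hypothetical.

```perl
use strict;
use warnings;

# Clock cycles available per update, given a clock rate in Hz and an
# update rate per minute.
sub cycles_per_update {
    my( $hz, $per_minute ) = @_;
    return $hz / ( $per_minute / 60 );
}

printf "%s: %d cycles per update\n", $_->[0], cycles_per_update( $_->[1], 600_000 )
    for [ '2.6 MHz', 2.6e6 ], [ '2.6 GHz', 2.6e9 ];
```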
Re^2: Multiple write locking for BerkeleyDB
by samtregar (Abbot) on Apr 24, 2008 at 16:35 UTC
    Neat, but beware UDP is a lossy protocol and your code doesn't appear to do anything to detect dropped packets!

    -sam

      UDP is a lossy protocol

      Yes. That is the nature of UDP. But for some applications, throughput and low latency are more important than absolute reliability, which is why UDP exists.

      For comms within the same machine, where the "transmission" consists entirely of transfers between memory buffers, the scope for non-delivery is relatively low. Even on a local subnetwork with modern high-speed circuits, non-delivery is pretty unheard of unless the subring is running close to its maximum bandwidth. The main source of dropped packets is the listener, if it doesn't service them in a timely manner, which was the purpose of using two threads and buffering: to ensure the recv loop could run as tightly as possible.

      For the purposes of my testing, the mechanism for "detecting dropped packets" consisted of printing out how many were sent and how many were received. Good enough for a quick test. I had to make my machine work very hard indeed before it would drop any packets at all. By kicking out my firewall for a test, I got close to 1e6/minute using 5 producers before it started dropping packets.
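
If per-packet detection were wanted rather than comparing totals, one common approach is to number the datagrams. The sketch below is not what the posted code does: it assumes a hypothetical wire format of "<seq>:<value>" with a per-producer counter starting at 1, and the name make_loss_tracker is made up for illustration. The consumer's $LEN of 6 would also have to be raised to fit the longer messages.

```perl
use strict;
use warnings;

# Hypothetical wire format (not in the posted code): "<seq>:<value>",
# where <seq> is a per-producer counter starting at 1.
sub make_loss_tracker {
    my $expected = 1;    # next sequence number we expect to see
    my $lost     = 0;    # running count of sequence numbers skipped
    return sub {
        my( $msg ) = @_;
        my( $seq, $value ) = split /:/, $msg, 2;
        if( $seq >= $expected ) {
            $lost    += $seq - $expected;    # skipped numbers are presumed dropped
            $expected = $seq + 1;
        }
        # Late (reordered) datagrams are not credited back in this sketch.
        return( $value, $lost );
    };
}

my $track = make_loss_tracker();
my $lost  = 0;
for my $msg ( '1:100', '2:200', '5:300', '6:400' ) {
    ( my $value, $lost ) = $track->( $msg );
}
print "Presumed lost: $lost\n";
```

With several producers, each would need its own tracker, for example keyed on the sender's address and port.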

      However, whether the OP's statistics gathering requires a 100% guarantee of accuracy, or just an ongoing indication of current trends where random dropouts are likely to affect all statistics equally and so not affect their legitimacy (think random sampling), only he will know for sure.