TCP Server hangs with multiple client connections

rmahin has asked for the wisdom of the Perl Monks concerning the following question:

Long time reader, first time poster. We have just been completely stumped on this problem for a while now so thought the geniuses over here might be able to shed some light on the matter. So here is our problem. We have a server that needs to accept connections from possibly hundreds of clients, all of whom can issue commands with extremely long output. The program hangs when it gets too many connections at a time, or...for some other reason, that we cannot identify.

Ill give you an overview of how our program works, forgive me if some explanation is left unclear I am trying to break it down into code segments that will be useful. Essentially we have a server accepting connections from multiple clients which will be reading and writing from a SQLITE database (we may change to different database if it becomes a problem) and each client issues commands which the server then kicks off to a client thread, which is passed to a worker thread.

Here is the general flow of our code. Will post segments as needed, but my job does not particularly like us sharing lots of code, and we have written several modules.

We have our main server script. It starts by initializing 2 thread pool objects (which add/remove/reuse threads as needed) using the subroutine we specify. For instance:

$ref->{poolObj} = $clientPool;
exit 1 if ($clientPool->startPool(baseThreads => 10, moduleName => "SE
+RVER::Main", subName => "clientThreadPool", values => $ref));
[download]

$ref used to have more values before, so we have left it as is for now.

One threadpool is for client connections and processing commands, and one is for doing the real work so the client can issue more commands while it is running. As the server receives new connections, each client connection is put into a thread queue with 2 values: it's socket as a file descriptor, and its ip address. This is shown in the code below.

# Open Server Socket
my $sock = IO::Socket::INET->new(Proto => 'tcp',
                                    LocalAddr => $servername,
                                    LocalPort => $port,
                                    Listen => SOMAXCONN,
                                    Reuse => 1,
                                    Blocking => 0);

our $selSock = IO::Select->new($sock);

#some more code is here, like initializing our db connection, and log4
+perl logger 

while(my @ready = $selSock->can_read) {
        foreach $_ (@ready) {
        if ($_ == $sock) {
            $logger->info("Base socket is " . $sock);
            $logger->info("Adding new socket");
            #create a new socket
            my $newSock = $sock->accept or die;
            $logger->info("Added new socket " . $newSock);
            $selSock->add($newSock);
        
            my $fd = fileno $newSock;
            my $address = inet_ntoa($newSock->peeraddr);
            
            # enqueue client connection to be processed
            my @values = ($fd, $address);
            if ($clientPool->queueJob(queueValues => \@values)) {
            $logger->error("Failed to queue client connection");
                
                # trash packets received from client
                my $trash = <$newSock>;
                
                send($newSock, "*** ENQUEUE_FAILED ***", 0);
                send($newSock, "*** ZERO_BYTES_LEFT ***", 0);

                $selSock->remove($newSock);
                    
                shutdown($newSock, 2);
                close $newSock;
            }
        }
    }
}
[download]

Each thread in our clientPool then is then dequeuing connections that are enqueued by the server, containing the socket file descriptor, and the ip address of the client for communication and reading input from the client. We then read from this file descriptor like this which works well:

        my $fd = @$work[0];
        our $clientaddr = @$work[1];
        my $selSock = $main::selSock;
        
        # open file handle to read from... if fail to read error
        # error can not be reported back to end user
        # check has been implemented so dev can identify problem by er
+ror msg in server log
        unless (open $fh, '+<&=' . $fd) {
            $logger->error("Unable to open file handle for socket: $!"
+);
            $vars{poolObj}->setDone;
            next;
        }

        # Get the password sent by client
        my $clientpass = <$fh>;
[download]

and check a few more values the clients sends the server. After the checks are done, we are ready to actually process commands.

        # create and add socket file handle so it can be checked withi
+n the while loop
        my $select = IO::Select->new();
        $select->add($fh);
        
        my $last = 1;
        while ($last) {
            my $subroutineCall;
            my $command;
            my $completeLine = "";
            
            # check if anything is ready to be received from client
            my @ready = $select->can_read(.1);
            next unless (@ready);
            
            # loop while receiving from the client untill a full msg h
+as been received as identified by our flags
            while (defined (my $line = <$fh>)) {
                if ($line =~ /\*\*\* CLOSE_CONNECTION_CLIENT \*\*\*\n$
+/) {
                    $logger->info("Client closed connection to the ser
+ver");
                    $last = 0;
                    
                    $selSock->remove($fh);
                    shutdown($fh, 2);
                    close $fh;
                    last;
                }
                        
                $completeLine .= $line;
                if ($completeLine =~ /(.+) \*\*\* COMMAND_INPUT \*\*\*
+ (.+) \*\*\* COMMAND_INPUT_END \*\*\*\n$/is) {
                    $subroutineCall = $1;
                    $command = $2;
                    $logger->info("Processing message \"" . $command .
+ "\"");
                    
                    last;
                }
            }

            if ($last) {
            
                #checking if the subroutine call is defined, if not, i
+t kills that client connection (for cases of closing out of a client 
+window and it sending the empty string)
                if(defined($subroutineCall)){
                $logger->debug("Calling subroutine '${subroutineCall}'
+");
                    
                    my ($string) = executeCommand(subCall => $subrouti
+neCall, command => $command);
                                            
                    send($fh, "$string\n*** ZERO_BYTES_LEFT ***" , 0);
                }else{
                    $logger->info("Undefined subroutine passed, closin
+g client connection.");
                    $last = 0;
                    
                    $selSock->remove($fh);
                    shutdown($fh, 2);
                    close $fh;
                    last;
                }
            }
            
        }
[download]

So this is the main flow of our program. The subroutine executeCommand() uses the modules we have written to interact with our program, and log the output. Each command will be accessing our database getting varrying amounts of information, and updating the table to show jobs in use. For some commands our database must be locked for a good couple minutes or two. We are using the 'begin immediate transaction' to allow other clients to still be able to read from the database. This could be a source of problem, but right now we are thinking it lies elswhere. There two thing we have thought of that COULD be the problem with our code, but all attempts on our own to fix have turned up with very little.

1. log4perl. We write our info messages to the screen, and debug messages to a file. We have LOTS of debug messages. The output of some commands that we run can be well over 50,000 lines of text. We think this could be an issue of multiple threads are attempting to print to the same place at the same time because if we print these messages to the screen, our program hangs extremely quickly, whereas if we keep it to the info messages, things run mostly smoothly. Another sign that this could be a problem, heres a scenario:

we kick off a script to open 80 connections at once, with a client already connected. opening the 80 connections will hang the program if debug messages print to the screen. No new clients can connect to the server. The client already connected however, can issue commands, and have debug messages printed to our log file up to the point of actually receiving the command, but then nothing. nothing is printed to the screen here, only the file. which just seems weird!

We do seem to be running into a memory problem, but we should be able to track that one down, and would difficult for anyone here to do so without complete access to our code. but if you have any ideas on that matter, we'd be happy to hear them.

2. When we execute out commands, we are just doing system calls, but need the output. We originally used backticks to execute our command, but we have now switched to using open() and piping the output to a filehandle. The subroutine also looks for expected output to return to the user and if not found returns the last N lines of the command. that code is seen here, and nearly all of the time when our program hangs, it is around this point.

    # execute the command and save results in a file handle to reduce 
+memory usage
    if (($head =~ /.+/) && ($tail =~ /.+/)) {
        my @lastLines = ();
        my $length;
        open (my $fh, $vars{cmd} . " 2>&1|") or die;
        while (<$fh>) {
            #$logger->info($fh . " " . $_);
            if ($_ =~ /$tail/ims) { #found the last line, terminate lo
+op, do not append the output
                $tailFound = 1;
                last;
            }
            
            if($headFound){ #the header line is already found and igno
+red, so safe to apprend output
                $output .= $_;
            }
            
            if (!$headFound and $_ =~ /$head/ims) { #next lines will b
+e the matched output
                $headFound = 1;
            }
            
            #keeps track of the most recent N lines
            push(@lastLines, $_);
            my $length = @lastLines;
            if($length >= $vars{lastNLines}){
                shift(@lastLines);
            }
            
            #$output .= $_;
        }
        
        unless ($headFound && $tailFound) {
            $output = "\n\nCould not match command output. Printing th
+e last $vars{lastNLines} lines:\n";
            $output .= join("", @lastLines);
        }
        close ($fh);
    } else {
        open (my $fh, $vars{cmd} . " 2>&1|") or die;
        while (<$fh>) {
            #$logger->info($fh . " " . $_);
            $output .= $_;
        }
        close ($fh);
    }
[download]

I really appreciate any help/advice anyone is able to give, so thank you so much in advance for taking the time to look at it!

UPDATE: Thanks everyone for your ideas, in the process of trying them out now. Also forgot to mention that this server is running on windows, using activestate perl. In case that affects any of your suggestions.

UPDATE 8/29/2012: Sorry for the late response everyone. The program took some time to debug and to try various suggestions. We eventually found the problem was log4perl, and for some reason when printing to STDOUT, the program hung. The problem was fixed by printing to STDERR instead. Not sure exactly why this is the case, but it seems to be working now...Thanks again for all your posts!

Comment on TCP Server hangs with multiple client connections Select or Download Code

Replies are listed 'Best First'.
Re: TCP Server hangs with multiple client connections by suaveant (Parson) on Aug 02, 2012 at 18:20 UTC
I've seldom used threads, but here is a suggestion to simplify things. I see no reason to have a central worker thread in the scenario you are presenting. It looks like the only thing you are doing is handling multiple requests individually. Socket accept works in a round robin fashion that works well in a mutli thread/process environment. Create your socket and then create threads/forks, have all of them do a while($sock->accept()) and you are good to go, don't complicate things with queues and threading unless you really need it. Even if you keep the threading for some reason, I'd still suggest handling the socket this way. It'll simplify your code and probably clear up your issues which most likely are caused by some sort of unresolved race condition. I'd look at Net::Server, not only will it hide a lot of code and prevent you from re-inventing the wheel, it even has a mechanism for handling a listen port and log4perl it seems. Might save you a lot of headaches. Edit Net::Server also can do useful things like have a min/max connection pool that grows your processes dynamically - Ant - Some of my best work - (1 2 3)	[reply]
Re^2: TCP Server hangs with multiple client connections by rmahin (Scribe) on Aug 02, 2012 at 19:05 UTC
Thanks for the quick reply! Oh man that is an excellent idea. I wasn't aware that accept() worked like that, will try it and update with results. Will also definitely take a look at that module, Net::Server. We are hesitant to move to completely different way of handling everything as we've invested a lot of time on our current implementation, but if we cant work out this issue may have to...REALLY regretting not having found that module when we first started, at first glance it looks pretty nice.	[reply]
Re^3: TCP Server hangs with multiple client connections by suaveant (Parson) on Aug 02, 2012 at 19:33 UTC
Well, I'd guess you'd be able to pretty much copy the code you use for processing directly into a Net::Server interface without a lot of changes, since you don't really seem to be using the threads for anything but parallel workers. Something along the lines of this, not tested, of course. sub process_request { my $self = shift; my $fh = $self->{server}->{client}; my $select = IO::Select->new(); $select->add($fh); my $last = 1; while ($last) { my $subroutineCall; my $command; my $completeLine = ""; # check if anything is ready to be received from client my @ready = $select->can_read(.1); next unless (@ready); # loop while receiving from the client untill a full msg has b +een received as identified by our flags while ( defined( my $line = <$fh> ) ) { if ( $line =~ /\\\* CLOSE_CONNECTION_CLIENT \\\\n$/ ) + { $logger->info("Client closed connection to the server" +); $last = 0; $selSock->remove($fh); shutdown( $fh, 2 ); close $fh; last; } $completeLine .= $line; if ( $completeLine =~ /(.+) \\\ COMMAND_INPUT \\\* (.+) \\\* COMMA +ND_INPUT_END \\\\n$/is ) { $subroutineCall = $1; $command = $2; $logger->info( "Processing message \"" . $command . "\ +"" ); last; } } if ($last) { #checking if the subroutine call is defined, if not, it kills that cli +ent connection (for cases of closing out of a client window and it se +nding the empty string ) if ( defined($subroutineCall) ) { $logger->debug("Calling subroutine '${subroutineCa +ll}'"); my ($string) = executeCommand( subCall => $subroutineCall, command => $command ); send( $fh, "$string\n ZERO_BYTES_LEFT *", 0 ) +; } else { $logger->info( "Undefined subroutine passed, closing client c +onnection." ); $last = 0; $selSock->remove($fh); shutdown( $fh, 2 ); close $fh; last; } } } } [download] - Ant - Some of my best work - (1 2 3)	[reply] [d/l]
Re: TCP Server hangs with multiple client connections by BrowserUk (Patriarch) on Aug 02, 2012 at 23:46 UTC
You don't mention it explicitly, but if you are using Thread::Pool, that is responsible for half your problems. The last time I tested that module it was badly broken and very, very slow. A quick look now and not much seems to have changed. The other half is that you are trying to use non-blocking sockets on Windows via: `Blocking => 0);` [download] Which unless the powers-that-be have finally gotten around to fixing it, doesn't work on windows. Also, mixing select processing with multi-threading -- in the way you have done it -- is fraught with problems. It can be done (correctly), but is actually rarely needed. It is hard to assess your program given that you supply bits rather than the whole thing; and omit significant details -- like the modules you are using -- but you seem to be doing things in very complicated ways. For reference, here is a simple -- but substantially complete -- pool-based, multi-threaded echo server, that runs very well under extreme load on my Vista machine without skipping a beat: #! perl -slw use strict; use threads; use threads::shared; use Thread::Queue; use IO::Socket; my $semSTDOUT :shared; sub tprint{ lock $semSTDOUT; print @_; } $\|++; my %cache; my $Qwork = new Thread::Queue; my $Qdone = new Thread::Queue; my $done :shared = 0; sub worker { my $tid = threads->tid; while( my $fno = $Qwork->dequeue ) { open my $client, "+<&", $fno or die $!; tprint "$tid: Duped $fno to $client"; my $buffer = ''; while( my $c = sysread( $client, $buffer, 1, length $buffer ) +) { syswrite( $client, $buffer, 1, length( $buffer ) -1 ); while( $buffer =~ s[(^[^\n]+)\n][]g ) { tprint "$tid: got and echoed $1"; } last if $done; } close $client; $Qdone->enqueue( $fno ); tprint "$tid: $client closed"; } } our $W //= 4; my $lsn = new IO::Socket::INET( Listen => 5, LocalPort => '12345' ) or die "Failed to open listening port: $!\n"; my @workers = map threads->create( \&worker, \%cache ), 1 .. $W; $SIG{ INT } = sub { close $lsn; $done = 1; $Qwork->enqueue( (undef) x $W ); }; while( my $client = $lsn->accept ) { my $fno = fileno $client; $cache{ $fno } = $client; $Qwork->enqueue( $fno ); delete $cache{ $Qdone->dequeue } while $Qdone->pending; } tprint "Listener closed"; $_->join for @workers; tprint "Workers done"; [download] If you want some real help with your code -- rather than getting fobbed off with "Use this module I just found on cpan and switch to flavour-du-jour of *nix." -- you'll need to let me see your complete program (here or off-forum). With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re: TCP Server hangs with multiple client connections by Khen1950fx (Canon) on Aug 02, 2012 at 19:45 UTC
I looked at your server socket. There's a problem with IO::Select's can_read. If there's no timeout given, or if it contains a registered handler, can_read will block. Here's how I would handle the socket: #!/usr/bin/perl select(STDERR); $\| = 1; select(STDOUT); $\| = 1; use autodie; use strictures 1; use IO::Socket; use IO::Select; use Log::Log4perl; my $sock = new IO::Socket::INET( LocalAddr => 'www.example.com', LocalPort => 9000, Listen => 128, Reuse => 1, Blocking => 0, Proto => 'tcp', ); our $selSock = new IO::Select($sock); while( my(@ready) = $selSock->can_read(10) ) { use strict qw/refs/; use warnings FATAL => 'syntax'; foreach $_ (@ready) { if ($_ == $sock) { my $logger; my $clientPool; $logger->info("Base socket is " . $sock); $logger->info("Adding new socket"); my $newSock = $sock->accept or die; $logger->info("Added new socket " . $newSock); $selSock->add($newSock); my $fd = $selSock->fileno($newSock); my $address = inet_ntoa($newSock->peeraddr); my @values = ($fd, $address); if ($clientPool->queueJob( queueValues => \@values)) { $logger->error( "Failed to queue client connection" ); my $trash = <$newSock>; $trash->send($newSock, " ENQUEUE_FAILED ", 0); $trash->send($newSock, " ZERO_BYTES_LEFT ", 0); $selSock->remove($trash); $newSock = undef; sleep 1; exit; } } } } [download]	[reply] [d/l]
Re^2: TCP Server hangs with multiple client connections by rmahin (Scribe) on Aug 02, 2012 at 20:53 UTC
Tried what you suggested, changed it a little though to `while(1) { use strict qw/refs/; use warnings FATAL => 'syntax'; my @ready = $selSock->can_read(10); unless (@ready) {next};` [download] so that it wouldn't die if no clients connected. But anyway, and had no effect that I could see. Thanks for the suggestion though!	[reply] [d/l]
Re: TCP Server hangs with multiple client connections by flexvault (Monsignor) on Aug 02, 2012 at 19:28 UTC
rmahin, ...we write our info messages to the screen, and debug messages to a file. We have LOTS of debug messages. The output of some commands that we run can be well over 50,000 lines of text... Why are you writing any messages to the screen? That's the slowest part of your script. I've had scripts end and the output keep printing on the screen for minutes after the script has completed. With the volume you're talking about your 'hang' may come from buffer overflow in the system libraries. I would change the script to redirecting 'info' to the bit bucket, or the very least to a separate buffered file. Buffering in production is very important, and having 80 processes running, the output on the screen isn't worth much anyway. Another area of concern is using open and pipe the output. I now use a 'flock' with this technique because of 'hang' situations. That may ruin your concurrency. Why not use 'qx/.../;', since I believe Perl will fork and give you the results back. This is a large problem, and many, many things could go wrong, but hopefully PM can help you think of alternatives. Good Luck! "Well done is better than well said." - Benjamin Franklin	[reply]
Re: TCP Server hangs with multiple client connections by sundialsvc4 (Abbot) on Aug 03, 2012 at 01:03 UTC
I definitely prefer BrowserUK’s approach of having just one thread that does the socket-handling, with two queues ... one (or more) for inbound work, and the other for completed. It gives you a very clean flow of control, and a method for accepting inbound work (and for disposing of completed work) that remains consistent no matter how big the loads get. The intake and outtake functions take a consistent amount of resources no matter how many workers there are, and a tunable number of workers ply their business no matter how backlogged the queues may get. (They can also easily shut-down a denial of service attack, because, so to speak, you have one faithful secretary whose job it is to answer the phone, and who also knows how big the intake queue is getting. This one process is in the position to know, with the power to act.) You can calculate the system’s optimal capacity to do work, set the knobs accordingly, and know that the system will churn through the work without fear of thrashing. Definitely the best scalable solution.


Come for the quick hacks, stay for the epiphanies.
	PerlMonks