Debugging running processes

by kappa (Chaplain)
on Oct 31, 2008 at 12:15 UTC ( [id://720690]=perlquestion )

kappa has asked for the wisdom of the Perl Monks concerning the following question:

Good day to everybody!

We have run into an interesting problem and are currently looking for ways to debug it.

We have a text-processing daemon (doing some cool analysis on text and emitting context-sensitive ads). It processes several million texts a day and generally does a great job. But several times a day it suddenly starts to hog the CPU and ignore requests. If it were written in C, we would dump core and debug it with gdb, or even attach to the running process. Is there a way to find out what a running Perl program is doing?

--kap

Replies are listed 'Best First'.
Re: Debugging running processes
by Fletch (Bishop) on Oct 31, 2008 at 12:46 UTC

    You can't after the fact, but you can trigger dropping into the debugger from code if you've started the program under it to begin with.

    • Make a signal handler for, say, SIGUSR1 which sets $DB::single = 1;
    • Start with perl -d mydaemon regular arguments and just hit "c" when you get the debugger prompt
    • When you want the program to drop into the debugger, send it your signal (kill -USR1 pid) and you should get dropped into the debugger inside your signal handler.
    • Profit! Debug!
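
    A minimal sketch of the handler from the first step, assuming the daemon is started under perl -d as described and that SIGUSR1 is not already used for something else. The handler only sets the flag; the debugger does the rest:

```perl
use strict;
use warnings;

# Put this near the top of mydaemon, before the main loop.
# Under "perl -d", setting $DB::single to a true value makes the
# debugger stop and give you a prompt again. Without -d it is just
# an ordinary package variable and nothing happens.
$SIG{USR1} = sub {
    $DB::single = 1;
};
```

    Then kill -USR1 pid from another shell whenever the daemon starts misbehaving.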

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Debugging running processes
by Illuminatus (Curate) on Oct 31, 2008 at 12:47 UTC
    Using any of the CPAN profiler modules is useful for finding general inefficiencies, but may not be much help in your case. If it only bogs down very intermittently, then statistical approaches will not help much.

    What OS are you running on? If Linux, I would first try strace (hey, maybe you'll get lucky :)). If you can't predict when the problem will occur, pipe the strace output to a perl script that rolls the file when it gets big, and keeps the last 10 or so around. If you are running on Solaris, you will probably have more success with dtrace, but this is non-trivial as it has its own programming language ('D').
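
    The rolling script could look something like the sketch below. The function names, the 10 MB size limit and the count of 10 are my own placeholders, not anything standard; it simply copies STDIN to a file and shuffles old files to .1, .2, ... when the current one gets big:

```perl
use strict;
use warnings;

# Hypothetical limits: roll at about 10 MB, keep 10 old files.
our $MAX_BYTES = 10 * 1024 * 1024;
our $KEEP      = 10;

# Shift base.1 -> base.2, ..., then base -> base.1; the oldest is dropped.
sub rotate_logs {
    my ($base, $keep) = @_;
    unlink "$base.$keep" if -e "$base.$keep";
    for my $i ( reverse 1 .. $keep - 1 ) {
        rename "$base.$i", "$base." . ( $i + 1 ) if -e "$base.$i";
    }
    rename $base, "$base.1" if -e $base;
    return;
}

# Copy lines from $in to $base, rolling once $max_bytes have been written.
sub roll_stream {
    my ($in, $base, $max_bytes, $keep) = @_;
    open my $out, '>', $base or die "Cannot open $base: $!";
    my $written = 0;
    while ( my $line = <$in> ) {
        print {$out} $line;
        $written += length $line;
        next if $written < $max_bytes;
        close $out;
        rotate_logs( $base, $keep );
        open $out, '>', $base or die "Cannot open $base: $!";
        $written = 0;
    }
    close $out;
    return;
}

# Only runs when a log file name is given on the command line.
roll_stream( \*STDIN, $ARGV[0], $MAX_BYTES, $KEEP ) if @ARGV;
```

    Usage would be along the lines of strace -p pid 2>&1 | perl roll.pl /var/tmp/strace.log (strace writes to stderr by default, hence the 2>&1; roll.pl is whatever you saved the script as).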

    You could also add lots of debug statements at potential problem spots, that only print when a global is set. You could trigger setting the global on a couple of different things. If you have a main subroutine that is called per-message, you could time its operation. When it gets above a certain threshold, turn on debugging. Or, you could set up a CPU monitor script that sent SIGUSR1 when it got high, and have your program catch it and turn debugging on.
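
    To make that concrete, here is a rough sketch of both triggers. The names ($DEBUG, $THRESHOLD, debug, timed) and the half-second limit are made up for illustration; adapt them to whatever your per-message subroutine looks like:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

our $DEBUG     = 0;      # global switch checked by every debug statement
our $THRESHOLD = 0.5;    # hypothetical limit in seconds per message

# Trigger 1: an external monitor sends SIGUSR1 to turn debugging on.
$SIG{USR1} = sub { $DEBUG = 1 };

# Prints only when the global is set.
sub debug {
    print STDERR "[debug $$] @_\n" if $DEBUG;
    return;
}

# Trigger 2: time the per-message handler, turn debugging on if it is slow.
# The handler is always called in list context.
sub timed {
    my ($handler, @args) = @_;
    my $start   = time;
    my @result  = $handler->(@args);
    my $elapsed = time - $start;
    if ( !$DEBUG && $elapsed > $THRESHOLD ) {
        $DEBUG = 1;
        debug( sprintf 'handler took %.3f s, debugging enabled', $elapsed );
    }
    return wantarray ? @result : $result[0];
}
```

    In the main loop you would then call something like timed( \&process_message, $text ), where process_message stands in for your own subroutine.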
      Looks like the problem is in user space. Most probably a worst-case scenario (combinatorial explosion) of a regexp is triggered. So strace won't help. And this is FreeBSD.
      --kap
Re: Debugging running processes
by BrowserUk (Patriarch) on Oct 31, 2008 at 14:15 UTC

    Try this. Save it as yourperl/site/lib/Devel/Trace/Remote.pm

    Updated: Improved the cleanup to allow reconnects.

        package Devel::Trace::Remote;
        use strict;
        use warnings;
        use IO::Socket;
        use threads;
        use threads::shared;
        use Thread::Queue;

        my $connected :shared = 0;
        my $Q = new Thread::Queue;

        # Called by perl before each statement when running under -d
        sub DB::DB {
            return unless $connected;
            $Q->enqueue( join ' : ', caller );
            return;
        }

        # Background thread: serve the trace to one client at a time
        async {
            my $server = IO::Socket::INET->new(
                LocalHost => 'localhost:54321',
                Listen    => 1,
                Reuse     => 1,
            ) or die $!;
            $server->autoflush;
            while( my $client = $server->accept ) {
                $connected = 1;
                while( $_ = $Q->dequeue ) {
                    print $client "$_\n\r" or last;
                };
                $connected = 0;
                # Discard anything still queued so a reconnect starts clean
                $Q->dequeue while $Q->pending;
            }
        }->detach;

        1;

    Then start your script using perl -d:Trace::Remote yourscript. When things start going awry, telnet into localhost:54321 and you will get output along the lines of:

        ##pkg file lineno
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21
        main : ack1.pl : 22
        main : ack1.pl : 21

    With nothing connected, it will have minimal impact on the process. (For CPU-bound processes, about 4 1/2 times slower; much less for an IO-bound process. When you are connected, expect the slowdown to roughly double.) You may also see some memory growth if the program is in a tight loop (generating output faster than telnet can receive it).

    Many enhancements are possible, but this should allow you to get a feel for what is going on inside the code without too much impact on the normal running. It may end up on CPAN if you find it useful.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Debugging running processes
by ccn (Vicar) on Oct 31, 2008 at 12:25 UTC
    Did you try to use Devel::Profile?

    I mean you can make a test version of your script and watch what happens with it.

      Alas, Devel::Profile is too slow to run under in production and the problem is only seen under heavy production load :(
      --kap

        OK. I think it should be possible to test the script on a non-production machine. If so, then you can search for the bottleneck by commenting out different pieces of code until the heavy load cases disappear.

        Pay particular attention to regexps that may be badly written, like this:

        ('ab' x 3000 . 'c') =~ /.*ab.*c/; # fast if can match
        ('ab' x 3000) =~ /.*ab.*c/;       # very very very slow if no match
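
        If you want to measure a suspect regexp yourself, a small timing harness along these lines may help. time_match is just a helper name I made up. How slow the failing case really is depends on your perl: the regex engine may notice that the required 'c' is missing and give up early, so treat any numbers as specific to your version.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Returns (1 or 0 for match/no match, elapsed seconds).
sub time_match {
    my ($text, $re) = @_;
    my $start   = time;
    my $matched = $text =~ $re ? 1 : 0;
    return ( $matched, time - $start );
}

my $re = qr/.*ab.*c/;
for my $case ( [ 'match', 'ab' x 3000 . 'c' ], [ 'no match', 'ab' x 3000 ] ) {
    my ( $label, $text ) = @$case;
    my ( $matched, $secs ) = time_match( $text, $re );
    printf "%-8s matched=%d  %.4f s\n", $label, $matched, $secs;
}
```

        Feeding it texts captured from production is likely to be more telling than synthetic strings.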
        ... the problem is only seen under heavy production load

        Have you checked the memory consumption (either on a single instance of the script, or summed over multiple instances that might be running simultaneously on a given server)?

        If any single instance takes up a significant amount of memory, then you can probably work out how many concurrent jobs it would take to swamp available RAM, and cause one or more of the jobs to go into severe swapping / page faulting. The standard unix/gnu "top" command might suffice to spot a problem of that sort as it's happening.

        How hard will it be to reduce the memory footprint of your script? Alternately, how bad will it be to limit the number of simultaneous jobs? The thing about page-fault delays is that the timing impact is non-linear: making one job wait a few seconds before it can really start, to make sure that it is serialized relative to some other job (won't start until that job finishes), can often lead to an overall faster completion than allowing it to run immediately and simultaneously, causing unsustainable competition for available resources.

Re: Debugging running processes
by marto (Cardinal) on Oct 31, 2008 at 13:19 UTC

    In addition to the other good advice already provided, you may want to look at using Devel::NYTProf. An interesting screencast on profiling and a demo of this module in use can be found here.

    Hope this helps

    Martin

Re: Debugging running processes
by renodino (Curate) on Oct 31, 2008 at 15:25 UTC
    You might check out Devel::STrace. It was originally written to address a similar issue with a multithreaded/multiprocess socket server, and surfaced the problem PDQ (some socket misbehavior causing a tight infernal loop).

    Perl Contrarian & SQL fanboy
Re: Debugging running processes
by talexb (Chancellor) on Oct 31, 2008 at 18:51 UTC

    I'm a big fan of Log::Log4perl .. it's logging you can turn on and off (or rather, up or down in sensitivity) externally from the application that's running.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
