No child processes - system limit?

clinton has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

When running multiple HTTP requests via LWP, I occasionally get this error: `select failed: no child processes`. If I catch the error and wait a little, I can repeat the request successfully.

The code in LWP::Protocol::http that throws this error is:

    SELECT:
    {
        my $nfound = select($rbits, $wbits, undef, $sel_timeout);
        if ($nfound < 0) {
            if ($!{EINTR} || $!{EAGAIN}) {
                if ($time_before) {
                    $sel_timeout = $sel_timeout_before - (time - $time_before);
                    $sel_timeout = 0 if $sel_timeout < 0;
                }
                redo SELECT;
            }
            die "select failed: $!";
        }
    }

It seems I am running into some system limit, but I can't figure out which one. `ulimit -a` on this (Linux) system outputs:

    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 27968
    max locked memory       (kbytes, -l) 32
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 16384
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 27968
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

And the limits as reported by BSD::Resource are as follows:

    RLIMIT_CPU :        -1
    RLIMIT_OPEN_MAX :   16384
    RLIMIT_LOCKS :      -1
    RLIMIT_VMEM :       -1
    RLIMIT_FSIZE :      -1
    RLIMIT_STACK :      10485760
    RLIMIT_MEMLOCK :    32768
    RLIMIT_NOFILE :     16384
    RLIMIT_DATA :       -1
    RLIMIT_NPROC :      27968
    RLIMIT_OFILE :      16384
    RLIMIT_AS :         -1
    RLIMIT_CORE :       0
    RLIMIT_RSS :        -1

Any ideas what I can change to get around this?

thanks

Clint

Re: No child processes - system limit? by almut (Canon) on Apr 01, 2010 at 13:57 UTC
Normally, you'd get this error (ECHILD) if you `wait` for a child when there is none, e.g.

    $ perl -e 'die $! if wait == -1'
    No child processes at -e line 1.

In other words, I'm not sure this is (directly) related to a resource limit at all... (though, of course, it might be a follow-up error of some code doing a `wait` for a child that had never been created, due to a resource limit such as memory or max children per user).
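A minimal Perl sketch of the same point (not from the thread): in a process that has never forked, `wait` has nothing to reap, so it returns -1 and `$!` holds ECHILD. It uses the `%!` hash from the core Errno module, the same mechanism behind LWP's `$!{EINTR}` check.

```perl
use strict;
use warnings;
use Errno ();    # provides the %! hash of errno names

# This process has never forked, so wait() has nothing to reap:
# it returns -1 and sets $! to ECHILD ("No child processes").
my $ret       = wait();
my $is_echild = $!{ECHILD} ? 1 : 0;
print "wait returned $ret, ECHILD=$is_echild\n";
```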
Re^2: No child processes - system limit? by clinton (Priest) on Apr 01, 2010 at 14:15 UTC
Well, the reason I'm thinking resource limit is that this only occurs when the system is busy, and then a couple of seconds later it works fine again.

The docs for `select` indicate that this is the `select(2)` system call, and the docs for that say:

    ...On error, -1 is returned, and errno is set appropriately;...

and list the following errors:

    EBADF   An invalid file descriptor was given in one of the sets. (Perhaps a
            file descriptor that was already closed, or one on which an error
            has occurred.)
    EINTR   A signal was caught; see signal(7).
    EINVAL  nfds is negative or the value contained within timeout is invalid.
    ENOMEM  unable to allocate memory for internal tables.

None of these corresponds to `no child processes`, which leaves me at a bit of a loss.
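As a small illustration (a sketch, not from the thread) that `select` itself reports one of its own documented errno values, the snippet below asks `select` to watch a descriptor that has just been closed. On Linux this is expected to fail with EBADF, never ECHILD. The pipe is only a convenient way to get a descriptor number that is known to be invalid.

```perl
use strict;
use warnings;
use Errno ();    # provides the %! hash of errno names

# Create a pipe, remember one descriptor number, then close both ends
# so that descriptor is guaranteed not to be open any more.
pipe(my $reader, my $writer) or die "pipe: $!";
my $fd = fileno($reader);
close $reader;
close $writer;

# Ask select() to watch the closed descriptor for readability (no wait).
my $rbits = '';
vec($rbits, $fd, 1) = 1;
my $nfound = select($rbits, undef, undef, 0);

# Name the errno value via %!, as LWP does with $!{EINTR}.
my $reason = $!{EBADF}  ? 'EBADF'
           : $!{ECHILD} ? 'ECHILD'
           :              "other ($!)";
print "select returned $nfound: $reason\n";
```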
Re^3: No child processes - system limit? by almut (Canon) on Apr 01, 2010 at 14:26 UTC
What do you get for

    $ getconf CHILD_MAX

(or `getconf -a`, just in case...)
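The same limit can be queried from Perl through the core POSIX module. This is a sketch assuming a POSIX system whose `sysconf` supports `_SC_CHILD_MAX`; on Linux with glibc the value is, as far as I know, derived from the soft RLIMIT_NPROC limit (what `ulimit -u` shows above), and `sysconf` should yield undef when that limit is unlimited.

```perl
use strict;
use warnings;
use POSIX ();

# Perl counterpart of `getconf CHILD_MAX`. POSIX::sysconf returns undef
# when the value is indeterminate (e.g. an unlimited process limit).
my $child_max = POSIX::sysconf(POSIX::_SC_CHILD_MAX());
print 'CHILD_MAX: ', (defined $child_max ? $child_max : 'indeterminate'), "\n";
```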
Re^4: No child processes - system limit? by clinton (Priest) on Apr 01, 2010 at 14:39 UTC
Re^5: No child processes - system limit? by almut (Canon) on Apr 01, 2010 at 15:04 UTC
Re: No child processes - system limit? by ikegami (Patriarch) on Apr 01, 2010 at 16:52 UTC
Do you have any signal handlers? Are you using `fork`, `system`, threads or some means of parallelising?
Re^2: No child processes - system limit? by clinton (Priest) on Apr 01, 2010 at 17:01 UTC
Yes: in the parent process, I'm reading 5000 records from a source, then forking off a child to reindex each of those 5000 records. The parent forks `$max_kids` processes, recording the PIDs in a hash, then waits until there are fewer than `$max_kids` active.

My reaper looks like this:

    #===================================
    sub _REAPER {
    #===================================
        my $params = shift;
        foreach my $pid ( keys %Children ) {
            my $res = waitpid( $pid, WNOHANG );
            if ( $res > 0 ) {
                $Children{$pid} = 0;
                die "Error in child" if $?;
            }
        }
        $SIG{'CHLD'} = \&_REAPER;
    }

Note: in the reaper, I set `$Children{$pid} = 0` instead of deleting the key, as deleting was causing `panic: freed scalar` errors. I now clean up the `%Children` hash in the main loop of the parent.

The error I'm seeing occurs at the stage in the parent when I'm reading the 5,000 records from the source.

thanks

Clint
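For comparison, here is a hypothetical reaper in the style of the REAPER example in perlipc; it is not clinton's code, and `%Status` plus the fork loop are made up for the demo. It localizes `$!` and `$?`, and it reaps with `waitpid(-1, WNOHANG)` until nothing is ready, so it never calls `waitpid` on a PID that was already reaped.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %Children;    # pid => 1 while running, 0 once reaped (hypothetical)
my %Status;      # pid => raw $? of the reaped child (hypothetical)

# Reworked reaper: a sketch, not the original poster's _REAPER.
sub reaper {
    # Preserve the interrupted code's $! (e.g. the EINTR that LWP's select
    # loop is about to test) and $? across our waitpid calls.
    local ($!, $?);

    # waitpid(-1, WNOHANG) returns a reapable child's PID, 0 if none is
    # ready yet, and -1 once no children are left.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $Children{$pid} = 0;
        $Status{$pid}   = $?;
    }
}
$SIG{CHLD} = \&reaper;

# Usage: fork three short-lived children with different exit codes.
for my $code (0, 0, 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit $code if $pid == 0;    # child: exit immediately
    $Children{$pid} //= 1;      # handler may already have marked it reaped
}

# Parent: wait (up to ~10 seconds) until the handler has reaped every child.
my $deadline = time + 10;
while ((grep { $_ } values %Children) && time < $deadline) {
    sleep 1;
}

my @failed = grep { $Status{$_} != 0 } keys %Status;
print scalar(keys %Status), " reaped, ", scalar(@failed), " failed\n";
```

One trade-off: `waitpid(-1, ...)` reaps any child of the process, not just those recorded in `%Children`, which matters if the program starts other child processes.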
Re^2: No child processes - system limit? by clinton (Priest) on Apr 01, 2010 at 17:45 UTC
At the suggestion of moritz, I ran the script with `strace`; the relevant bits are as follows:

[4 kB excerpt not shown]

Here is where the parent makes the request:

[6 kB excerpt not shown]

At this stage, my code catches the `select failed: no child processes` error in an `eval`, issues a warning, then sleeps before retrying:

[669-byte excerpt not shown]

I'm not sure what most of this means, but is the value of `$!` being set to "no child processes" by one of my `waitpid` calls, which then interferes with the code in LWP::Protocol::http? Would it help if I `local`ised `$!` in my reaper sub?
Re^3: No child processes - system limit? by ikegami (Patriarch) on Apr 01, 2010 at 18:09 UTC
"Would it help if I localised `$!` in my reaper sub?"

I believe so. That's exactly where I was going with my question.
Re^3: No child processes - system limit? by almut (Canon) on Apr 01, 2010 at 19:15 UTC
    select(8, [3], NULL, NULL, {172, 0}) = ? ERESTARTNOHAND (To be restarted)
    --- SIGCHLD (Child exited) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
    waitpid(14232, 0xbfb45be8, WNOHANG) = 0
    waitpid(14233, 0xbfb45be8, WNOHANG) = 0
    waitpid(14225, 0xbfb45be8, WNOHANG) = -1 ECHILD (No child processes)
    ...

My interpretation of this (as you already figured) is that `$!` is being modified in the signal handler before the interrupted `select` call gets a chance to be restarted, i.e. the `redo SELECT` doesn't execute because of that very modification of `$!`.

(Note that because of Perl's deferred (aka safe) signal handling, the `sigreturn()` (which is called at the end of the "real" system/C-level signal handler) happens immediately, before the Perl signal handler runs all the `waitpid` calls. Still, they do run before the next Perl opcode executes, which means this is presumably before `if ($!{EINTR} || $!{EAGAIN})`.)

What I find a little surprising is that the ECHILD occurs at all, because your `$Children{$pid}` should've been set to zero in the previous call to the signal handler

    waitpid(14225, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 14225

where the `waitpid` did return 14225 (i.e. `$res > 0`). In other words, you shouldn't be calling `waitpid(14225,...)` again thereafter, because 14225 is no longer supposed to be in the hash...

(Update: err wait, this is nonsense of course, as you're iterating over the keys, not the values. OTOH, this raises the question of what would happen if you set the values to the PIDs too, and then iterated over the values instead (as you seem to be getting that panic when deleting the keys...).)

Maybe you could try to figure out why this is, in addition to trying to localize `$!` as a workaround, of course.
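The sequence described above can be simulated without real signals. In this hypothetical sketch, `$! = EINTR` stands in for the interrupted `select`, the handler call stands in for Perl's deferred signal dispatch, and `$stale_pid` stands in for an already-reaped PID still present in `%Children`; the return value mirrors LWP's `$!{EINTR} || $!{EAGAIN}` test.

```perl
use strict;
use warnings;
use Errno qw(EINTR);
use POSIX qw(WNOHANG);

# Hypothetical stand-in for a PID that was already reaped but whose key is
# still in %Children; it is not our child, so waitpid fails with ECHILD.
my $stale_pid = 999_999;

sub careless_reaper { waitpid($stale_pid, WNOHANG); }
sub careful_reaper  { local $!; waitpid($stale_pid, WNOHANG); }

# Simulate the sequence: select() fails with EINTR, Perl runs the deferred
# %SIG handler, then LWP tests $! to decide whether to redo SELECT.
sub lwp_would_retry {
    my ($handler) = @_;
    $! = EINTR;          # what the interrupted select() left in errno
    $handler->();        # the deferred Perl-level signal handler runs here
    return ($!{EINTR} || $!{EAGAIN}) ? 1 : 0;    # LWP's check
}

my $careless = lwp_would_retry(\&careless_reaper);
my $careful  = lwp_would_retry(\&careful_reaper);
print "careless reaper: retry=$careless; careful reaper: retry=$careful\n";
```

With the careless handler, the check sees ECHILD and LWP would `die`; with `local $!`, the EINTR survives and `redo SELECT` would run.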
Re^4: No child processes - system limit? by clinton (Priest) on Apr 01, 2010 at 19:44 UTC
