|Keep It Simple, Stupid|
The root problem, which is extremely difficult to deal with, is basically a race-condition: the parent might “determine” the status of a child, but, before it can react to the status that it has thusly determined, the status of the child has changed. Strictly speaking, you don’t even know that your list-of-children is instantaneously correct.
This is not correct. There is no race condition in properly implemented code. The OS does some things "atomically" (not "automatically", which is different: "atomic" means as a single, indivisible operation), and these are things you cannot do for yourself. SIGCHLD, like the other standard signals, is level-sensitive rather than edge-triggered. The kernel only records that it is pending, so when several children exit before the first SIGCHLD has been delivered, you get just one SIGCHLD signal.
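Here is a minimal C sketch of that coalescing, assuming a single-threaded process on Linux or a similar Unix. The names `count_calls` and `coalesce_demo` are made up for illustration. It blocks SIGCHLD, lets several children exit, and only then unblocks it. POSIX leaves it implementation-defined whether a standard signal generated again while already pending is delivered more than once. Linux does not queue it, so the handler should run only once.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_KIDS 32

static volatile sig_atomic_t sigchld_count = 0;

/* Only counts deliveries; the zombies are reaped afterwards by the caller. */
static void count_calls(int sig)
{
    (void)sig;
    sigchld_count++;
}

/* Hypothetical demo: returns how many times the SIGCHLD handler ran after
   n children exited while SIGCHLD was blocked, or -1 on setup failure. */
int coalesce_demo(int n)
{
    struct sigaction sa;
    sigset_t block, old;
    pid_t kids[MAX_KIDS];
    int spawned = 0;

    if (n > MAX_KIDS)
        n = MAX_KIDS;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_calls;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);   /* hold SIGCHLD as pending */
    sigchld_count = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                       /* child: exit immediately */
        if (pid > 0)
            kids[spawned++] = pid;
    }

    /* Wait until every child has exited but leave it unreaped (WNOWAIT),
       so all of the exits happen while SIGCHLD is still blocked. */
    for (int i = 0; i < spawned; i++) {
        siginfo_t info;
        waitid(P_PID, (id_t)kids[i], &info, WEXITED | WNOWAIT);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending SIGCHLD delivered here */
    int calls = sigchld_count;

    for (int i = 0; i < spawned; i++)       /* now reap the zombies */
        waitpid(kids[i], NULL, 0);
    return calls;
}
```

Calling `coalesce_demo(5)` on Linux should return 1: five exits, one delivery. A handler that called waitpid() only once per signal would therefore leave four zombies behind.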
When SIGCHLD is "delivered" (the handler starts running), the OS atomically blocks that signal, unless you installed the handler with the SA_NODEFER flag. This is different from calling sigprocmask() yourself at the top of the handler, which would leave a short window between delivery and the block. So while you are messing around in your handler, an additional SIGCHLD can arrive and sit in a "pending" but "undelivered" state instead of interrupting you.
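To see the automatic blocking for yourself, here is a small C sketch (the function names are hypothetical, and a single-threaded process is assumed). It installs a handler with sigaction() and no SA_NODEFER, raises SIGCHLD at the process itself, and has the handler read the signal mask that is in effect while it runs. sigprocmask() and sigismember() are both on the POSIX list of async-signal-safe functions, so the handler may call them.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* 1 if SIGCHLD was blocked while the handler ran, 0 if not, -1 if unknown. */
static volatile sig_atomic_t blocked_in_handler = -1;

static void probe(int sig)
{
    sigset_t now;
    /* A NULL "set" argument only reads the current mask; it changes nothing. */
    if (sigprocmask(SIG_BLOCK, NULL, &now) == 0)
        blocked_in_handler = sigismember(&now, sig);
}

/* Hypothetical helper: returns 1 if the kernel blocked SIGCHLD for the
   duration of the handler, 0 if it did not, -1 on error. */
int sigchld_blocked_during_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe;
    sigemptyset(&sa.sa_mask);   /* we add nothing to the mask ourselves */
    sa.sa_flags = 0;            /* in particular, no SA_NODEFER */
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;
    raise(SIGCHLD);             /* the handler runs before raise() returns */
    return blocked_in_handler;
}

/* Returns 1 if SIGCHLD is blocked right now, outside any handler. */
int sigchld_blocked_now(void)
{
    sigset_t now;
    sigprocmask(SIG_BLOCK, NULL, &now);
    return sigismember(&now, SIGCHLD);
}
```

`sigchld_blocked_during_handler()` should report 1, while `sigchld_blocked_now()` called afterwards reports 0, because the previous mask is restored when the handler returns.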
The classic SIGCHLD handler reaps all of the children it can by calling waitpid() in a loop (and there may very well be several to process). Suppose 5 more children exit while you are messing around in the handler. The OS notes this, and it becomes yet another SIGCHLD (a single, level-triggered signal) in the "pending but undelivered" state.
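A minimal C sketch of that classic handler follows, assuming a single-threaded POSIX process; `on_sigchld` and `spawn_and_reap` are illustrative names. The handler loops on waitpid() with WNOHANG and saves errno around it. The demo blocks SIGCHLD while forking and then waits with sigsuspend(), which atomically unblocks the signal and sleeps, so a wakeup cannot slip in between the check and the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;   /* children reaped by the handler */

static void on_sigchld(int sig)
{
    (void)sig;
    int saved_errno = errno;               /* waitpid() may overwrite errno */
    /* One delivery may stand for several exits, so reap until waitpid()
       reports nothing more: 0 = children still running, -1 = none left. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Hypothetical demo: fork n children that exit at once and return how many
   the handler reaped, or -1 on setup failure. */
int spawn_and_reap(int n)
{
    struct sigaction sa;
    sigset_t block, old;
    int spawned = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;   /* no SIGCHLD for stopped children */
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);      /* no delivery until we wait */
    reaped = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                          /* child: exit immediately */
        if (pid > 0)
            spawned++;
    }
    while (reaped < spawned)
        sigsuspend(&old);   /* atomically unblock SIGCHLD and sleep */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

However the exits are grouped into deliveries, `spawn_and_reap(5)` should return 5, because each handler run keeps reaping until waitpid() has nothing left to report.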
When you return from the handler, the OS restores the previous signal mask, SIGCHLD is unblocked, and the pending SIGCHLD is delivered immediately, so the handler runs again. This is what ensures that you will not "miss one", and it is the important part that eliminates the "race condition". The OS has to do this, and it does.
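The redelivery can also be observed directly. In this hypothetical C sketch, which assumes a single-threaded process, the handler calls raise() on its first run to stand in for a child exiting in the middle of the handler. Because SIGCHLD is blocked at that moment, the new signal only becomes pending, and it is delivered as soon as the first invocation returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t calls = 0;

static void handler(int sig)
{
    calls++;
    if (calls == 1)
        raise(sig);     /* SIGCHLD is blocked here, so this only makes it pending */
}

/* Hypothetical demo: returns how many times the handler ran, or -1 on error. */
int redelivery_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;
    calls = 0;
    raise(SIGCHLD);     /* first delivery; the pending one follows on return */
    return calls;
}
```

`redelivery_demo()` should return 2. The signal that arrived "during" the handler was neither lost nor allowed to interrupt it.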
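It is also possible to get a SIGCHLD when there is "nothing to do" because the work has already been done. If a child exits while the handler's waitpid() loop is still running, the loop may reap it in the same pass, yet the SIGCHLD generated by that exit stays pending. The next invocation then finds nothing to reap, and the handler must treat that as normal.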
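A handler therefore has to accept both "nothing to do" answers from a non-blocking waitpid(). This is a small C sketch, assuming a POSIX system and made-up names (`try_reap`, `nothing_to_do_demo`). It shows waitpid() returning 0 while a child still exists but has not exited, and -1 with errno set to ECHILD once the only child has already been reaped.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What one non-blocking reap attempt can report (names are illustrative). */
enum reap_result { REAPED, STILL_RUNNING, NO_CHILDREN, REAP_FAILED };

static enum reap_result try_reap(void)
{
    pid_t pid = waitpid(-1, NULL, WNOHANG);
    if (pid > 0)
        return REAPED;
    if (pid == 0)
        return STILL_RUNNING;   /* children exist, but none has exited */
    return errno == ECHILD ? NO_CHILDREN : REAP_FAILED;
}

/* Hypothetical demo: returns 0 if both "nothing to do" outcomes were
   observed as expected, -1 otherwise. */
int nothing_to_do_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        pause();                                /* child: wait to be killed */
        _exit(0);
    }
    enum reap_result running = try_reap();      /* child is still alive */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);                    /* already reaped, as by an earlier pass */
    enum reap_result gone = try_reap();         /* no children left at all */
    return (running == STILL_RUNNING && gone == NO_CHILDREN) ? 0 : -1;
}
```

Neither result is an error. Both simply end the reaping loop for this invocation.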
So the "race condition" is handled by the OS: there is no possibility of "missing a SIGCHLD event" as long as you reap all available children every time the SIGCHLD handler runs.
Use the waitpid() function to reap children, and let the OS decide which ones are ready to be reaped. There is no need for the parent to maintain its own list of children, if that is what you meant.
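As a sketch of reaping without any bookkeeping, assuming a POSIX system with no SIGCHLD handler installed and SIGCHLD not set to SIG_IGN, the hypothetical function below forks n children and never records their PIDs. It relies on waitpid(-1, ...) to hand back each child as it finishes, and on -1 with ECHILD to say that none are left.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: child i exits with status i; the parent reaps them all
   without keeping a list and returns the sum of the exit statuses, or -1 if
   waitpid() fails for a reason other than "no children left". */
int reap_without_a_list(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(i & 0xff);   /* only the low 8 bits of a status survive */
    }

    int sum = 0, status;
    pid_t pid;
    /* The kernel knows which children exist and returns each one as it
       finishes, in whatever order they finish. */
    while ((pid = waitpid(-1, &status, 0)) > 0)
        if (WIFEXITED(status))
            sum += WEXITSTATUS(status);
    /* -1 with ECHILD just means there is nobody left to reap. */
    return errno == ECHILD ? sum : -1;
}
```

With n = 4 the children exit with 0, 1, 2 and 3, so `reap_without_a_list(4)` returns 6 regardless of the order in which they finish.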