http://qs321.pair.com?node_id=651926


in reply to Re: parallelisation (how to wait on N threads)
in thread Parallelization of heterogenous (runs itself Fortran executables) code

Could you please elaborate on the implications of mixing fork() with Perl threads? ... I'd like to know what the pitfalls are.
The implications stem from how Perl threads are implemented.

threads.pm clones all non-shared data from the 'parent' thread into the 'child' thread. If the spawning thread has a lot of data (or code) loaded, all of it is copied into every thread it spawns.
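A minimal sketch of that cloning behaviour, assuming a perl built with ithreads support. The variable names are made up for illustration: the spawned thread gets its own copy of @private, while @shared (marked with threads::shared) is the one copy both threads see.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @private = (1 .. 3);            # cloned into each new thread
my @shared :shared = (1 .. 3);     # one copy, visible to all threads

threads->create(sub {
    push @private, 4;              # changes only this thread's clone
    push @shared,  4;              # changes the shared copy
})->join;

print scalar(@private), "\n";      # 3: the parent's copy is untouched
print scalar(@shared),  "\n";      # 4: the child's push is visible here
```

The bigger @private is at the time of threads->create, the more memory each new thread costs.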

Users of Unix fork() have come to expect Copy-on-Write semantics, where data is copied only when it is written to. That expectation is not necessarily warranted even for perl's fork() (perl routinely modifies bit flags on its variables, which dirties the memory pages they live on), and it certainly doesn't hold for threads... if you don't understand the behaviour, you'll end up using a lot more memory than you intended.

What I didn't know (I'm sure BrowserUK does) is whether all the running threads of the forking process are then present in the newly spawned child process; if they were, it could get messy ;) Tested: only the active thread (the one that calls fork()) is carried over into the child.

If it's best to avoid mixing fork() (from the system() calls) and Perl threads, what would you suggest for Jochen's situation?
I'd recommend he use Parallel::ForkManager to manage worker processes... that's exactly what it was written for.
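Something along these lines, as a rough sketch rather than a drop-in solution. The job names and the limit of 4 workers are made up, and the system() call just runs perl itself as a stand-in for the real Fortran executable:

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @jobs = map { "job$_" } 1 .. 8;          # hypothetical work items
my $pm   = Parallel::ForkManager->new(4);   # at most 4 children at a time

my %exit_code;
# Runs in the parent each time a child is reaped.
$pm->run_on_finish(sub {
    my ($pid, $exit, $ident) = @_;
    $exit_code{$ident} = $exit;
});

for my $job (@jobs) {
    $pm->start($job) and next;              # parent: go on to the next job

    # Child: run the external program (placeholder command here).
    my $rc = system($^X, '-e', 'exit 0');
    $pm->finish($rc == 0 ? 0 : 1);          # child exits with this status
}

$pm->wait_all_children;                     # block until every child is done
printf "%d jobs finished\n", scalar keys %exit_code;
```

No threads are involved, so nothing gets cloned; each child is a plain fork() of the parent.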

Also, remember that fork() is itself emulated using threads on Win32, so that wouldn't necessarily be the best way to go on that platform.

-David