Re: What's the best way to fetch data from multiple sources asynchronously?

My first thought was forking child processes, but (correct me if I'm wrong, as I'm new to this) the child processes can't alter data in the parent process. Then I looked around the web. Other solutions came to mind: shared memory, opening a pipe from the child to the parent so the child can serialize the result and the parent can re-hydrate it, using Storable and having the child write to a file and the parent read from it, etc etc. These all strike me as horrific kludges that may not be safe or portable.

Working with separate processes is certainly more difficult than working with threads, but (real) processes provide some advantages that threads can't provide.

A multiprocess application is much more fault tolerant, since every process runs in its own separate address space, so that if a single child process dies, the whole application is not affected and the child process can simply be restarted. Instead all the threads in a multithreaded application share the same address space, so that a fatal error in a single thread can bring down the whole application.

Another advantage of (real) processes is that they transparently migrate over an SSI cluster (as long as they don't use shared memory to communicate each other), while threads don't (at least with the most common SSI cluster implementations available today).

Also the fork() emulation provided by Perl on Windows works quite well (except in some cases, which are btw avoidable).

There are some CPAN modules that do things like this, but I'm picky: I don't want people to have to install a bunch of arcane modules to run this program (one or two is okay, but not an entire bundle). And it needs to be fast, stable and portable to at least Linux, FreeBSD and Windows. Stop me if I'm asking too much.

You are not asking too much: on the contrary, you are worrying too much (and you are probably approaching the problem the wrong way).

You are providing a complete application, not a module/library, right? So, for the Windows users, what does restrain you from providing them with a complete bundle (including all the necessary modules and the perl interpreter itself) packaged in a Windows installer, so that they don't have to worry about anything?
On the other hand, requiring the average Windows user to first manually install perl, then version X of module A from repository A1 applying patch A2, then version Y of module B from repository B1 applying patch B2 etc., will make him run away from your application.

In the case that the Windows user has already got perl installed he won't probably worry too much to have few megabytes duplicated on his hard-disk, and in the much more common case that he doesn't have perl and/or the necessary modules already installed, he will be more than happy to have a familiar-looking installer which transparently provides everything which is needed to run the application.

If you want to see a working example of a (great) Windows application written in Perl, which packages into a Windows installer all the necessary modules, perl itself and even a bundled web server, have a look at POPFile.
Update: my friend and monk lucas does the same with his popular free groupware application IGSuite.

On *nix, where by the way you'll probably have much less problems, a similar path should be followed anyway: your application should be properly packaged for any single distribution (using a a deb package for debian-based distros, using an rpm package for RH-based distros etc.), so that the dependencies will be handled by the specific package managers (again, mostly transparently for the end-user). If your application becomes popular, this will be handled by various maintainers/packagers, so that you don't even have to worry about that.

You can provide the instructions to manually install everything even on Windows, nothing prevents you from doing that, but provide the installer as well: nothing prevents you from doing that either.

Ciao,
Emanuele.

Comment on Re: What's the best way to fetch data from multiple sources asynchronously?


Perl: the Markov chain saw
	PerlMonks