PerlMonks  

Re^3: What's the best way to fetch data from multiple sources asynchronously?

by BrowserUk (Patriarch)
on Jan 06, 2007 at 06:05 UTC [id://593267]


in reply to Re^2: What's the best way to fetch data from multiple sources asynchronously?
in thread What's the best way to fetch data from multiple sources asynchronously?

You would need not just one database process, but one process per database server, and round-robin your select loop over all of them.
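The layout described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the poster's code: the server names are made up, and the actual DBI query each worker would run is stubbed out as a single `print`, so the example runs without any database. The parent forks one worker per "server" and multiplexes the reply sockets with `IO::Select`.

```perl
#!/usr/bin/perl
# One forked worker per (stand-in) database server; the parent
# round-robins over whichever reply sockets are readable.
use strict;
use warnings;
use IO::Select;
use Socket;

my @workers;
for my $server (qw( dbA dbB dbC )) {      # hypothetical server names
    socketpair( my $parent, my $child, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
        or die "socketpair: $!";
    defined( my $pid = fork ) or die "fork: $!";
    if ( $pid == 0 ) {                    # child: real code would use DBI here
        close $parent;
        print {$child} "result from $server\n";
        close $child;
        exit 0;
    }
    close $child;
    push @workers, $parent;
}

# Parent: service every readable worker socket until all have closed.
my $sel = IO::Select->new(@workers);
my @replies;
while ( $sel->count ) {
    for my $fh ( $sel->can_read ) {
        my $line = <$fh>;
        if ( defined $line ) { push @replies, $line }
        else                 { $sel->remove($fh); close $fh }
    }
}
waitpid -1, 0 for 1 .. @workers;          # reap the children
print @replies;
```

Note that this only works smoothly because each worker sends one short line; the serialisation problem described next is what makes the real version expensive.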

The real problem comes if the queries return largish volumes of data: then you have to squeeze it all through those 8-bit pipes. That is normal when you communicate with a DB server via a socket, but in this scenario the DB processes are Perl scripts using DBI, which means the data received by those processes has already been de-streamed and structured (fetchall_arrayref, fetchall_hashref, etc.), a relatively expensive process. Now you need to flatten (serialise) that structure to pass it through the socket back to the parent process, where it has to be restructured yet again, with all the duplicated parsing and memory allocation that involves.

So yes, you could definitely do this using a select loop, Storable freeze & thaw, and one socket & DBI process per DB server, but it ain't gonna be quick, or memory efficient. If the required poll rate times the number of DB servers works out to more than one poll every ~3 to 5 seconds, you ain't gonna keep up.
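The freeze & thaw step looks something like this. A minimal sketch, not the poster's code: the socket is stood in for by an in-memory scalar so the example is self-contained, and the rows are made up. Storable is a core module; the 4-byte length prefix is one common way to frame messages on a stream socket, assumed here for illustration.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# What a worker's fetchall_arrayref might have produced (made-up rows):
my $rows = [ [ 1, 'alpha' ], [ 2, 'beta' ] ];

# Worker side: flatten the already-parsed structure and length-prefix
# it so the parent can frame the message on the stream.
my $frozen = freeze($rows);
my $wire   = pack( 'N', length $frozen ) . $frozen;

# Parent side: read the 4-byte length, then that many bytes, then
# rebuild the structure -- a second full parse and allocation.
my $len  = unpack 'N', substr( $wire, 0, 4 );
my $copy = thaw( substr( $wire, 4, $len ) );

print "round-tripped $len bytes\n";
```

Every row thus exists once as DBI's structure, once as the frozen byte string, and once again as the thawed copy, which is where the memory figures below come from.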

And if the data volumes are anything more than trivial, you're gonna need a machine with a substantial amount of memory. Each byte of data queried (plus all the Perl data-structure overhead) will concurrently exist in at least 4, probably 5, places at some point in time: the inbound, Perl-structured version in the DBI process; the outbound, Storable-serialised version in the DBI process; the inbound Storable-serialised version in the select-loop process; the Perl re-structured version in the select-loop process; and whatever form the final application requirements need it to be in. Actually, there would probably be a sixth (partial?) copy in the DBI library's buffers as well. And remember, you cannot take advantage of COW for any of this.

With threads, you'd have at most 3 copies; no (additional) communications latency; no double deserialisation, reserialisation or restructuring. On top of that, there would be no need to break the application's processing up into a bunch of itty-bitty chunks to ensure that your select loop wasn't starved.

And threads would be more easily scaled. If you later need to monitor another 10 DB servers, you simply spawn another 10 threads (the processing is identical). With the multi-process-and-pipes method, you'd probably have to go back and repartition the application's processing code, because you'd need to service the select loop with greater frequency.
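The threaded layout might be sketched as below. This is a hypothetical illustration, not the poster's code: it needs a threads-enabled perl, the server names are invented, and the DBI query each thread would run is stubbed out. One thread per DB server, results handed back through a Thread::Queue, so there is nothing to freeze/thaw and no select loop to keep fed.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $results = Thread::Queue->new;
my @servers = qw( dbA dbB dbC );          # made-up server names

my @threads = map {
    my $server = $_;
    threads->create( sub {
        # Real code would create this thread's own $dbh and fetch
        # here; each thread blocks on its own server independently.
        $results->enqueue("$server: 2 rows");
    } );
} @servers;

$_->join for @threads;
$results->enqueue(undef);                 # end-of-stream marker

my @got;
while ( defined( my $r = $results->dequeue ) ) {
    push @got, $r;
    print "$r\n";
}
```

Scaling to 10 more servers is then just 10 more names in `@servers`; nothing else changes.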


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.