Re: What's the best way to fetch data from multiple sources asynchronously?

by rodion (Chaplain)
on Jan 02, 2007 at 04:47 UTC [id://592486]


in reply to What's the best way to fetch data from multiple sources asynchronously?

At work we've had good results using select(). Code built on it has been running for at least 5 years; we've been able to extend it when we wanted to, and we've had no significant problems. It runs on 32- and 64-bit Linux systems, and even on our older BSDI boxes, where the OS thread support is broken.
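
A minimal sketch of the kind of select() loop being described, using IO::Select over a few placeholder TCP sources (the hosts and ports are made up for illustration; this isn't the production code):

  use strict;
  use warnings;
  use IO::Select;
  use IO::Socket::INET;

  # One connection per data source; the addresses are placeholders.
  my @sources = map {
      IO::Socket::INET->new( PeerAddr => $_, Timeout => 10 )
          or die "connect to $_ failed: $!";
  } qw( host-a:7001 host-b:7002 host-c:7003 );

  my $sel = IO::Select->new(@sources);
  my %buffer;                     # partial reads, keyed by handle

  while ( $sel->count ) {
      # Block until at least one source has data ready to read.
      for my $fh ( $sel->can_read ) {
          my $n = sysread( $fh, my $chunk, 8192 );
          die "read error: $!" unless defined $n;
          if ( $n == 0 ) {        # source closed its end
              $sel->remove($fh);
              close $fh;
          }
          else {
              $buffer{$fh} .= $chunk;
          }
      }
  }
  # %buffer now holds everything each source sent.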

Re^2: What's the best way to fetch data from multiple sources asynchronously?
by Errto (Vicar) on Jan 02, 2007 at 20:02 UTC
    The problem is that a DBI database/statement handle is not a pipe or socket, so you can't simply call select() on it. DBI does not specify a way to execute statements asynchronously, though individual drivers may support it. It looks like there is a way to do it with POE that uses forked processes behind the scenes.
Re^2: What's the best way to fetch data from multiple sources asynchronously?
by rodion (Chaplain) on Jan 06, 2007 at 03:14 UTC
    Quite right. For databases you have to spin off a separate process to do the DBI work, and have it communicate with the selecting process through a socket. That way everything you are trying to coordinate becomes a file or socket.

    I definitely should have made this more explicit. I over-read the OP's statement that "My first thought was forking child processes." Thanks for catching it.
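
    A rough sketch of that arrangement, with one forked DBI worker per database talking back over a socketpair; the DSN, credentials and query are placeholders, and a real version would need proper framing and error handling:

      use strict;
      use warnings;
      use Socket;
      use IO::Handle;
      use DBI;

      # Spawn one worker per database; the parent gets back a socket it
      # can put into its select() set alongside everything else.
      sub spawn_db_worker {
          my ( $dsn, $user, $pass, $sql ) = @_;

          socketpair( my $parent_end, my $child_end,
                      AF_UNIX, SOCK_STREAM, PF_UNSPEC ) or die "socketpair: $!";
          $_->autoflush(1) for $parent_end, $child_end;

          my $pid = fork();
          die "fork: $!" unless defined $pid;

          if ( $pid == 0 ) {                  # child: do the blocking DBI work
              close $parent_end;
              my $dbh  = DBI->connect( $dsn, $user, $pass, { RaiseError => 1 } );
              my $rows = $dbh->selectall_arrayref($sql);
              # Crude flattening just to keep the sketch short:
              # tab-separated fields, newline-separated rows.
              print {$child_end} join( "\t", @$_ ), "\n" for @$rows;
              close $child_end;
              exit 0;
          }

          close $child_end;                   # parent keeps only its end
          return $parent_end;                 # a handle select() understands
      }

      # e.g. my $fh = spawn_db_worker( 'dbi:mysql:app;host=db1',
      #          'user', 'pass', 'SELECT id, name FROM widgets' );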

      You would need not just one database process, but one process per database server, and round-robin your select loop over all of them.

      The real problem comes if the queries return largish volumes of data, because then you have to squeeze it all through those 8-bit pipes. Of course this is normal when you communicate with a DB server via a socket. But in this scenario the DB processes are Perl scripts using DBI, which means the data received by those processes has already been de-streamed and structured (fetchall_arrayref/fetchall_hashref, etc.), which is a relatively expensive process. Now you need to flatten (serialise) that structure to pass it through the socket back to the parent process, where it then has to be restructured all over again, with all the duplication of parsing and memory allocation that involves.

      So yes, you could definitely do this using a select loop, Storable freeze & thaw, and one socket & DBI process per DB server, but it ain't gonna be quick or memory efficient. If the required poll rate times the number of DB servers works out to more than one poll every 3 to 5 seconds or so, you ain't gonna keep up.
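
      To make the copy-counting concrete, here is a tiny self-contained sketch of the freeze/thaw round-trip (the "result set" is made up rather than coming from DBI):

        use strict;
        use warnings;
        use Storable qw(freeze thaw);

        # Stand-in for what fetchall_arrayref would have built in the DBI process.
        my $rows = [ map { [ $_, "name_$_", rand ] } 1 .. 10_000 ];   # copy 1

        my $frozen = freeze($rows);     # copy 2: flat bytes in the DBI process
                                        # (these bytes then travel down the socket,
                                        # length-prefixed or otherwise framed)
        my $received = $frozen;         # copy 3: the bytes as read in the select loop
        my $rebuilt  = thaw($received); # copy 4: re-inflated Perl structure

        printf "%d rows serialised into %d bytes\n", scalar @$rows, length $frozen;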

      And if the data volumes are anything more than trivial, you're gonna need a machine with a substantial amount of memory. Each byte of data queried (plus all the Perl data structure overhead) will concurrently exist in at least 4, probably 5, places at some point in time: the inbound, Perl-structured version in the DBI process; the outbound, Storable-serialised version in the DBI process; the inbound Storable-serialised version in the select-loop process; the Perl re-structured version in the select-loop process; and whatever form the final application requirements need it to be in. Actually there would probably be a sixth (partial?) copy in the DBI library buffers as well. And remember, you cannot take advantage of COW for any of this.

      With threads, you'd have at most 3 copies; no (additional) communications latency; and no double deserialisation, reserialisation or restructuring. On top of that, there would be no need to break the application's processing up into a bunch of itty-bitty chunks to ensure that your select loop wasn't starved.

      And threads scale more easily. If you later need to monitor another 10 DB servers, you simply spawn another 10 threads (the processing would be identical). With the multi-process-and-pipes method, you'd probably have to go back and repartition the application processing code, because you'd need to service the select loop more frequently.
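
      For comparison, a sketch of the threaded shape being described: one worker thread per DB server, each pushing its already-structured results onto a shared queue (the DSNs, credentials and query are placeholders):

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use DBI;

        my @dsns    = map { "dbi:mysql:database=app;host=db$_" } 1 .. 3;  # placeholders
        my $results = Thread::Queue->new;

        # One thread per DB server; each makes its own connection and hands
        # the structured result set straight back via the queue.
        my @workers = map {
            my $dsn = $_;       # copy out of $_ before the closure captures it
            threads->create( sub {
                my $dbh  = DBI->connect( $dsn, 'user', 'pass', { RaiseError => 1 } );
                my $rows = $dbh->selectall_arrayref('SELECT id, name FROM widgets');
                $results->enqueue( [ $dsn, $rows ] );
            } );
        } @dsns;

        # The main thread consumes results as each server finishes,
        # in whatever order they arrive.
        for ( 1 .. @workers ) {
            my ( $dsn, $rows ) = @{ $results->dequeue };
            printf "%s returned %d rows\n", $dsn, scalar @$rows;
        }
        $_->join for @workers;

      Scaling to another ten servers is then just a matter of adding ten more entries to @dsns.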


