Re: What's the best way to fetch data from multiple sources asynchronously?

by rodion (Chaplain)
on Jan 02, 2007 at 04:47 UTC [id://592486]


in reply to What's the best way to fetch data from multiple sources asynchronously?

At work we've had good results using select(). Code built on it has been running for at least 5 years; we've been able to extend it when we wanted to, and we've had no significant problems. It runs on 32- and 64-bit Linux systems, and even on our older BSDI boxes, where the OS thread support is broken.
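
A minimal sketch of the kind of select() loop being described, using IO::Select over a few placeholder TCP sources (the hosts and ports are made up for illustration; this isn't the production code):

  use strict;
  use warnings;
  use IO::Select;
  use IO::Socket::INET;

  # One connection per data source; the addresses are placeholders.
  my @sources = map {
      IO::Socket::INET->new( PeerAddr => $_, Timeout => 10 )
          or die "connect to $_ failed: $!";
  } qw( host-a:7001 host-b:7002 host-c:7003 );

  my $sel = IO::Select->new(@sources);
  my %buffer;                     # partial reads, keyed by handle

  while ( $sel->count ) {
      # Block until at least one source has data ready to read.
      for my $fh ( $sel->can_read ) {
          my $n = sysread( $fh, my $chunk, 8192 );
          die "read error: $!" unless defined $n;
          if ( $n == 0 ) {        # source closed its end
              $sel->remove($fh);
              close $fh;
          }
          else {
              $buffer{$fh} .= $chunk;
          }
      }
  }
  # %buffer now holds everything each source sent.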

Re^2: What's the best way to fetch data from multiple sources asynchronously?
by Errto (Vicar) on Jan 02, 2007 at 20:02 UTC
    The problem is that a DBI database/statement handle is not a pipe or socket, so you can't simply call select() on it. DBI does not specify a way to execute statements asynchronously, though individual drivers may support it. It looks like there is a way to do it with POE that uses forked processes behind the scenes.
Re^2: What's the best way to fetch data from multiple sources asynchronously?
by rodion (Chaplain) on Jan 06, 2007 at 03:14 UTC
    Quite right. For databases you have to spin off a separate process to do the DBI work, and have it communicate with the selecting process through a socket. That way everything you are trying to coordinate becomes a file or socket.

    I definitely should have made this more explicit. I over-read the OP's statement that "My first thought was forking child processes." Thanks for catching it.
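
    A rough sketch of that arrangement, with one forked DBI worker per database talking back over a socketpair; the DSN, credentials and query are placeholders, and a real version would need proper framing and error handling:

      use strict;
      use warnings;
      use Socket;
      use IO::Handle;
      use DBI;

      # Spawn one worker per database; the parent gets back a socket it
      # can put into its select() set alongside everything else.
      sub spawn_db_worker {
          my ( $dsn, $user, $pass, $sql ) = @_;

          socketpair( my $parent_end, my $child_end,
                      AF_UNIX, SOCK_STREAM, PF_UNSPEC ) or die "socketpair: $!";
          $_->autoflush(1) for $parent_end, $child_end;

          my $pid = fork();
          die "fork: $!" unless defined $pid;

          if ( $pid == 0 ) {                  # child: do the blocking DBI work
              close $parent_end;
              my $dbh  = DBI->connect( $dsn, $user, $pass, { RaiseError => 1 } );
              my $rows = $dbh->selectall_arrayref($sql);
              # Crude flattening just to keep the sketch short:
              # tab-separated fields, newline-separated rows.
              print {$child_end} join( "\t", @$_ ), "\n" for @$rows;
              close $child_end;
              exit 0;
          }

          close $child_end;                   # parent keeps only its end
          return $parent_end;                 # a handle select() understands
      }

      # e.g. my $fh = spawn_db_worker( 'dbi:mysql:app;host=db1',
      #          'user', 'pass', 'SELECT id, name FROM widgets' );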

      You would need not just one database process, but one process per database server, and round-robin your select loop over all of them.

      The real problem comes if the queries return largish volumes of data, because then you have to squeeze it all through those 8-bit pipes. Of course this is normal when you communicate with a DB server via a socket. But in this scenario the DB processes are Perl scripts using DBI, which means the data received by those processes has already been de-streamed and structured (fetchall_arrayref/fetchall_hashref, etc.), which is a relatively expensive process. Now you need to flatten (serialise) that structure to pass it through the socket back to the parent process, where it then has to be restructured all over again, with all the duplication of parsing and memory allocation that involves.

      So yes, you could definitely do this using a select loop, Storable freeze & thaw, and one socket & DBI process per DB server, but it ain't gonna be quick or memory efficient. If the required poll rate times the number of DB servers works out to more than one poll every 3 to 5 seconds or so, you ain't gonna keep up.
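
      To make the copy-counting concrete, here is a tiny self-contained sketch of the freeze/thaw round-trip (the "result set" is made up rather than coming from DBI):

        use strict;
        use warnings;
        use Storable qw(freeze thaw);

        # Stand-in for what fetchall_arrayref would have built in the DBI process.
        my $rows = [ map { [ $_, "name_$_", rand ] } 1 .. 10_000 ];   # copy 1

        my $frozen = freeze($rows);     # copy 2: flat bytes in the DBI process
                                        # (these bytes then travel down the socket,
                                        # length-prefixed or otherwise framed)
        my $received = $frozen;         # copy 3: the bytes as read in the select loop
        my $rebuilt  = thaw($received); # copy 4: re-inflated Perl structure

        printf "%d rows serialised into %d bytes\n", scalar @$rows, length $frozen;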

      And if the data volumes are anything more than trivial, you're gonna need a machine with a substantial amount of memory. Each byte of data queried (plus all the Perl data structure overhead) will concurrently exist in at least 4, probably 5, places at some point in time: the inbound, Perl-structured version in the DBI process; the outbound, Storable-serialised version in the DBI process; the inbound Storable-serialised version in the select-loop process; the Perl re-structured version in the select-loop process; and whatever form the final application requirements need it to be in. Actually there would probably be a sixth (partial?) copy in the DBI library buffers as well. And remember, you cannot take advantage of COW for any of this.

      With threads, you'd have at most 3 copies; no (additional) communications latency; and no double deserialisation, reserialisation or restructuring. On top of that, there would be no need to break the application's processing up into a bunch of itty-bitty chunks to ensure that your select loop wasn't starved.

      And threads scale more easily. If you later need to monitor another 10 DB servers, you simply spawn another 10 threads (the processing would be identical). With the multi-process-and-pipes method, you'd probably have to go back and repartition the application processing code, because you'd need to service the select loop more frequently.
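
      For comparison, a sketch of the threaded shape being described: one worker thread per DB server, each pushing its already-structured results onto a shared queue (the DSNs, credentials and query are placeholders):

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use DBI;

        my @dsns    = map { "dbi:mysql:database=app;host=db$_" } 1 .. 3;  # placeholders
        my $results = Thread::Queue->new;

        # One thread per DB server; each makes its own connection and hands
        # the structured result set straight back via the queue.
        my @workers = map {
            my $dsn = $_;       # copy out of $_ before the closure captures it
            threads->create( sub {
                my $dbh  = DBI->connect( $dsn, 'user', 'pass', { RaiseError => 1 } );
                my $rows = $dbh->selectall_arrayref('SELECT id, name FROM widgets');
                $results->enqueue( [ $dsn, $rows ] );
            } );
        } @dsns;

        # The main thread consumes results as each server finishes,
        # in whatever order they arrive.
        for ( 1 .. @workers ) {
            my ( $dsn, $rows ) = @{ $results->dequeue };
            printf "%s returned %d rows\n", $dsn, scalar @$rows;
        }
        $_->join for @workers;

      Scaling to another ten servers is then just a matter of adding ten more entries to @dsns.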


