
Re: Processing ~1 Trillion records

by mpeppler (Vicar)
on Oct 26, 2012 at 06:43 UTC [id://1001007]

in reply to Processing ~1 Trillion records

I've only glanced at the various answers quickly, so maybe I'm off the mark, but:

My immediate reaction to needing to process that many rows is to try to parallelize the process. It will put a higher load on the DB, but handling many concurrent queries is exactly what the DB is good at. Obviously your dataset needs to be partitionable, but I can't imagine a dataset of that size that can't be split in some way.
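A minimal sketch of that idea, assuming the table has a numeric primary key you can range over; the table name, column names, and DBI connection are hypothetical placeholders. Each forked worker gets one contiguous slice of the keyspace and its own DB connection:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the keyspace [$min, $max] into $n contiguous ranges, one per worker.
sub partition_ranges {
    my ($min, $max, $n) = @_;
    my $span = int(($max - $min + 1) / $n) || 1;
    my @ranges;
    for my $i (0 .. $n - 1) {
        my $lo = $min + $i * $span;
        my $hi = ($i == $n - 1) ? $max : $lo + $span - 1;
        push @ranges, [$lo, $hi];
    }
    return @ranges;
}

# Fork one worker per range; each child must open its OWN connection
# (DBI handles cannot be shared across fork).
my @ranges = partition_ranges(1, 1_000_000, 4);
my @pids;
for my $r (@ranges) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {    # child
        # my $dbh = DBI->connect(...);   # hypothetical connection
        # my $sth = $dbh->prepare(
        #     'SELECT ... FROM big_table WHERE id BETWEEN ? AND ?');
        # $sth->execute(@$r);
        # ... process this slice's rows ...
        exit 0;
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;
```

In practice a module like Parallel::ForkManager tidies up the fork/wait bookkeeping, and you'd size the number of workers to what the DB server can sustain.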


Replies are listed 'Best First'.
Re^2: Processing ~1 Trillion records
by Anonymous Monk on Oct 26, 2012 at 12:47 UTC
    Also you need to be sure that it's able to produce results continuously over all those many days. If the program as written dies fifteen minutes before it starts writing its first file (with all the data still sitting only in RAM at that point), the entire run is wasted. No good.
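    One way to avoid that failure mode is to write results in batches and persist a resume point after each batch. A sketch, assuming rows are processed in key order; the checkpoint file name and the fetch/process helpers are hypothetical:

```perl
use strict;
use warnings;

my $ckpt_file = 'progress.ckpt';   # last key successfully processed

sub load_checkpoint {
    my ($file) = @_;
    open my $fh, '<', $file or return 0;   # no checkpoint yet: start at 0
    chomp(my $last = <$fh>);
    return $last;
}

sub save_checkpoint {
    my ($file, $last) = @_;
    open my $fh, '>', "$file.tmp" or die "open: $!";
    print {$fh} "$last\n";
    close $fh or die "close: $!";
    rename "$file.tmp", $file or die "rename: $!";  # atomic replace
}

my $last_id = load_checkpoint($ckpt_file);
# while (my $batch = fetch_batch_after($last_id)) {  # hypothetical fetch
#     process_and_write($batch);     # flush output to disk per batch
#     $last_id = $batch->[-1]{id};
#     save_checkpoint($ckpt_file, $last_id);
# }
```

    A crash then costs at most one batch of work: restarting the program picks up from the saved key instead of from the beginning.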
