Re: Processing ~1 Trillion records

by mpeppler (Vicar)
on Oct 26, 2012 at 06:43 UTC ( [id://1001007] )


in reply to Processing ~1 Trillion records

I've only glanced quickly at the various answers, so maybe I'm off the mark, but:

My immediate reaction to needing to process that many rows is to try to parallelize the process. It will put a higher load on the DB, but handling concurrent load is what the DB is really good at. Obviously your dataset needs to be partitionable, but I can't imagine a dataset of that size that can't be split in some way.
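
Something along these lines, say, with one child process per partition. The table huge_table, the numeric key id, the ranges, and the connection details are all invented here; split however your own data allows (key ranges, date ranges, hash buckets...):

    use strict;
    use warnings;
    use DBI;
    use Parallel::ForkManager;

    # Invented partition boundaries on a numeric key.
    my @partitions = (
        [ 0,           250_000_000 ],
        [ 250_000_000, 500_000_000 ],
        [ 500_000_000, 750_000_000 ],
        [ 750_000_000, 1_000_000_000 ],
    );

    my $pm = Parallel::ForkManager->new( scalar @partitions );

    for my $part (@partitions) {
        $pm->start and next;      # fork a child; parent continues the loop
        my ( $lo, $hi ) = @$part;

        # Each child must open its own connection -- DBI handles
        # can't be shared across forks.
        my $dbh = DBI->connect( 'dbi:Sybase:server=BIGDB',
            'user', 'password', { RaiseError => 1 } );

        my $sth = $dbh->prepare(
            'select id, payload from huge_table where id >= ? and id < ?');
        $sth->execute( $lo, $hi );

        while ( my $row = $sth->fetchrow_arrayref ) {
            # ... per-row work goes here ...
        }

        $dbh->disconnect;
        $pm->finish;              # child exits
    }
    $pm->wait_all_children;

Four partitions is just for the example -- in practice you'd tune the number of partitions and concurrent workers against what the server can take.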

Michael

Re^2: Processing ~1 Trillion records
by Anonymous Monk on Oct 26, 2012 at 12:47 UTC
    Also, you need to be sure that it can produce results continuously over all those many days. If the program as written dies fifteen minutes before it starts writing its first file (with all the data still sitting in RAM up to that point), the entire run is wasted. No good.
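
    A simple way to protect against that is to flush output in batches and checkpoint your position as you go, so a crash costs you the last batch rather than the whole run. A minimal sketch -- next_row() and process() are stand-ins for your real fetch and per-row work, and the file names and batch size are arbitrary:

        use strict;
        use warnings;
        use IO::Handle;

        my $BATCH = 100_000;
        my @buffer;
        my $done = 0;

        open my $out, '>>', 'results.dat' or die "results.dat: $!";
        $out->autoflush(1);

        # Record progress atomically: write a temp file, then rename.
        sub checkpoint {
            my ($position) = @_;
            open my $ckpt, '>', 'checkpoint.tmp' or die "checkpoint.tmp: $!";
            print {$ckpt} "$position\n";
            close $ckpt;
            rename 'checkpoint.tmp', 'checkpoint' or die "rename: $!";
        }

        # Stubs standing in for the real fetch loop and per-row work:
        sub next_row { return }            # e.g. $sth->fetchrow_arrayref
        sub process  { return "$_[0]\n" }

        while ( my $row = next_row() ) {
            push @buffer, process($row);
            if ( @buffer >= $BATCH ) {
                print {$out} @buffer;      # results hit disk incrementally
                $done += @buffer;
                @buffer = ();
                checkpoint($done);
            }
        }
        print {$out} @buffer;              # flush the final partial batch
        checkpoint( $done + @buffer );

    On restart, read the checkpoint and skip ahead to where the previous run stopped, instead of starting over from row one.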