http://qs321.pair.com?node_id=1000828


in reply to Processing ~1 Trillion records

A trillion records? A trillion *bytes* is roughly 1TB. Let's assume that your records are, on average, 32 bytes - they're probably bigger, but that doesn't really matter. So you need to read 32TB, process it, and write 32TB. I don't think it's at all unreasonable for 64TB of I/O to take 16 days.
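
Back-of-the-envelope, that works out like this (the 32 bytes per record is still just my assumption, as is treating the CSV output as the same size as the input):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $records   = 1e12;                       # ~1 trillion records
    my $rec_bytes = 32;                         # assumed average record size
    my $total     = 2 * $records * $rec_bytes;  # read it all once, write it all once
    my $seconds   = 16 * 24 * 60 * 60;          # 16 days

    printf "Total I/O:      %.0f TB\n",   $total / 1e12;
    printf "Sustained rate: %.0f MB/s\n", $total / $seconds / 1e6;

That comes out at around 46MB/s sustained, day and night, for the full 16 days, which is why the elapsed time doesn't strike me as crazy.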

As you've been told, you need to profile your code. You actually need to do that for any performance problem, not just this one.
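
For Perl code the usual tool is Devel::NYTProf from CPAN. Something along these lines (the script name here is made up; run it against a small, representative slice of the data first):

    # Profile the whole run and turn the result into an HTML report:
    #
    #   perl -d:NYTProf extract_to_csv.pl   # extract_to_csv.pl is a placeholder name
    #   nytprofhtml                         # writes the report under ./nytprof/
    #
    # Or, if only one region of the code is interesting, set
    # NYTPROF=start=no in the environment and toggle profiling in code:
    use Devel::NYTProf;

    DB::enable_profile();     # start collecting
    # ... the hot loop ...
    DB::disable_profile();    # stop collecting

The per-line and per-sub timings in the report will tell you straight away whether the time is going in DBI, in your own processing, or in writing the output.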

I, of course, have not profiled your code, and so everything from here on is mere speculation, but I bet that you are I/O bound. You have at least three places where I/O may be limiting you. Reading from the database (especially if you've only got one database server); transmitting data across the network from the database to the machine your perl code is running on; and writing the CSV back out. At least the first two of those can be minimised by partitioning the data and the workload and parallelising everything across multiple machines. You *may* be able to partition the data such that you can have seperate workers producing CSV files too.