http://qs321.pair.com?node_id=1000845


in reply to Re^2: Processing ~1 Trillion records
in thread Processing ~1 Trillion records

Profile.

The timings I would expect to be interesting are: the time spent in the $sth_chr->execute statement; the translate call (could it be done more efficiently in the select itself?); the if(!defined($x)){$x = 0} statements, compared with having the DB pre-populate the default for you; the sorting of the keys; and anything else that shows up in a profiling run.
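If a full profiler run (Devel::NYTProf via perl -d:NYTProf, then nytprofhtml) is not practical on the whole data set, coarse per-stage timers are a cheap first pass. What follows is only a minimal sketch: the timed helper, the %elapsed and %count hashes, and the stage names are made up for illustration, and the toy data stands in for the real rows. The $sth_chr->execute and translate calls could presumably be wrapped the same way.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my %elapsed;    # stage name => accumulated wall-clock seconds

# Run a code block and add its wall-clock time to the named bucket.
# (Hypothetical helper, not part of the original script.)
sub timed {
    my ($stage, $code) = @_;
    my $t0     = [gettimeofday];
    my @result = $code->();
    $elapsed{$stage} += tv_interval($t0);
    return wantarray ? @result : $result[0];
}

# Toy stand-in for the real data: every third value is undefined.
my %count = map { ("key$_" => ($_ % 3 ? $_ : undef)) } 1 .. 10_000;

# Stage 1: the if(!defined($x)){$x = 0} style default filling.
timed(default_fill => sub {
    for my $k (keys %count) {
        $count{$k} = 0 if !defined $count{$k};
    }
});

# Stage 2: sorting the keys.
my @sorted = timed(sort_keys => sub { sort keys %count });

printf "%-14s %.6f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```

Because the buckets accumulate, the same stage can be timed inside a loop and reported once at the end.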

This is speculation (as noted in other posts in this thread), but if the sort turns out to be a bottleneck, I wonder whether it could be sped up by pulling the keys out unsorted, partitioning them, sorting each partition individually, and merging the results. On the other hand, if the output rather than the sort is the bottleneck, the additional I/O would only increase the run time. Without profiling data it is impossible to know where to focus.
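To make the partition / sort / merge idea concrete, here is a small in-memory sketch. The function names, the partition size, and the sample keys are all invented for illustration. Note that Perl's built-in sort is already a mergesort, so doing this purely in memory is unlikely to win anything; the approach would only pay off if the partitions were sorted in parallel or spilled to disk, which this sketch does not attempt.

```perl
use strict;
use warnings;

# Split a list into chunks of at most $size elements.
sub partition {
    my ($size, @items) = @_;
    my @chunks;
    push @chunks, [ splice @items, 0, $size ] while @items;
    return @chunks;
}

# Merge two already-sorted array refs into one sorted array ref.
sub merge_sorted {
    my ($x, $y) = @_;
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < @$x && $j < @$y) {
        push @out, ($x->[$i] le $y->[$j]) ? $x->[$i++] : $y->[$j++];
    }
    # Append whatever remains of either input (one of these is empty).
    push @out, @{$x}[$i .. $#$x], @{$y}[$j .. $#$y];
    return \@out;
}

# Sort each partition on its own, then merge the sorted runs pairwise.
sub partitioned_sort {
    my ($size, @keys) = @_;
    my @runs = map { [ sort @$_ ] } partition($size, @keys);
    while (@runs > 1) {
        my @next;
        while (@runs) {
            my $left  = shift @runs;
            my $right = shift @runs;
            push @next, $right ? merge_sorted($left, $right) : $left;
        }
        @runs = @next;
    }
    return @{ $runs[0] || [] };
}

my @keys   = qw(chr7 chr10 chr1 chrX chr2 chr19 chr3);
my @sorted = partitioned_sort(3, @keys);
print "@sorted\n";
```

Timing this against a plain sort on realistic key counts would show quickly whether the extra bookkeeping is worth it.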

--MidLifeXis