http://qs321.pair.com?node_id=653189


in reply to RFC: Abusing "virtual" memory

Of course, 'sort' has to consume all of its input before it can produce any output (the last line read could sort to the front), so it either holds everything in memory or does a bunch of seeking around and re-reading. Even if sort holds everything in memory, you probably still win, because a Perl scalar carries more overhead than the raw line that sort keeps in memory.

But the real issue is: "Why use a disk-based hash store when you need to process the keys in sorted order?" (Do you need to process them in sorted order?)

If your keys are sequential, a simple fixed-length record file allows very good performance (you can add new keys to the end, and read a value with a single seek+read).
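To make the fixed-length record idea concrete, here is a minimal sketch. It assumes the keys are integers starting at 0 and that every value fits in 32 bytes; the names RECLEN, open_store, put_record and get_record are mine, purely for illustration.

```perl
use strict;
use warnings;

# Assumption: every value fits in RECLEN bytes, and keys are integers 0, 1, 2, ...
use constant RECLEN => 32;

# Open (creating if necessary) the record file for reading and writing.
sub open_store {
    my ($path) = @_;
    unless (-e $path) {
        open my $new, '>', $path or die "create $path: $!";
        close $new;
    }
    open my $fh, '+<:raw', $path or die "open $path: $!";
    return $fh;
}

# Write the value for $key at byte offset $key * RECLEN, space-padded.
sub put_record {
    my ($fh, $key, $value) = @_;
    die "value longer than " . RECLEN . " bytes" if length($value) > RECLEN;
    seek $fh, $key * RECLEN, 0 or die "seek: $!";
    print {$fh} pack('A' . RECLEN, $value) or die "print: $!";
    return;
}

# Fetch the value for $key with one seek and one read.
# Returns undef if the record lies past the end of the file.
sub get_record {
    my ($fh, $key) = @_;
    seek $fh, $key * RECLEN, 0 or die "seek: $!";
    my $n = read $fh, my $buf, RECLEN;
    return unless defined $n && $n == RECLEN;
    return unpack 'A' . RECLEN, $buf;    # 'A' strips the trailing padding
}
```

Appending a new key is just put_record with the next unused number. Note that the 'A' template strips trailing whitespace on the way out, so a value that genuinely ends in spaces would need a different encoding (a length prefix, say).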

If your keys are more complex, I'd bring in an external indexing engine in the form of a database such as SQLite (or MySQL, or PostgreSQL, or...).
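For the database route, something along these lines might do. It is only a sketch: it assumes DBI and DBD::SQLite are installed from CPAN, and the table name kv and the helpers open_kv, kv_put and kv_each_sorted are made up for the example.

```perl
use strict;
use warnings;
use DBI;    # assumption: DBI and DBD::SQLite are installed

# Open (or create) an SQLite file holding one key/value table.
# The PRIMARY KEY gives us an index on k, which ORDER BY k can walk.
sub open_kv {
    my ($file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$file", '', '',
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)');
    return $dbh;
}

# Insert a key, or overwrite its value if it already exists.
sub kv_put {
    my ($dbh, $k, $v) = @_;
    $dbh->do('INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)', undef, $k, $v);
    return;
}

# Call $cb->($key, $value) for each row in key order, one row at a time,
# so the whole data set never has to sit in Perl scalars at once.
sub kv_each_sorted {
    my ($dbh, $cb) = @_;
    my $sth = $dbh->prepare('SELECT k, v FROM kv ORDER BY k');
    $sth->execute;
    while (my ($k, $v) = $sth->fetchrow_array) {
        $cb->($k, $v);
    }
    return;
}
```

For bulk loading you would probably want to wrap the inserts in a transaction ($dbh->begin_work ... $dbh->commit), since committing every row separately tends to be slow.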