Re: Bidirectional lookup algorithm? (Updated: further info.)

I’m not sure that I am hearing all of the potential decision factors here. How much memory does this machine have? How much memory can your application be assured of having without paging? How many keys will there be? As soon as possible, I would construct a worst-case test and run it on a production machine: random data, realistic volume and, most importantly, realistic request distributions.
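A minimal sketch of such a test-data generator, in Python. It assumes the data is a set of string keys paired with integer ids 0..n-1, and that requests are skewed so a few pairs are "hot" (Zipf-like weights). The names `make_pairs` and `make_requests`, the key shape, and the `skew` and `by_key_share` parameters are all hypothetical placeholders; replace them with figures measured from real traffic.

```python
import random
import string


def make_pairs(n, key_len=12, seed=1):
    """Generate n unique (key, id) pairs with random lowercase keys.

    Hypothetical data shape: string keys mapped to integer ids 0..n-1.
    """
    rng = random.Random(seed)
    keys = set()
    while len(keys) < n:
        keys.add("".join(rng.choices(string.ascii_lowercase, k=key_len)))
    ordered = sorted(keys)  # sort first so the shuffle is reproducible
    rng.shuffle(ordered)
    return [(key, ident) for ident, key in enumerate(ordered)]


def make_requests(pairs, count, skew=1.1, by_key_share=0.5, seed=2):
    """Draw `count` lookups, in both directions, with Zipf-like weights.

    `skew` and `by_key_share` are placeholder assumptions, not measurements.
    """
    rng = random.Random(seed)
    weights = [1.0 / rank ** skew for rank in range(1, len(pairs) + 1)]
    requests = []
    for key, ident in rng.choices(pairs, weights=weights, k=count):
        if rng.random() < by_key_share:
            requests.append(("by_key", key))  # key -> id lookup
        else:
            requests.append(("by_id", ident))  # id -> key lookup
    return requests
```

Something like `make_requests(make_pairs(1_000_000), 10_000_000)` can then be replayed against each candidate implementation on the production box, rather than on a developer machine.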

“In-memory” solutions look very attractive on developer machines and at less-than-production loads. But virtual memory really should be looked upon as a disk resource, and also as a preemptive-priority claim on another limited resource (RAM). Solutions that were “very efficient” on developer machines can become 250,000-kilo elephants with enormous working sets. They put excessive pressure on other processes, and they suffer most from the pressure that, in effect, they exert on themselves. When they fall, they fall hard.
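One cheap way to see how big the elephant gets is to measure a naive in-memory bidirectional map (one dict per direction) at two sizes and extrapolate. This is a sketch under stated assumptions: the keys are hypothetical 12-character strings, and `tracemalloc` only sees allocations made through Python's own allocators, so the figure likely understates the real working set of the process. Treat it as a lower bound, not a substitute for the production test.

```python
import tracemalloc


def build_in_memory(pairs):
    """Naive in-memory bidirectional map: one dict per direction."""
    forward = {}
    reverse = {}
    for key, ident in pairs:
        forward[key] = ident
        reverse[ident] = key
    return forward, reverse


def peak_bytes(n):
    """Peak bytes traced while building the maps for n hypothetical pairs.

    Only allocations made after tracemalloc.start() are counted, and only
    those made through Python's allocators: a lower bound on real usage.
    """
    tracemalloc.start()
    try:
        pairs = [("k%011d" % i, i) for i in range(n)]  # 12-char keys
        forward, reverse = build_in_memory(pairs)  # keep both maps alive
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


for n in (10_000, 100_000):
    print(f"{n:>9,} pairs: ~{peak_bytes(n) / 2**20:.1f} MiB peak")
```

If the per-pair cost multiplied by the expected key count approaches the memory the application can count on without paging, that is the warning sign described above.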

On the other hand, “file-based” solutions are steady. All disk I/O is routinely buffered, and operating systems will stuff otherwise-unused memory with caches. Those caches are the first thing to go when memory pressure appears, but until it does, file access is nearly as efficient as memory access, especially when the file is memory-mapped. (In that case the data is mapped through the page/segment tables, but it is not treated as high-priority backing store.) When you treat the resource as you would treat a file, e.g. by requesting data in chunks, these solutions hold up even better under stress.
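As an illustration only, here is a sketch of a file-based bidirectional lookup over two memory-mapped flat files: one holds fixed-width records sorted by key (binary-searched for key -> id), the other holds keys at offsets computed from the id (id -> key). It assumes ids are exactly 0..n-1, keys are non-empty ASCII of at most a hypothetical `KEY_LEN` of 16 bytes, and the pair list is non-empty (an empty file cannot be mapped). The file names and helper names are made up for the example. `ids_for_batch` answers a chunk of requests in key order, which tends to make consecutive probes revisit the same pages.

```python
import mmap
import os
import struct
import tempfile

KEY_LEN = 16                          # hypothetical fixed key width in bytes
REC = struct.Struct(f"<{KEY_LEN}sI")  # NUL-padded key + little-endian uint32 id


def _pad(key):
    """Encode a key as KEY_LEN NUL-padded bytes; raises if it is too long."""
    raw = key.encode("ascii")
    if len(raw) > KEY_LEN:
        raise ValueError(f"key longer than {KEY_LEN} bytes: {key!r}")
    return raw.ljust(KEY_LEN, b"\0")


def write_files(pairs, directory):
    """Write by_key.dat (records sorted by key) and by_id.dat (keys at id offsets)."""
    by_key = os.path.join(directory, "by_key.dat")
    by_id = os.path.join(directory, "by_id.dat")
    with open(by_key, "wb") as f:
        # For ASCII keys, string order matches the order of the padded bytes.
        for key, ident in sorted(pairs):
            f.write(REC.pack(_pad(key), ident))
    keys = [""] * len(pairs)
    for key, ident in pairs:
        keys[ident] = key
    with open(by_id, "wb") as f:
        for key in keys:
            f.write(_pad(key))
    return by_key, by_id


class MappedLookup:
    """Read-only bidirectional lookup over two memory-mapped files."""

    def __init__(self, by_key_path, by_id_path):
        self._files = [open(by_key_path, "rb"), open(by_id_path, "rb")]
        self._by_key, self._by_id = [
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) for f in self._files
        ]
        self._n = len(self._by_key) // REC.size

    def id_for(self, key):
        """Binary search the key-sorted records; None if the key is absent."""
        target = _pad(key)
        lo, hi = 0, self._n
        while lo < hi:
            mid = (lo + hi) // 2
            k, ident = REC.unpack_from(self._by_key, mid * REC.size)
            if k < target:
                lo = mid + 1
            elif k > target:
                hi = mid
            else:
                return ident
        return None

    def key_for(self, ident):
        """Direct offset into the id-ordered file; None if out of range."""
        if not 0 <= ident < self._n:
            return None
        raw = self._by_id[ident * KEY_LEN:(ident + 1) * KEY_LEN]
        return raw.rstrip(b"\0").decode("ascii")

    def ids_for_batch(self, keys):
        """Answer a chunk of requests in sorted key order."""
        return {key: self.id_for(key) for key in sorted(keys)}

    def close(self):
        for m in (self._by_key, self._by_id):
            m.close()
        for f in self._files:
            f.close()


def build_lookup(pairs, directory=None):
    """Write the files (to a fresh temporary directory by default) and map them."""
    directory = directory or tempfile.mkdtemp()
    return MappedLookup(*write_files(pairs, directory))
```

The operating system decides which pages of these files stay resident, so under memory pressure the lookup slows down rather than pushing the process into heavy paging of its own heap.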

My gut tells me that a traditional database, e.g. SQLite, without specifying a memory-mapped file, will turn out to be the most efficient solution for you, especially if you can request the data you need in groups of a few hundred keys at a time. It might not be the “fastest” in the short run on your dev box, but “steady, steady wins the race.”
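A minimal sketch of that approach using Python's built-in `sqlite3` module, assuming the same hypothetical (string key, integer id) pairs. The table and function names are illustrative. The id is declared `INTEGER PRIMARY KEY` (an alias for the rowid), and `UNIQUE` on the key gives SQLite an index for the reverse direction. Lookups go out in chunks of a few hundred keys via `IN (...)`; a batch size of 500 stays under the 999 host-parameter limit that older SQLite builds use by default. No `PRAGMA mmap_size` is issued, which in a default build means SQLite does not use memory-mapped I/O.

```python
import sqlite3

BATCH = 500  # "a few hundred keys at a time"


def open_db(path):
    """Open (or create) the lookup table; `path` is a hypothetical file name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pairs ("
        " id INTEGER PRIMARY KEY,"    # rowid alias: id -> key uses the table B-tree
        " key TEXT NOT NULL UNIQUE)"  # UNIQUE builds the key -> id index
    )
    return conn


def load(conn, pairs):
    """Bulk-insert (key, id) pairs inside a single transaction."""
    with conn:
        conn.executemany("INSERT INTO pairs (key, id) VALUES (?, ?)", pairs)


def _lookup(conn, column, other, values):
    """Query `values` in chunks of BATCH, returning {column_value: other_value}.

    Column names are fixed internally, never taken from user input.
    """
    values = list(values)
    result = {}
    for start in range(0, len(values), BATCH):
        chunk = values[start:start + BATCH]
        marks = ",".join("?" * len(chunk))
        sql = f"SELECT {column}, {other} FROM pairs WHERE {column} IN ({marks})"
        result.update(conn.execute(sql, chunk))
    return result


def ids_for_keys(conn, keys):
    """key -> id for every key that exists; missing keys are simply absent."""
    return _lookup(conn, "key", "id", keys)


def keys_for_ids(conn, ids):
    """id -> key for every id that exists; missing ids are simply absent."""
    return _lookup(conn, "id", "key", ids)
```

In real use `path` would name an on-disk database file, so the operating system's file cache does the buffering described above; `":memory:"` is only convenient for a quick check.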

	PerlMonks