“It's not Free Lunch” ... Abusing Virtual Memory

When faced with an input-file of a few million records, the application was falling to its knees. But this was a very expansive computer, plenty of memory, fast CPU. What gives?

It turned out that the programmer was reading a DB-file, putting all of the keys into a memory-array, sorting it, then using the array values to walk through the DB. Intrepid programmers were dutifully trying to figure out why “Berkeley DB was running so slow...” But that was not the problem.

The problem was “trying to put several million record-IDs into memory!”

Remember this:   in a typical virtual-memory based environment, “memory”-access does not come without a cost. “Memory,” in fact, is actually a disk file. If you load several million records into it, you are going to incur a very large number of “page faults,” and even if those page-faults do not result in any physical I/O activity your application has still become a great, big, fat, ungainly elephant. The computer may put up with you and do the best that it can, but it will never appreciate you.

The page-fault handler, in any operating-system, is also a heavily serialized piece of internal code:   it's an internal bottleneck choke-point, and it forces CPUs to flush their caches and pipelines, and so-on. In other words, it's not a place that you want to visit tens-of-thousands of times within a short interval. (Think “five o'clock at a toll booth... when only one lane's open.” )

No matter how fast the CPUs of your computer may be, it is still true that the single most-significant determinant of application speed is ... one way or the other, obvious or not ... I/O. Your program can punish the system, not only by incurring large amounts of I/O itself, but also by “putting the squeeze on the buffer-pool.” The latter is what happened in this case.

By loading several million records into an array, the virtual-memory footprint of this application blossomed to about 44 megabytes, which caused about 75,000 page faults to occur just in loading and sorting that “memory” list. Although there was enough RAM to allow that to happen without physical I/O, Linux discarded a corresponding amount of file-buffer space ... and Linux depends a great deal on its buffers for good performance. This application was simultaneously putting a severe strain on both of these uses for memory.

The application was re-coded to execute the Unix sort command and read its output-stream as the input. Knowing that the keys would now be coming in sorted order .. and thus, also, adjacent ... allowed the data to be processed much more efficiently in the DB. The entire reason for building that enormous hash was eliminated completely. The “footprint” was now less than a megabyte. The runtime of this application dropped from many hours to a couple of minutes, and I'm sure that it could be made even faster if anyone so desired.

Computer programs, these days, ought to be fast. We have “an embarrassment of riches” when it comes to hardware and speed. So, if your program isn't performing as fast as you think it should, take a close look at your algorithm. Look for hidden sources of inefficiency ... and for inefficiency, look for I/O.

Footnote:   This article is intended to refer to normal business-oriented computing, which is inherently I/O-intensive, not to the CPU-intensive world of scientific computing.