http://qs321.pair.com?node_id=653081

“It's not Free Lunch” ... Abusing Virtual Memory

When faced with an input-file of a few million records, the application fell to its knees. But this was a very expensive computer, plenty of memory, fast CPU. What gives?

It turned out that the programmer was reading a DB-file, putting all of the keys into a memory-array, sorting it, then using the array values to walk through the DB. Intrepid programmers were dutifully trying to figure out why “Berkeley DB was running so slow...” But that was not the problem.

The problem was “trying to put several million record-IDs into memory!”

Remember this:   in a typical virtual-memory based environment, “memory”-access does not come without a cost. “Memory,” in fact, is actually a disk file. If you load several million records into it, you are going to incur a very large number of “page faults,” and even if those page-faults do not result in any physical I/O activity, your application has still become a great, big, fat, ungainly elephant. The computer may put up with you and do the best that it can, but it will never appreciate you.

The page-fault handler, in any operating-system, is also a heavily serialized piece of internal code:   it's an internal bottleneck choke-point, and it forces CPUs to flush their caches and pipelines, and so on. In other words, it's not a place that you want to visit tens-of-thousands of times within a short interval. (Think “five o'clock at a toll booth... when only one lane's open.”)

No matter how fast the CPUs of your computer may be, it is still true that the single most-significant determinant of application speed is ... one way or the other, obvious or not ... I/O. Your program can punish the system, not only by incurring large amounts of I/O itself, but also by “putting the squeeze on the buffer-pool.” The latter is what happened in this case.

When it loaded several million records into an array, this application's virtual-memory footprint blossomed to about 44 megabytes, which caused about 75,000 page faults just in loading and sorting that “memory” list. Although there was enough RAM to allow that to happen without physical I/O, Linux discarded a corresponding amount of file-buffer space ... and Linux depends a great deal on its buffers for good performance. This application was simultaneously putting a severe strain on both of these uses for memory.

The application was re-coded to execute the Unix sort command and read its output-stream as the input. Knowing that the keys would now be coming in sorted order ... and thus, also, adjacent ... allowed the data to be processed much more efficiently in the DB. The entire reason for building that enormous in-memory list was eliminated completely. The “footprint” was now less than a megabyte. The runtime of this application dropped from many hours to a couple of minutes, and I'm sure that it could be made even faster if anyone so desired.
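
For the curious, here is a minimal sketch of the reworked approach, assuming the record IDs have been dumped one per line into a flat file and the DB is a tied DB_File hash (the file names here are illustrative, not the actual ones):

    use strict;
    use warnings;
    use DB_File;
    use Fcntl;

    tie my %db, 'DB_File', 'records.db', O_RDONLY, 0644, $DB_HASH
        or die "tie: $!";

    # Let the external sort do the heavy lifting, then stream its output
    # one key at a time instead of holding millions of keys in an array.
    open my $sorted, '-|', 'sort', 'keys.txt'
        or die "cannot run sort: $!";

    while (my $key = <$sorted>) {
        chomp $key;
        my $value = $db{$key};   # keys arrive sorted, hence adjacent in the DB
        # ... process $key / $value here ...
    }

    close $sorted;
    untie %db;

The external sort spills to temporary files on disk when its input is large, so nothing here ever needs to hold millions of keys in the process's own memory.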

Computer programs, these days, ought to be fast. We have “an embarrassment of riches” when it comes to hardware and speed. So, if your program isn't performing as fast as you think it should, take a close look at your algorithm. Look for hidden sources of inefficiency ... and for inefficiency, look for I/O.

Footnote:   This article is intended to refer to normal business-oriented computing, which is inherently I/O-intensive, not to the CPU-intensive world of scientific computing.

Re: RFC: Abusing "virtual" memory
by tilly (Archbishop) on Nov 27, 2007 at 01:57 UTC
    Another lesson from this case.

    When a machine starts using virtual memory, its performance drops substantially. If you've got a server whose throughput needs to stay high to keep up with demand, this is a disaster. But, worse yet, once things start swapping, it is really hard to recover.

    By contrast if that machine had no virtual memory, the failures are much more obvious. Plus there is a good chance that the offending memory hog will die fast, and the server is likely to be able to continue doing everything else it is supposed to do.

    Therefore if you have an expensive server, think about getting extra RAM and disabling swap. Or even just disabling swap.

      When a machine starts using virtual memory, its performance drops substantially.

      Either the machine is using virtual memory from before it finished booting or it is running in "real mode" and thus probably not running a modern, multi-user operating system. That may sound like nit-picking since you probably meant something more like "using paging space", but even that doesn't make much sense to me (and the rest of what you said makes the distinction important beyond a nit-picking correction). "Making heavy use of paging space" would be more accurate, or just "paging heavily". But, yes, running out of physical memory can cause a system to become extremely inefficient, have a hard time getting much of anything done, and even have a hard time recovering. I remember our ancient VMS system had a kernel trap such that if it was spending more time futzing around trying to figure out how to get the next thing done than it was spending actually doing things, it would just give up, flush buffers, and reboot.

      Therefore if you have an expensive server, think about getting extra RAM and disabling swap. Or even just disabling swap.

      I've never seen that option. I guess it makes sense for it to exist given that something like Linux is able to run on tiny systems lacking an appropriate resource to hold paging space.

      By contrast if that machine had no virtual memory,

      I think you mean "had no paging space" (a.k.a. "swap space", but I try not to say "swap" when I mean "paging", and "swap space" is mostly used for paging, not for swapping entire processes out of memory). The modern multi-user operating system with its protections is based around virtual memory, so I doubt you are running without virtual memory, just with virtual memory that is not allowed to grow larger than the size of physical memory.

      the failures are much more obvious. Plus there is a good chance that the offending memory hog will die fast, and the server is likely to be able to continue doing everything else it is supposed to do.

      My experience is that running out of virtual memory means that there will likely be something somewhere other than the "one hog" that runs into a case of malloc() or realloc() failing. And my experience is that it is extremely rare for code to be written to deal well with malloc() or realloc() failing. So if we have a system that has run out of virtual memory, then we schedule it for a reboot ASAP. Often, the corruption caused by the virtual memory exhaustion isn't obvious in the short term so most often the system appears to continue on rather normally. But in the cases when the reboot wasn't done, eventually something about the system became flaky.

      AIX had an interesting take on this problem. It would notice that it was coming close to running out of memory and so would pick a process to kill based on some heuristics that I don't recall ever seeing documented. (It also didn't care how much virtual memory a process allocated, just how much virtual memory the process used, thus malloc() would never fail.) My experience was that AIX's heuristics in this area almost always picked the wrong victim. I don't know if that is an indication of the problem being significantly harder than it might at first appear or if it is just IBM implementing something stupid (likely a combination).

      Sad to say, but it is too easy to write C code that doesn't bother to check whether malloc() returned NULL. So having one or more processes have an internal failure at random, some of them silently, sounds much worse to me than AIX's idea of having one process die obviously and in a relatively controlled manner. And I recall AIX's solution not being well liked.

      So I advise caution to anyone planning on taking your advice.

      Unfortunately, I have not seen a magic bullet for dealing with mis-behaved memory hogs. And my experience says that this approach isn't a magic bullet, either. Our arsenal of weapons against this problem comprises such various and diverse elements as testing, load boxing, monitoring, post-mortem analysis, isolation, ... The problem gets really hard when your "production systems" are the ones used by a bunch of users and programmers (the best single tool I've seen from the side-lines there is a daemon that suspends any process that makes a nuisance of itself and notifies the "watchers" for intervention).

      - tye        

Re: RFC: Abusing "virtual" memory
by graff (Chancellor) on Nov 27, 2007 at 04:17 UTC
    Everything you say is true. I would add that sometimes the memory burden can come from unexpected things. I had a recent experience where one programmer set up an ongoing web-harvest process that, over the span of several months, populated two directories on one disk volume with over 2 million files each. I don't actually know how long the file names were (they might have been 24 characters or longer), but each unix directory file itself (not a data file, but the directory file) was over 100 MB.

    One surprising fact I learned from that situation is that the standard unix "find" utility (freebsd version -- don't know about other flavors), when trying to traverse one of those directories, grew to have an enormous memory footprint. And since "find" is typically used in activities like backups, this became a problem, as the "daily" backup for this one disk was taking more than 24 hours to finish... in large part because the backup server was page-faulting.

    (That collection process was retooled to use a different disk volume, which is not being backed up, and the process now includes the creation of a daily tar file on a separate volume that is backed up -- just one new file per day, instead of many thousands.)
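
    (A minimal sketch of that daily-tar step, assuming the system tar is on the PATH; the paths and naming scheme here are made up, not the ones actually used:)

    use strict;
    use warnings;
    use POSIX qw(strftime);

    my $day = strftime('%Y%m%d', localtime);

    # Stream the day's harvest into one tar file on the backed-up volume;
    # tar writes as it goes, so the footprint stays small.
    system('tar', '-cf', "/backed-up-volume/harvest-$day.tar",
           '-C', '/harvest/spool', '.') == 0
        or die "tar failed: $?";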

Re: RFC: Abusing "virtual" memory
by BrowserUk (Patriarch) on Nov 26, 2007 at 22:09 UTC
    this application blossomed to about 44 megabytes, which caused about 75,000 page faults

    Simple solution. Pre-expand. If it's an array, $#array = nnnn;. If it's a hash, keys %hash = nnnn;.

    It's not a complete solution as it only reserves space for the structure, not for the scalars held by the structure, but there is a solution to that also.

    Allocate and free a large scalar, my $space = 'x' x 44e6;. This will cause the application to request and commit the total memory requirement from the OS in one big hit. That space, allocated to the process in one go, is then available for reallocation by the process without the hit of any new requests to the OS.

    No transitions from user mode to kernel mode. One page fault.
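
    A minimal sketch of the above; the names and sizes are purely illustrative:

    use strict;
    use warnings;

    my $n = 5_000_000;        # expected number of record IDs (illustrative)

    my @ids;
    $#ids = $n - 1;           # pre-expand the array to $n slots

    my %index;
    keys(%index) = $n;        # pre-size the hash's bucket table

    {
        # Request the whole footprint in one hit, then free it so the space
        # stays with the process for reuse by the many small allocations.
        my $space = 'x' x 44e6;
    }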


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Page faults are not caused when memory is allocated, but when memory is used. And allocated memory is not "commit"ted; it will be swapped out and back in as the OS sees fit, so re-allocating that space can certainly cause plenty of page faults, even if no "new requests to the OS" are made directly.

      Excessive page faults causing a system to be very slow happens when there isn't enough physical memory for all of the running processes to have the things they need resident at the same time. Playing games with when memory is allocated is unlikely to make a big difference in such a scenario in my experience.

      No transitions from user mode to kernel mode. One page fault.

      That seems to nicely sum up your confusion. Page faults happen as interrupts, not as calls from user mode into kernel mode. You cannot prevent page faults by avoiding calling into the kernel. Just about any time you access any byte of memory, a page fault could happen. Even allocating a big chunk of memory with a single call into the OS will surely cause more than one page fault.

      - tye        

        There are, under Win32, two situations in which a page fault can occur.

        1. The first is when an attempt is made to access a reserved and committed page that has been relegated to the system pagefile (or other backing file) due to overall system memory exhaustion.

          This is the scenario you have based your post upon.

        2. The second is when an attempt is made to access a page of memory that has been reserved, but not yet committed.

          Win32 memory management schemes frequently use VirtualAlloc( baseAddr, someLargeSize, MEM_RESERVE, PAGE_READWRITE|PAGE_WRITECOPY|PAGE_GUARD ) to reserve a large address space to the process, but without actually allocating physical memory or pagefile space.

          This memory can then be managed using the Heap* API functions, which will automatically commit previously reserved pages on demand.

          Once reserved memory has been committed, it is also possible to MEM_RESET that memory. This indicates to the OS that the pages in question are no longer being used and so need not be written to the pagefile when their physical pages are to be reused for other processes, but the virtual pages are not decommitted, because they will be reused at a later point.

          A quote from the documentation spells this out more clearly:

          The HeapCreate function creates a private heap object from which the calling process can allocate memory blocks by using the HeapAlloc function. HeapCreate specifies both an initial size and a maximum size for the heap. The initial size determines the number of committed, read/write pages initially allocated for the heap. The maximum size determines the total number of reserved pages. These pages create a contiguous block in the virtual address space of a process into which the heap can grow. Additional pages are automatically committed from this reserved space if requests by HeapAlloc exceed the current size of committed pages, assuming that the physical storage for it is available. Once the pages are committed, they are not decommitted until the process is terminated or until the heap is destroyed by calling the HeapDestroy function.

          So you see, by preallocating a large chunk of memory and then freeing it back to the heap, the pages committed by that large allocation (backed by physical memory and/or pagefile space) are returned to the heap manager for reallocation. They are therefore already committed (to the process/physical memory/pagefile) but free for reallocation by subsequent calls to HeapAlloc. This means that new calls to HeapAlloc can be satisfied in user mode, as there is no need to transition to kernel mode to commit new pages to the process.

        This extract from win32.c is one example of the manipulations Perl does with MEM_RESERVE and MEM_COMMIT:

        static char *committed = NULL;  /* XXX threadead */
        static char *base      = NULL;  /* XXX threadead */
        static char *reserved  = NULL;  /* XXX threadead */
        static char *brk       = NULL;  /* XXX threadead */
        static DWORD pagesize  = 0;     /* XXX threadead */

        void *
        sbrk(ptrdiff_t need)
        {
            void *result;
            if (!pagesize) {
                SYSTEM_INFO info;
                GetSystemInfo(&info);
                /* Pretend page size is larger so we don't perpetually
                 * call the OS to commit just one page ... */
                pagesize = info.dwPageSize << 3;
            }
            if (brk+need >= reserved) {
                DWORD size = brk+need-reserved;
                char *addr;
                char *prev_committed = NULL;
                if (committed && reserved && committed < reserved) {
                    /* Commit last of previous chunk cannot span allocations */
                    addr = (char *) VirtualAlloc(committed,reserved-committed,MEM_COMMIT,PAGE_READWRITE);
                    if (addr) {
                        /* Remember where we committed from in case we want to decommit later */
                        prev_committed = committed;
                        committed = reserved;
                    }
                }
                /* Reserve some (more) space
                 * Contiguous blocks give us greater efficiency, so reserve big blocks -
                 * this is only address space not memory...
                 * Note this is a little sneaky, 1st call passes NULL as reserved
                 * so lets system choose where we start, subsequent calls pass
                 * the old end address so ask for a contiguous block */
            sbrk_reserve:
                if (size < 64*1024*1024)
                    size = 64*1024*1024;
                size = ((size + pagesize - 1) / pagesize) * pagesize;
                addr = (char *) VirtualAlloc(reserved,size,MEM_RESERVE,PAGE_NOACCESS);
                if (addr) {
                    reserved = addr+size;
                    if (!base)
                        base = addr;
                    if (!committed)
                        committed = base;
                    if (!brk)
                        brk = committed;
                }
                else if (reserved) {
                    /* The existing block could not be extended far enough, so decommit
                     * anything that was just committed above and start anew */
                    if (prev_committed) {
                        if (!VirtualFree(prev_committed,reserved-prev_committed,MEM_DECOMMIT))
                            return (void *) -1;
                    }
                    reserved = base = committed = brk = NULL;
                    size = need;
                    goto sbrk_reserve;
                }
                else {
                    return (void *) -1;
                }
            }
            result = brk;
            brk += need;
            if (brk > committed) {
                DWORD size = ((brk-committed + pagesize -1)/pagesize) * pagesize;
                char *addr;
                if (committed+size > reserved)
                    size = reserved-committed;
                addr = (char *) VirtualAlloc(committed,size,MEM_COMMIT,PAGE_READWRITE);
                if (addr)
                    committed += size;
                else
                    return (void *) -1;
            }
            return result;
        }

        And here are some references to the Heap* calls from vmem.h:

        #define WALKHEAP() WalkHeap(0)
        #define WALKHEAPTRACE() WalkHeap(1)
         * HeapRec - a list of all non-contiguous heap areas
        const int maxHeaps = 32; /* 64 was overkill */
         * Use VirtualAlloc() for blocks bigger than nMaxHeapAllocSize since
        const int nMaxHeapAllocSize = (1024*512); /* don't allocate anything larger than this from the heap */
        int HeapAdd(void* ptr, size_t size
        BOOL bRet = (NULL != (m_hHeap = HeapCreate(HEAP_NO_SERIALIZE,
        ASSERT(HeapValidate(m_hHeap, HEAP_NO_SERIALIZE, NULL));
        BOOL bRet = HeapDestroy(m_hHeap);
        HeapFree(m_hHeap, HEAP_NO_SERIALIZE, m_heaps[index].base);
        ptr = HeapReAlloc(m_hHeap, HEAP_REALLOC_IN_PLACE_ONLY|HEAP_NO_SERIALIZE,
        HeapAdd(((char*)ptr) + m_heaps[m_nHeaps-1].len, size
        ptr = HeapAlloc(m_hHeap, HEAP_NO_SERIALIZE, size);
        if (HeapAdd(ptr, size)) {
        if (HeapAdd(ptr, size, bBigBlock)) {
        HeapAdd(ptr, size);
        int VMem::HeapAdd(void* p, size_t size
        void VMem::WalkHeap(int complete)
        MemoryUsageMessage("VMem heaps used %d. Total memory %08x\n", m_nHeaps, total, 0);
        ASSERT(HeapValidate(m_hHeap, HEAP_NO_SERIALIZE, ptr));

        So yes. Even accesses to committed memory can cause a page fault in low-memory situations, but when you know you are about to allocate a large number of small pieces of memory--as when filling a large hash or array for the first time--preallocating a large chunk in a single allocation and then freeing it means that space is committed as the result of a single page fault and then returned to the heap for reallocation by the many small allocations without the need for further page faults.

        What say you now of my "confusion"?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: RFC: Abusing "virtual" memory
by Prof Vince (Friar) on Nov 27, 2007 at 08:46 UTC
    Computer programs, these days, ought to be fast. We have “an embarrassment of riches” when it comes to hardware and speed. So, if your program isn't performing as fast as you think it should, take a close look at your algorithm. Look for hidden sources of inefficiency ... and for inefficiency, look for I/O.
    This opinion is highly biased by the kind of problems you usually seem to solve. In scientific computing, most problems don't involve I/O at all and the algorithms are often so straightforward that it'll never be possible to improve them. In those cases, it's the sheer power of the CPUs that talks, and the micro-optimizations (the very same that receive so much hate from the so-called "clean" programmers) that win you hours, days or weeks.

      Your point is well-taken and correct. The post was intended to refer to business-oriented situations, not scientific ones. Thank you for bringing up this important point.

Re: RFC: Abusing "virtual" memory
by jbert (Priest) on Nov 27, 2007 at 11:03 UTC
    Of course, 'sort' has to either read everything in before producing any output (since the last line read could sort to the front), or do a bunch of seeking around and re-reading. Even if sort reads everything into memory, you probably still win because the Perl scalars were bigger than the lines sort was holding in memory.

    But the real issue is: "Why use a disk-based hash store when you need to process the keys in sorted order?" (Do you need to process them in sorted order?)

    If your keys are sequential, a simple fixed-length record file allows very good performance (you can add new keys to the end, and read a value with a single seek+read).

    If your keys are more complex, I'd bring in an external indexing engine in the form of a db such as SQLite (or mysql, or postgres, or...).
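
    For the sequential-key case, a minimal sketch, assuming 64-byte records keyed by a sequential integer ID (the file name and record layout are made up):

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET SEEK_END);

    my $RECLEN = 64;
    open my $fh, '+<', 'records.dat' or die "open: $!";   # pre-existing data file (illustrative)

    sub read_record {
        my ($id) = @_;
        seek $fh, $id * $RECLEN, SEEK_SET or die "seek: $!";
        my $got = read $fh, my $buf, $RECLEN;
        return (defined $got && $got == $RECLEN) ? $buf : undef;
    }

    sub append_record {
        my ($data) = @_;
        seek $fh, 0, SEEK_END or die "seek: $!";   # jump to end of file
        print {$fh} pack("A$RECLEN", $data);       # pad/truncate to the fixed length
        return tell($fh) / $RECLEN - 1;            # the new record's ID
    }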

      But the real issue is: "Why use a disk-based hash store when you need to process the keys in sorted order?" (Do you need to process them in sorted order?)

      For that, DB_File provides a disk-based tied hash that keeps its keys sorted - DB_BTREE.
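
      A minimal sketch, with an illustrative file name:

      use strict;
      use warnings;
      use DB_File;
      use Fcntl;

      tie my %db, 'DB_File', 'records.db', O_RDWR|O_CREAT, 0666, $DB_BTREE
          or die "tie: $!";

      # A B-tree keeps its keys sorted on disk, so each() hands them back
      # in order -- no separate sort pass, no giant in-memory key list.
      while (my ($key, $value) = each %db) {
          # ... process $key / $value here ...
      }

      untie %db;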

      update: added link

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        Ah, cool, thanks.

        In which case the question is more simple: "why sort your keys in the app when you can get the db to do it for you?" :-)

      Good sir or madam, I wouldst delicately and most-diplomatically point out that it is, in fact, not so that your intuitive suppositions are correct ... but that, in fact, it is precisely my key-point that, (counter-intuitively though they may-or-may-not be), they are not.

      For at least a generation, our predecessors made do without computers of any kind... they used only punched cards, and with those primitive tools they accomplished miracles.

      For another full generation beyond that, computers existed, yes, but copious memory-sizes did not. “A million bytes of memory?! Gosh!”

      It is, indeed, precisely(!) my point that our grandfathers with their punched-card machines had, and still have, a valuable lesson to teach us.

        Sorry, I have no idea if your two replies to me in this thread are as a result of my having offended you or not. If you're offended I'm sorry, that wasn't my intent.

        I'm probably being dense, but as far as I could see, neither of your responses answered the question:

        "If you need to process the keys in sorted order - and the volume of data of the keys is significant - why are you storing them in a data structure which does not keep them in a sorted order?"

        Your oblique references to performance perhaps suggest that you think the cost of updating the index on insertion will be too high to meet your throughput needs.

        Is this right?

        If there's some other meaning, I'm afraid I've missed it. Perhaps you could rephrase it?

        Update: is this all a joke, btw? 44 MB of RAM isn't a significant amount out of the buffer cache of "a very expensive computer, plenty of memory, fast CPU". Or is this all on a historical machine? Or is your point something about human versus computer computations?

Re: RFC: Abusing "virtual" memory
by dsheroh (Monsignor) on Nov 27, 2007 at 16:30 UTC
    As I read this, a quote from days long gone keeps popping into my head...

    "Cool - virtual memory! Now I can make a really big ramdisk!"

      Another quote came to my mind, from the beginnings of virtual memory days:
      "If it's there and you can see it - it's real.
      If it's not there and you can see it - it's virtual.
      If it's there and you can't see it - it's transparent.
      If it's not there and you can't see it - you erased it!"
      IBM poster explaining virtual memory (circa 1978)

      ---
      echo S 1 [ Y V U | perl -ane 'print reverse map { $_ = chr(ord($_)-1) } @F;'
      Warning: Any code posted by tuxz0r is untested, unless otherwise stated, and is used at your own risk.

Re: RFC: Abusing "virtual" memory
by shotgunefx (Parson) on Nov 29, 2007 at 22:41 UTC
    Had a similar DB problem a long time ago. Was tokenizing a large number of texts for a search index; adding wasn't going to happen very often, so speed wasn't that big of a deal... but it took something like 38 hours to run. Instead I just punted it off to sort, and it came back down to about 30 minutes.

    -Lee
    "To be civilized is to deny one's nature."