
You *can* do a single read that's bigger than your available RAM, but that's not what I meant.

If you want to access data in a file that's larger than your available RAM, you'll only ever be working on part of the file at a time, however you go about it. You'll need something to move parts of the file into and out of memory as you go.

One option is to use mmap. Your memory access patterns will then determine which pages the OS faults into your process and which it evicts (roughly least-recently-used first) when memory gets tight.
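As a rough illustration of the mmap route, here is a minimal sketch using Perl's built-in `:mmap` PerlIO layer. The helper name `mmap_fetch` and its arguments are made up for this example, and I'm assuming a Unix-like box where the `:mmap` layer is available. CPAN modules such as File::Map or Sys::Mmap give you the mapping as a scalar instead, which may be closer to what you're doing.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch $len bytes at $offset through PerlIO's
# :mmap layer. The layer maps the file and uses the mapping as its
# buffer, so only the pages we actually touch need to be faulted in.
sub mmap_fetch {
    my ($path, $offset, $len) = @_;
    open(my $fh, '<:mmap', $path) or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $got = read($fh, my $buf, $len);
    die "Read error on $path: $!" unless defined $got;
    close($fh);
    return $buf;
}
```

Calling `mmap_fetch($path, $offset, $len)` at scattered offsets in a big file should only pull in the pages around those offsets, not the whole file.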

You can also use read(). You'll get very similar benefits of caching from the OS, but you'll have to do the "getting data into memory" bit yourself more explicitly.
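For comparison, a minimal sketch of the read() route, doing the "getting data into memory" bit by hand. The helper name `process_in_chunks`, the chunk size and the callback are all assumptions for the sake of the example, not anything from your code.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a file in fixed-size chunks with sysread(),
# so only one chunk is held in our own process's memory at a time.
# Returns the total number of bytes read.
sub process_in_chunks {
    my ($path, $chunk_size, $callback) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $total = 0;
    while (1) {
        my $got = sysread($fh, my $buf, $chunk_size);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;      # end of file
        $callback->($buf);      # work on this part of the file
        $total += $got;
    }
    close($fh);
    return $total;
}
```

Something like `process_in_chunks($path, 1 << 20, sub { ... })` keeps your process's footprint at about one chunk, while the OS page cache decides how much of the file itself stays in RAM.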

mmap has its place and is useful, but I've often come across people who do things like "we'll keep an in-memory cache of recently-used files to avoid having to read them from disk each time", or "we'll use a RAM disk for these files". They don't realise that if their guess of recently-used is accurate, they don't need to do that, since the OS will make sure the data in those files stays in memory. And if the guess is inaccurate, they're wasting memory which could be put to better use caching the genuinely frequently used stuff.

In one particular case I saw, the file cache was per-process, so replicated across 60 or so procs on the box, wasting a significant amount of memory (which was a precious resource on the box in question).

So sorry for picking up on this, but I think many people don't realise that read() can be entirely satisfied from RAM, and will be for a commonly-accessed file (assuming noatime on the mount point, so that reads don't also cause access-time updates to be written back to disk).
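If you want to see this for yourself, a rough sketch along these lines might do. The helper name `timed_slurp` is made up, and the timings depend entirely on the box, the file size and what's already in the page cache, so treat it as an experiment and not a benchmark.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: read a whole file with sysread() in 1 MB chunks
# and return [bytes_read, elapsed_seconds]. No caching of our own.
sub timed_slurp {
    my ($path) = @_;
    my $start = time();
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $bytes = 0;
    while (1) {
        my $got = sysread($fh, my $buf, 1 << 20);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;      # end of file
        $bytes += $got;
    }
    close($fh);
    return [$bytes, time() - $start];
}
```

Call it twice in a row on a big file that hasn't been touched recently. If the first call had to go to disk, the second will typically come back much faster, without the program having done anything to keep the data around.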

Your use of mmap seems perfectly sensible to me, but for reasons of coding simplicity, not because "So memory mapping meant that the often-access data stayed cached in ram". That benefit also applies to read().

In reply to Re^7: Perl solution for storage of large number of small files by jbert
in thread Perl solution for storage of large number of small files by isync



	PerlMonks