http://qs321.pair.com?node_id=612784


in reply to Re^4: Perl solution for storage of large number of small files
in thread Perl solution for storage of large number of small files

Often-accessed data will stay in memory whether it is accessed via read() or mmap(). mmap() can be a more convenient interface precisely because of the opposite effect: data on disk mapped by mmap() *isn't* automatically brought into memory until it is used, and then only the pages that are actually touched are brought in (at page granularity, typically 4k). A successful read(), by contrast, always brings the requested data into memory.

This means you are perhaps less likely to have unwanted data in memory with mmap(), but that has more to do with the extra code it takes to do the read() approach well than with mmap()'d data being more likely to stick in memory.
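To make the "more code" point concrete, here is a rough sketch using Python's standard mmap module (picked only because it is self-contained; the same idea applies to whatever mmap binding you use from Perl). The helper names and the sample file layout are made up for illustration. With read() you have to seek and size the read yourself to avoid pulling in data you don't want; with the mapping you just slice and let the kernel fault in the pages you touch.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # commonly 4096, but platform dependent


def slice_via_read(path, offset, length):
    # Explicit seek + read: only `length` bytes are copied into this process.
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)


def slice_via_mmap(path, offset, length):
    # Map the whole file; pages are only faulted in when they are touched.
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


def make_sample_file(pages=256):
    # Hypothetical test data: page i is filled with the byte value i % 256.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        for i in range(pages):
            fh.write(bytes([i % 256]) * PAGE)
    return path


if __name__ == "__main__":
    path = make_sample_file()
    try:
        offset = 100 * PAGE + 10
        # Both approaches should return the same four bytes.
        print(slice_via_read(path, offset, 4) == slice_via_mmap(path, offset, 4))
    finally:
        os.remove(path)
```

Both helpers return the same bytes; the difference is only in who decides what gets pulled into memory. The sketch does not measure residency, so treat it as an illustration of the two interfaces rather than a benchmark.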

The kernel might trigger different heuristics for the two different methods of access (such as readahead if you do a number of sequential reads or a big sequential memory access to an mmap'd area), but I'm not even sure of that - they might go through exactly the same code paths.

I'd say that the biggest difference is that the results of a read() are normally copied into a per-process buffer in the application, whereas multiple processes can in principle share the same copy of mmap'd data.
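A small sketch of that difference, again with Python's standard mmap module and assuming a Unix-like system where ACCESS_READ and ACCESS_WRITE mappings are shared mappings of the same cached file pages. For simplicity it uses two mappings inside one process rather than two processes, and the function name is made up. The buffer filled by read() is a private snapshot, while a change made through one mapping shows up through the other.

```python
import mmap
import os
import tempfile


def compare_copy_and_mapping():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(b"old data")

        with open(path, "rb") as reader_fh, open(path, "r+b") as writer_fh:
            # read() copies the bytes into a buffer private to this caller.
            copied = reader_fh.read()

            # Two independent mappings of the same file.
            reader_map = mmap.mmap(reader_fh.fileno(), 0, access=mmap.ACCESS_READ)
            writer_map = mmap.mmap(writer_fh.fileno(), 0, access=mmap.ACCESS_WRITE)
            try:
                # Modify the file contents through the writable mapping.
                writer_map[0:3] = b"new"
                # Read the contents back through the other mapping.
                mapped = reader_map[:]
            finally:
                reader_map.close()
                writer_map.close()
        return copied, mapped
    finally:
        os.remove(path)


if __name__ == "__main__":
    copied, mapped = compare_copy_and_mapping()
    print(copied, mapped)
```

The copy made by read() still holds the old contents, while the read-only mapping sees the update, since both mappings refer to the same underlying data rather than to separate copies.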