PerlMonks  

Re: mmaping a large file

by BrowserUk (Patriarch)
on Aug 23, 2012 at 22:27 UTC ( [id://989400] )


in reply to mmaping a large file

ap_file is not supposed to load the whole file in memory, is it?

Yes it does, if you access the entire file. It just does so lazily, on demand rather than all at once when you first 'read' it.

That is to say, when you first map a file, none of its contents are actually loaded from disk. A chunk of your process's virtual address space -- the size of the file -- is reserved, and the mapping call returns very quickly. When you then attempt to access bits of the file, the 4096-byte page(s) containing the bytes you touch are loaded from disk on demand (via page faults).
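A minimal sketch of this, using the File::Map module from CPAN (an alternative to the Sys::Mmap module discussed below; the filename here is made up for the demo):

```perl
use strict;
use warnings;
use File::Map qw(map_file);

# Create a small demo file (a stand-in for a real multi-GB file).
open my $out, '>', 'demo.dat' or die "create: $!";
print $out 'x' x 8192;
close $out;

# map_file reserves address space but reads nothing yet;
# the call returns quickly even for a huge file.
map_file my $map, 'demo.dat', '<';

# Accessing a region faults in only the page(s) covering it.
my $bytes = substr $map, 4096, 4;   # touches the second 4096-byte page
```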

If you have a large dataset in a file, and a) you only need access to small bits of it in any given run, and b) you can find those bits without reading through the whole file from the beginning, then mapping can be an effective way of minimising the number of pages read from disk.
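For instance, with fixed-width records you can compute an offset and jump straight to the record you want; only the page(s) under that record are ever read. A sketch, again assuming File::Map and a made-up demo file:

```perl
use strict;
use warnings;
use File::Map qw(map_file);

# Build a demo file of 100 fixed-width (16-byte) records.
open my $out, '>', 'records.dat' or die "create: $!";
printf $out '%-16s', "record$_" for 0 .. 99;
close $out;

map_file my $map, 'records.dat', '<';

# Jump straight to record 42: only the page containing it is faulted in.
my $rec = substr $map, 42 * 16, 16;
$rec =~ s/\s+$//;                   # strip the padding
```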

But if all you are going to do with the file is read it serially from beginning to end, you're better off using normal file IO, which doesn't cause page faults and can read the entire file (serially) through a small amount of memory (e.g. line by line through one or two page-sized buffers).
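The serial case needs nothing more than ordinary buffered IO; memory use stays constant however large the file is (demo filename made up):

```perl
use strict;
use warnings;

# Demo file for the serial case.
open my $out, '>', 'lines.txt' or die "create: $!";
print $out "line $_\n" for 1 .. 1000;
close $out;

# Plain buffered IO streams the file through a small buffer;
# no pages are mapped, no page faults are taken.
open my $in, '<', 'lines.txt' or die "open: $!";
my $count = 0;
while ( my $line = <$in> ) {
    $count++;                       # process $line here
}
close $in;
```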

Memory mapping also requires that you have sufficient virtual address space in your process in order to hold the amount of the file you need concurrent access to. For 32-bit processes, that means files > 2GB require the programmer to re-map them in order to access the whole file.
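One way to handle that re-mapping is to map a page-aligned window at a time rather than the whole file. A sketch, assuming File::Map's optional offset and length arguments to map_file (filename again made up):

```perl
use strict;
use warnings;
use File::Map qw(map_file unmap);

# Demo file: pretend this is too big to map in one piece.
open my $out, '>', 'big.dat' or die "create: $!";
print $out 'x' x 4096, 'y' x 4096;
close $out;

my $window = 4096;    # window size; offsets must be page-aligned

# Map one window at a time instead of the whole file.
map_file my $w0, 'big.dat', '<', 0, $window;
my $first = substr $w0, 0, 1;
unmap $w0;            # release the first window before moving on

map_file my $w1, 'big.dat', '<', $window, $window;
my $second = substr $w1, 0, 1;
unmap $w1;
```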


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^2: mmaping a large file
by grondilu (Friar) on Aug 24, 2012 at 00:10 UTC
« If you have a large dataset in a file: and a) only need access to small bits of it in any given run; b) you can find those bits without reading through the whole file from the beginning; then mapping can be an effective way of minimising the number of pages read from disk. »

    yes, that's exactly the use case.

    I've just tried:

    use Sys::Mmap; new Sys::Mmap my $f, 8192, q(bigfile);

but now I get an error during cleanup:

    (in cleanup) munmap failed! errno 22 Invalid argument

    This is not going to be easy, is it?

      Warning: I don't use *nix, so I cannot test anything I'm about to say.

      The first thing I notice is that the POD for Sys::Mmap uses

      new Mmap my $f, 8192, q(bigfile);

      not

      new Sys::Mmap my $f, 8192, q(bigfile);

      I would have expected you to receive a compile-time error message from that, which suggests you aren't using strict/warnings. It might be a good idea to start.

      The other thought is that mapping a 2GB file through an 8k window is going to involve a lot of shuffling.
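      If the object-style constructor keeps misbehaving, Sys::Mmap's lower-level function interface may be easier to get right. A sketch -- untested for the same reason as above, and with a small demo file standing in for the real one:

```perl
use strict;
use warnings;
use Sys::Mmap qw(mmap munmap PROT_READ MAP_SHARED);

# Demo file standing in for the real 'bigfile'.
open my $out, '>', 'demo_big.dat' or die "create: $!";
print $out 'x' x 8192;
close $out;

open my $fh, '<', 'demo_big.dat' or die "open: $!";

# A length of 0 means "map to end of file".
mmap( my $map, 0, PROT_READ, MAP_SHARED, $fh ) or die "mmap: $!";

my $head = substr $map, 0, 16;    # pages fault in only when touched

munmap($map) or die "munmap: $!";
close $fh;
```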


