Re^2: Mmap question

Whoever said that mmap() has nothing to do with SysV shm was absolutely right. Now you're confusing the topic and yourself further by throwing around the word "ramdisk". If you want a ramdisk, create a ramdisk with Linux and mount it (man mount) wherever you like, and then use the normal file I/O operations (open, close, print, readline, and so on). Perl has SysV IPC primitives built-in if your operating system supports them (and in your case, it does). perldoc -f msgctl, perldoc -f msgget, perldoc -f msgrcv, perldoc -f msgsnd (and likewise perldoc -f shmget, perldoc -f shmread, perldoc -f shmwrite, and perldoc -f semop for the shared memory and semaphore calls). These are all named after the kernel calls of the same name, which means you can type man msgctl, man msgget, and so on, to read what Linux's man pages have to say about these functions that Perl gives you access to.

In exactly the same way, mmap() is a function name, and typing man mmap will pull up the documentation on Linux's mmap() that Perl just happens to give you access to by way of the Sys::Mmap module. So now that you know we're really talking about kernel function names, you won't misuse the names: the names mean very specific things (those functions!).

The SysVIPC (msg* functions) give you access to very small amounts of shared memory - usually only a few kilobytes. By contrast, on a 64 bit system, you could mmap in petabytes of data, and on a 32 bit system, you could mmap in gigs.

Using mmap() to do IPC (inter process communication) is a rotten idea. It's impossible to check for a lock and then lock it if it's available in a single operation except using special instructions in the CPU, so without writing XS, you can't do locking operations on data in mmap'd areas. This means that any program that attempts to use mmap'd areas for IPC is going to have race conditions that cause that program to lock up or lose data sooner or later.

On the other hand, SysV IPC (those msg* functions and system calls, along with their shm* and sem* siblings) has built-in semaphore operations that synchronize access to data by using the CPU's special locking primitives. You could combine this with mmap() to coordinate access to large areas of memory between two processes, but this still sucks. This is only necessary if you absolutely can't make a daemon out of your program or you're trying to wire together a Perl program and a program written in another language (such as C) and you don't want to use something like CORBA (ugh). As ugly as distributed object systems like CORBA are, they're better than mmap+SysVIPC because they were designed for this purpose.

If you're sharing data between two forked Perl programs (created with the fork() function/system call), use Coro instead. Coro lets you share data in plain old Perl variables and it lets you switch back and forth between subroutines in a sort of cooperative multithreading (among many other cool things). Threads would be an option but they're too difficult for a novice to use in Perl and they have some serious rough edges right now. Coro also makes race conditions much easier to avoid and it makes the common cases of multithreading much easier to do.

I used Sys::Mmap to do a multiplayer game of Conway's Game of Life (no, not Damian Conway) as a Perl Mongers presentation on Sys::Mmap. This is a good example use of it. Individual CGI applications modified the state of an image contained in the file. The RAM and disc used to store the file are *exactly* the same RAM that every other instance of the CGI and the server daemon use. Every minute or so, the server would do an iteration on the game of life. CGI clients would flip individual bits (in response to clicks on the life board as an image map). In this case, a race condition exists between the time you click something and the time the client displays the board to you (someone else may have modified the board in the interim), but since the board might change after it's displayed to you anyway, this is of little consequence. Using Sys::Mmap to perform read-only operations and take a snapshot of changing data is also useful.

If any data structures more complex than a large bit field needed to be shared, the server would have to be implemented using Coro and HTTP::Server, or with POE, or with threads, and all of the concurrent tasks (the server and the connections from each client) would have to run in the same process.

Sys::Mmap is useful where you have raw binary data (no Perl data structures or references) and you want to be able to edit this memory and have it saved to disc as you go. Sys::Mmap is nice for very large files. It's easy to mmap in a multi-gig file on a system with a few hundred megs of RAM. The operation is instant because the data isn't read into memory until that block is actually used. If you tried to slurp up a multi-gig file into a plain Perl scalar on the same system, you'd be waiting a long time. mmap doesn't create copies of the file in memory like reading the file does. Instead, it makes the memory an *alias* to the file. These are the places where Sys::Mmap is actually useful.

By the way, if you want to see the multiplayer Life game, search Google for "site:perldesignpatterns.com conway's game of life multiplayer". I think that's where I left it.
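The bit flipping the CGI clients did can be sketched roughly as below. This is not the code from the presentation: flip_cell is a hypothetical helper, the board layout (one bit per cell, row by row) is an assumption, and the board here is an ordinary Perl scalar so the example runs without Sys::Mmap installed. The comment shows where the mapped scalar would come from instead.

```perl
use strict;
use warnings;

# With Sys::Mmap the board would be a mapped scalar instead, e.g.:
#   use Sys::Mmap;
#   mmap($board, 0, PROT_READ|PROT_WRITE, MAP_SHARED, $fh) or die "mmap: $!";
# (not run here; the rest of the sketch uses a plain in-memory string)

# Hypothetical helper: toggle one cell of a board stored as a bit field,
# one bit per cell, row by row. Returns the new state of the cell (0 or 1).
sub flip_cell {
    my ($board_ref, $width, $x, $y) = @_;
    my $bit = $y * $width + $x;

    # Refuse to touch bits past the end: vec() as an lvalue would grow the
    # string, and a mapped scalar must never change length.
    die "cell ($x,$y) is outside the board\n"
        if $x < 0 || $y < 0 || $x >= $width
        || $bit >= 8 * length($$board_ref);

    vec($$board_ref, $bit, 1) ^= 1;
    return vec($$board_ref, $bit, 1);
}

my $width = 8;
my $board = "\0" x 8;                            # 8x8 board, all cells dead
print flip_cell(\$board, $width, 3, 2), "\n";    # cell is now alive
print flip_cell(\$board, $width, 3, 2), "\n";    # and dead again
```

Each flip touches a single byte in place, which is why the lack of locking was tolerable for this particular application.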

Sys::Mmap uses file handles (anonymous or real) to share memory. It's also possible to coordinate access to a real (named) file without it, but each application would have to flush its buffers a lot (fflush() in C's stdio; in Perl, $| or the flush method from IO::Handle) and this would wreak havoc with the disc, as data would be repeatedly sent to disk only to be read right back in by another process. Honeywell computers in the early 1970s used to make people do this for IPC and it sucked. With mmap, you never have to wonder whether your data is current and you don't have to beat the snot out of the disc. There are a few clever applications for it and a few traditional ones (dynamic libraries are implemented using mmap, and mmap is used by X Windows to get ahold of video display memory), but by and large, you don't want mmap, you want pipes or some form of multiprogramming (Coro, threads, POE, Event, Stem, etc).

So let me summarize: don't use Sys::Mmap as you're obviously trying to do IPC, it doesn't work for IPC, and if you made it to, you'd be reinventing a big, ugly, hairy wheel that's best avoided anyway.

Oh, by the way, if you didn't recognize my name, I'm the Sys::Mmap maintainer, so when I say "don't use Sys::Mmap", you know I'm not biased ;)

-scott

Re^3: Mmap question by scrottie (Scribe) on Feb 01, 2005 at 05:30 UTC
By the way, it's an assumption that you're trying to create IPC between two different programs (as opposed to use Sys::Mmap for what it's good for). I know it's annoying to ask for help and have people make assumptions, and the assumptions are sometimes wrong, but they're usually right. In this case, it's extremely common for people to try to figure out how to create shared variables between threads/tasks/processes/programs. Since there is no clean, easy way to do this that suits all situations, there is no readily forthcoming documentation or short FAQ entries.

If you're interested in Coro, and I hope you are, the (brand spankin' new!) (shameless plug alert) book Perl 6 Now: The Core Ideas Illustrated with Perl 5 has a few chapters on it and a chapter on threading in Perl. It deals with some of these ideas I just mentioned here: creating a server process that speaks HTTP (or whatever) and handles requests in parallel, and shares data structures between threads, and wiring together user interfaces with networking modules and so on. mmap() isn't introduced, but that's for a reason ;)

-scott
Re^3: Mmap question by BrowserUk (Patriarch) on Feb 01, 2005 at 07:22 UTC
Nice++. One question: there is nothing in the docs to say this doesn't run on Win32, but since you're around(?), does it?

Examine what is said, not who speaks. Silence betokens consent. Love the truth but pardon error.
Re^3: Mmap question by zentara (Archbishop) on Feb 01, 2005 at 13:27 UTC
Wow, thanks for the "big picture". The reason I am toying with it is that the Advanced Linux Programming Guide says that shared memory segments are the fastest way to communicate between 2 processes. So I wanted to know if Perl could set up "shared memory segments". Of course I confused things by jumping to the conclusion that mmap meant memory mapping, and that memory mapping files is part of that. I now see the difference.

I would (from my limited knowledge) differ from you on your statement:

"The SysVIPC (msg* functions) give you access to very small amounts of shared memory - usually only a few kilobytes. By contrast, on a 64 bit system, you could mmap in petabytes of data, and on a 32 bit system, you could mmap in gigs."

According to my C experiments, each shared memory segment is limited to what's returned from getpagesize(), and on my system it is 4k. But there doesn't seem to be anything stopping one from creating and attaching to "multiple shared segments", and so increasing the working size. Of course you are then required to keep track of the segments yourself.

Now I notice mozilla uses shared memory segments, as do a few other apps, so it must have some speed benefits over other forms of IPC. Mozilla is using 393kb of shared memory on my system.

Now that my confusion over mmap vs. shared memory is cleared up, the original question still stands: can Perl create or attach to a shared memory segment, as is done in C? If I tried it from Inline::C, would Perl interfere with its workings?

I'm not really a human, but I play one on earth. flash japh
Re^4: Mmap question by sgifford (Prior) on Feb 07, 2005 at 04:57 UTC
You don't even have to use C. Perl has built-in functions for most (all?) of the SysV shm operators. See `perlfunc(1)` and search for `System V interprocess communication functions`.
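A minimal sketch of those built-ins, assuming a Linux system with SysV shared memory enabled and default kernel limits. shm_roundtrip is a hypothetical helper name; the calls themselves (shmget, shmwrite, shmread, shmctl) are Perl built-ins and the constants come from the core IPC::SysV module.

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_CREAT IPC_RMID S_IRUSR S_IWUSR);

# Hypothetical helper: create a private segment of $size bytes, write
# $message into it, read it back, then remove the segment.
sub shm_roundtrip {
    my ($message, $size) = @_;

    my $id = shmget(IPC_PRIVATE, $size, IPC_CREAT | S_IRUSR | S_IWUSR);
    defined $id or die "shmget: $!";

    shmwrite($id, $message, 0, length $message) or die "shmwrite: $!";

    my $buf = '';
    shmread($id, $buf, 0, length $message) or die "shmread: $!";

    # Segments outlive the process unless removed explicitly.
    shmctl($id, IPC_RMID, 0) or die "shmctl: $!";
    return $buf;
}

# 64 KB is requested here on purpose: it is larger than one 4k page.
print shm_roundtrip("hello from shm", 64 * 1024), "\n";
```

In real use a second process would call shmget with an agreed key (rather than IPC_PRIVATE) to attach to the same segment, and something like semop would be needed to coordinate access.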
Re^5: Mmap question by zentara (Archbishop) on Feb 07, 2005 at 12:31 UTC
Great! Thanks.

I'm not really a human, but I play one on earth. flash japh
Re^3: Mmap question by sgifford (Prior) on Feb 02, 2005 at 21:02 UTC
Wow, great summary! One comment. You say:

"Using mmap() to do IPC (inter process communication) is a rotten idea. It's impossible to check for a lock and then lock it if it's available in a single operation except using special instructions in the CPU, so without writing XS, you can't do locking operations on data in mmap'd areas. This means that any program that attempts to use mmap'd areas for IPC is going to have race conditions that cause that program to lock up or lose data sooner or later."

For file-backed `mmap`, it seems like `fcntl` range-locking would do the trick, although of course it requires a syscall and so would take longer than a CPU instruction. Is there some reason I haven't thought of that this won't work, or is otherwise a horrible idea?
Re^4: Mmap question (fud) by tye (Sage) on Feb 03, 2005 at 20:21 UTC
mmap is great for IPC. I even worked on a flavor of Unix that had a special shared-memory semaphore (that worked on any shared memory, not just mmap'd shared memory) that required no kernel involvement unless you wanted to wait for a lock to be released. That is, w/o involving the kernel, you can use mmap'd memory to get a lock w/o race conditions and, for the very small percentage of cases (if you've designed your system well) when there is lock contention, you allocate a kernel resource that you can sleep on and mark the mmap lock area so the lock holder will wake you up when they release the lock. I believe this shared-memory locking technique required just 4 d-words of shared memory per lock. You can also use byte-range file locks on even an empty file that doesn't even have to be related to the mmap'd memory. - tye
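The "lock on an unrelated, empty file" idea can be sketched in plain Perl. Note the substitutions: this uses whole-file flock() rather than fcntl byte-range locks (which need a platform-specific packed struct from Perl), and the shared data is a small counter file rather than an mmap'd region. locked_increment and run_workers are hypothetical names.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical helper: add 1 to the number stored in $data_path while
# holding an exclusive lock on the separate, empty file $lock_path.
sub locked_increment {
    my ($lock_path, $data_path) = @_;

    open my $lock, '>>', $lock_path or die "open $lock_path: $!";
    flock($lock, LOCK_EX) or die "flock: $!";    # blocks until we own the lock

    open my $data, '+<', $data_path or die "open $data_path: $!";
    my $count = <$data> // 0;
    chomp $count;
    seek($data, 0, 0)   or die "seek: $!";
    truncate($data, 0)  or die "truncate: $!";
    print {$data} $count + 1, "\n";
    close $data or die "close: $!";              # flush before unlocking

    close $lock;                                 # closing releases the lock
}

# Fork $workers children that each increment the counter $iterations times,
# then return the final value.
sub run_workers {
    my ($workers, $iterations) = @_;
    my ($lock_fh, $lock_path) = tempfile();
    my ($data_fh, $data_path) = tempfile();
    print {$data_fh} "0\n";
    close $data_fh or die "close: $!";
    close $lock_fh;

    my @pids;
    for (1 .. $workers) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            locked_increment($lock_path, $data_path) for 1 .. $iterations;
            POSIX::_exit(0);    # leave without running the parent's cleanup
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;

    open my $in, '<', $data_path or die "open $data_path: $!";
    chomp(my $total = <$in>);
    close $in;
    unlink $lock_path, $data_path;
    return $total;
}

print run_workers(4, 25), "\n";
```

Four workers doing 25 increments each should leave 100 in the file; with the flock() call removed, updates can be lost and the total can come up short. Every lock here costs a system call, which is the trade-off raised earlier in the thread.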


	PerlMonks