Re: System call doesn't work when there is a large amount of data in a hash

by aitap (Curate)
on Apr 29, 2020 at 14:48 UTC


in reply to System call doesn't work when there is a large amount of data in a hash

But when there is a lot of virtual memory (250 GB) used by storing it in a hash, the system call doesn't work

This may have to do with the operating system you are using (you said it's CentOS 7) and the way it is tuned. You see, the only way to run a program on most Unix-like systems is to fork() a copy of the existing process, then exec() inside the child process to replace the currently running image with a different one. (There is another way involving the vfork() system call, which freezes the parent and makes it undefined behaviour to do almost anything in the child process before the exec(), but almost no-one uses it except some implementations of the posix_spawn() standard library function.) Yes, copying the entire 250 GB of the address space of a process just to throw it away on the next system call would be wasteful, so when fork() happens, the Linux kernel makes the child process refer to the same physical memory that the parent uses, only making a copy when one of the processes tries to change the contents ("copy on write").
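Here is a minimal sketch of that fork()/exec()/wait sequence in Perl, roughly what system() does for you on a Unix-like system (the echo command is just a stand-in):

    use strict;
    use warnings;

    # fork() a (copy-on-write) duplicate of this process; this is the
    # step that fails when memory is tight
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # child: replace the duplicated image with a new program
        exec('echo', 'hello from the child') or die "exec failed: $!";
    }

    # parent: wait for the child to finish, as system() would
    waitpid($pid, 0);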

This optimisation makes it possible to fork() processes occupying more than 50% of memory, but at the same time it introduces a way to break the promise of allocated process memory: if both parent and child try to use all of their rightfully allocated (or inherited) address space, the system will run out of physical memory and have to start swapping. Some people disable this behaviour ("overcommit") because they prefer some memory allocation requests (including the allocation implied by a fork() of a large process) to fail outright instead of letting processes be paged out or killed by the OOM killer. What is the value of the overcommit setting on the machine you are running this code on?
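On Linux you can check the policy from the shell (less /proc/sys/vm/overcommit_memory) or from Perl; 0 means heuristic overcommit, 1 means always overcommit, 2 means never overcommit:

    # Linux-specific: read the current overcommit policy
    open my $fh, '<', '/proc/sys/vm/overcommit_memory'
        or die "cannot read overcommit setting: $!";
    chomp(my $mode = <$fh>);
    print "vm.overcommit_memory = $mode\n";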

There is a kludge you can use to work around this behaviour: at the beginning of the program, fork() a child process that never does anything besides reading command lines over a pipe from the parent and feeding them to system. This way, the helper child stays small enough to have a good chance of fork() succeeding, even after its parent grows huge; see the sketch below.
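A minimal sketch of that helper, assuming one command per line is enough (a real version would also want a second pipe to report exit statuses back to the parent):

    use strict;
    use warnings;
    use IO::Handle;    # for autoflush() on the pipe handle

    pipe(my $reader, my $writer) or die "pipe: $!";

    # fork the helper *before* allocating anything big
    my $helper = fork();
    die "fork: $!" unless defined $helper;

    if ($helper == 0) {     # helper child: stays tiny for its whole lifetime
        close $writer;
        while (my $cmd = <$reader>) {
            chomp $cmd;
            system($cmd) == 0 or warn "command failed: $cmd\n";
        }
        exit 0;
    }

    close $reader;
    $writer->autoflush(1);

    # ... the parent now builds its huge hash ...

    # the fork() behind this system() call happens in the small helper
    print {$writer} "echo launched from the helper\n";

    close $writer;          # EOF ends the helper's read loop
    waitpid($helper, 0);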

Re^2: System call doesn't work when there is a large amount of data in a hash
by Nicolasd (Acolyte) on Apr 29, 2020 at 15:13 UTC
    Thanks so much! What you explained seems to be the problem. I just tested a script (this was my answer to a previous comment), and it seems to verify what you explained.

    I tried this script and it worked fine on my laptop: I put 12 GB of the 16 GB available into a hash and the system call still worked.

    I did get varying results on the CentOS 7 machine (450 GB of RAM). I also monitored it with top to see if there was a memory increase.

    20 GB, 50 GB, 100 GB, 150 GB and 200 GB all worked fine, and I didn't see any memory increase either.

    But with 230 GB (more than half of the available memory) I ran out of memory ("Cannot allocate memory"), so I need the same amount of memory free as there is in the hash.

    I also made the system call loop 10 times, and the bigger the hash, the slower the system call is to start.

Re^2: System call doesn't work when there is a large amount of data in a hash
by Nicolasd (Acolyte) on Apr 29, 2020 at 15:27 UTC
    Now I have to try to figure out your suggestion; not easy to understand as a non-informatician :)

    overcommit is set to 0, I think; I checked it like this: less /proc/sys/vm/overcommit_memory

    The problem is that the tool has a lot of users on GitHub, so I have to keep in mind that the usage has to stay straightforward.

    So should I look into vfork() or into the last suggestion you gave?

    Thanks again for the help! The new tool is to be used for research on genetic disorders in children, so it's for a good cause!

      overcommit is set to 0, I think; I checked it like this: less /proc/sys/vm/overcommit_memory
      Huh, I thought it would be 2. I bet your desktop also has 0 there, but for some reason it works there; I am not sure what other settings could influence this behaviour. You could set it to 1 if you have root access, and it may even help, but at the cost of potentially summoning the OOM killer later.
      So should I look into vfork() or into the last suggestion you gave?
      There is POSIX::RT::Spawn, which might use vfork() under the hood. Try it first. Writing your own child-spawning helper is harder, but you could copy the code from Bidirectional Communication with Yourself and start from there. Both options are specific to *nix-like systems and should be avoided at least if $^O eq 'MSWin32'.
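      A minimal sketch of trying it, assuming (per the module's documentation) that POSIX::RT::Spawn exports a spawn() that takes system()-style arguments and returns the child's pid; the blastn arguments are only illustrative:

          use strict;
          use warnings;
          use POSIX::RT::Spawn;   # from CPAN; skip this on MSWin32

          # spawn() is assumed to behave like system(), but to return a pid
          # without first duplicating the parent's huge address space
          my $pid = spawn('blastn', '-query', 'input.fa', '-out', 'result.txt');
          waitpid($pid, 0);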
        Thanks again for your time!
        Yes, my desktop also has 0 for overcommit_memory.
        I do not have root permissions on the CentOS machine, and none of the future users will have them either.
        I will need to run on even larger datasets of human patients, so I can't waste any additional memory.

        I will have a look at POSIX::RT::Spawn.

        It is weird that there is no easy solution for this; is this also the case with Python or other languages?
        Because the mother process doesn't need to interact with the sister process: I can write the information I need for 'blastn' to a file, then generate a new file that the mother process opens to read the result.
        So only the timing of when the sister process starts is important; there is no need for direct interaction.
