Hi,
" I need these large hashes to store genetic data in a hash"
That's a bit like saying " I need these hashes because I need these hashes."
Does your "genome assembly tool" accept Perl data hashes as input? Of course it does not. Therefore you must be somehow serializing your massive input to the program in your system call. Perhaps you need to write a file, or provide a data stream to a server? As noted by my learned colleague swampyankee, it's hard to conceive of why you need to store 250Gb of data in an in-memory hash. There are myriad techniques to avoid doing so, depending on your context; why don't you explain a bit more about that, and show some code?
Hope this helps!
The way forward always starts with a minimal test.
I'm not a bioinformatician either, but that repo has some problems: filenames using the : character, and a single Perl file over 1MB with more than 23K lines, a quick glance at which shows room for improvement. I'm not sure whether part of the relatively popular BioPerl suite of tools can address your requirements. Regardless, all of this is good advice. You don't need to store everything in memory, even if you are just planning to call some external command-line tool. Consider an alternative such as a database.
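As a small sketch of the database idea: Perl's tie mechanism lets a hash be backed by an on-disk DBM file, so your existing %hash syntax keeps working while the data lives on disk. SDBM_File ships with core perl but has small per-record limits, so for 250GB you'd want DB_File/BerkeleyDB or a real database instead; the names below are illustrative only.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp qw(tempdir);

# A disk-backed hash: same hash syntax, but the data lives in a
# DBM file instead of RAM. This is a sketch, not a recommendation
# of SDBM_File itself for data of your size.
my $dir = tempdir( CLEANUP => 1 );
tie my %genome, 'SDBM_File', "$dir/genome", O_RDWR | O_CREAT, 0640
    or die "Couldn't tie SDBM file: $!";

$genome{seq1} = 'ACGT';                  # written to disk, not RAM
print "seq1 is $genome{seq1}\n";         # read back via normal hash access

untie %genome;
```

The point is that swapping an in-memory hash for a tied one usually requires no changes to the code that reads and writes it.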
I know I could have written it better; it's a bit of a mess, but it works great, and that's what matters most.
And I really need that hash, because I need to access that data all the time; a database would be too slow.
Which file is using the : character?
Could it be that the system call duplicates everything that is in the virtual memory to start the sister process?
If that is the case, I guess I just can't do system calls. Any idea if there is another way?
Without seeing your code, it will be very hard to suggest how to make it do what you want.
You have discarded all the obvious things that would make it easier, because you say that you really need this.
Ideally, you show us some minimal code that reproduces the problem so that we can run it ourselves. For example, the following could be a start:
#!perl
use strict;
use warnings;

my $memory_eaten = 8 * 1024 * 1024 * 1024; # 8GB, adjust to fit
my %memory_eater = (
    foo => scalar( " " x $memory_eaten ),
);

my $cmd = "foo bar";
system($cmd) == 0
    or die "Couldn't launch '$cmd': $!/$?";
Updated: Actually make the hash eat memory by creating a long string
"Which file is using the : character?"
Download the repo as a zip file and try to extract it under Windows; it'll report a bunch of problems caused by 'invalid' characters in filenames.
Could it be that the system call duplicates everything that is in the virtual memory to start the sister process?
In theory, fork (used to implement system) does exactly that. Modern kernels with virtual memory set up copy-on-write (COW) mappings instead of actually copying the entire address space, but this still (usually) requires duplicating the page tables, which for 256GiB of data with 4KiB pages occupy around 512MiB themselves. Could you be bumping up against a resource limit? (Look up ulimit for more information.)