PerlMonks  

Re^2: System call doesn't work when there is a large amount of data in a hash

by Nicolasd (Acolyte)
on Apr 29, 2020 at 12:47 UTC


in reply to Re: System call doesn't work when there is a large amount of data in a hash
in thread System call doesn't work when there is a large amount of data in a hash

The hash and the system call are in the same script, but they are not directly related.

The hash holds data from large genomic files, and that data has to be accessed very fast.

But once the hash is loaded, system calls stop working. I need system("blastn ...."), but even system("echo Hello") fails. It does work when I run it on a small dataset (the hash takes 10 GB of RAM). qx/$command/ fails as well.

I am testing how large the hash can get before the system call stops working, but I don't understand why a system call fails when I have a large hash in memory.


Re^3: System call doesn't work when there is a large amount of data in a hash
by marto (Cardinal) on Apr 29, 2020 at 13:01 UTC

    Searching your repo, I get no hits for 'system' or 'qx'. Despite repeated claims that you take the time to explain what you are doing, there seems to be no real question here that anyone can realistically help with, beyond the advice you have had to date. "...does not work either..." See Tutorials->Debugging and Optimization->Basic debugging checklist.

      The repo works fine; the system call is for an update that's not online yet. I would like to post code to clarify, but I thought it would distract attention from the real problem: I tried many kinds of system calls, and all of them fail when the hash takes up a lot of memory. That is the only connection I can find. I really don't understand the problem; it's the first time I have been this stuck, which is why I asked this question.

      I now get this error: 'Cannot allocate memory'. I don't know why I didn't see it before. So I guess a system call needs to copy the complete virtual memory of the script into the child process.
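      A minimal sketch of where that message comes from in Perl. Per perldoc -f system, system returns -1 when the program could not be started at all (for example when fork fails), and the reason is then in $!; otherwise the child's wait status is in $?. On Linux, fork uses copy-on-write, so the parent's memory is not physically copied, but depending on the kernel's overcommit settings, forking a very large process can still be refused with "Cannot allocate memory". The helper name describe_system_result is hypothetical, and echo is only a placeholder command.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form (no shell) and describe
# the outcome, following the return conventions in perldoc -f system.
sub describe_system_result {
    my @cmd = @_;
    my $rc  = system(@cmd);
    if ($rc == -1) {
        # fork/exec failed: $! holds the OS error, e.g. "Cannot allocate memory"
        return "could not start '@cmd': $!";
    }
    if ($? & 127) {
        # the child was killed by a signal
        return sprintf "'%s' died with signal %d", "@cmd", $? & 127;
    }
    # normal exit: the exit code is in the high byte of $?
    return sprintf "'%s' exited with status %d", "@cmd", $? >> 8;
}

print describe_system_result('echo', 'Hello'), "\n";
```

      Printing $! whenever system returns -1 is what turns a silent "does not work" into a concrete error message.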

      So I guess I cannot use system calls. I will check whether I have the same problem with backticks. If you know of other alternatives, I would be happy to hear them.
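      Backticks also fork, so they would likely hit the same limit. One workaround that is sometimes used (a swapped-in technique, not something proposed in this thread) is to fork a small helper process before the hash is loaded and let that helper run the commands, so the big process never has to fork again. This is a minimal sketch under stated assumptions: commands are plain shell strings sent one per line (no embedded newlines), only the wait status is sent back, and start_helper, run_in_helper and stop_helper are hypothetical names.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper process, forked BEFORE the big hash is loaded, so it
# stays small and can keep forking cheaply for every command it runs.
my ($cmd_w, $res_r, $helper_pid);

sub start_helper {
    pipe(my $cmd_r, $cmd_w) or die "pipe: $!";
    pipe($res_r, my $res_w) or die "pipe: $!";
    STDOUT->flush;    # don't let the child inherit unflushed output
    $helper_pid = fork() // die "fork: $!";
    if ($helper_pid == 0) {
        # child: read one command per line, run it, report the wait status
        close $cmd_w;
        close $res_r;
        $res_w->autoflush(1);
        while (my $line = <$cmd_r>) {
            chomp $line;
            system($line);
            print {$res_w} "$?\n";
        }
        exit 0;
    }
    # parent: keep the write end for commands and the read end for results
    close $cmd_r;
    close $res_w;
    $cmd_w->autoflush(1);
    return;
}

# Send one command to the helper and wait for its wait status ($? format).
sub run_in_helper {
    my ($command) = @_;
    die "command must not contain a newline\n" if $command =~ /\n/;
    print {$cmd_w} "$command\n";
    my $status = <$res_r>;
    die "helper process has gone away\n" unless defined $status;
    chomp $status;
    return $status;
}

sub stop_helper {
    close $cmd_w;    # EOF ends the helper's read loop
    waitpid($helper_pid, 0);
    return;
}

start_helper();
# ... load the huge hash here, after the helper exists ...
my $status = run_in_helper('echo Hello');
warn "command failed with wait status $status\n" if $status != 0;
```

      Call stop_helper() once no more commands are needed. The design choice is simply ordering: the only fork the large process performs happens while it is still small.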

      This is the system call I would eventually need:

      system('blastn -query '.$output_file1.' -subject '.$ref_file.' -out BLAST_tmp.txt -outfmt 10')

Re^3: System call doesn't work when there is a large amount of data in a hash
by bliako (Monsignor) on Apr 29, 2020 at 13:41 UTC

    It is very unlikely that blastn takes a GB-sized sequence as input on the command line! Most likely the command expects you to provide the name of the file that holds that huge data.

    So, in all likelihood, you must 1) write your hash to a file, if it is indeed in the memory of the Perl script (and not in a file already!), and then 2) make the "system call", passing the filename as one of the command arguments.

    If the expected output is huge, make sure to instruct blastn to write its output to a file. Do not read it back from the output of the command (stdout)! Perhaps use blastn's -out option, or simply redirect the command's output to a file, which is not an elegant solution if you are doing it via Perl's system command.
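    A minimal sketch of that procedure, assuming the hash maps read IDs to sequences and that a FASTA file is an acceptable query format. The names write_fasta and blastn_cmd, the sample reads and ref.fa are hypothetical; the blastn flags are the ones from the command quoted earlier in this thread. The actual blastn call is left commented out so the sketch runs where blastn is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Step 1 (hypothetical layout): write read ID => sequence pairs as FASTA.
sub write_fasta {
    my ($seqs, $path) = @_;
    open my $fh, '>', $path or die "open $path: $!";
    for my $id (sort keys %$seqs) {
        print {$fh} ">$id\n$seqs->{$id}\n";
    }
    close $fh or die "close $path: $!";
    return;
}

# Step 2: build the command as a list of arguments. blastn receives file
# NAMES, never the data itself, and writes its results to a file via -out.
sub blastn_cmd {
    my ($query, $subject, $out) = @_;
    return ('blastn', '-query', $query, '-subject', $subject,
            '-out', $out, '-outfmt', '10');
}

my %reads = (read1 => 'ACGTACGT', read2 => 'GGCATTCA');    # sample data
my ($tmp_fh, $query_file) = tempfile(SUFFIX => '.fa', UNLINK => 1);
close $tmp_fh;
write_fasta(\%reads, $query_file);

my @cmd = blastn_cmd($query_file, 'ref.fa', 'BLAST_tmp.txt');
# Where blastn is installed, run it without a shell and check the result:
# system(@cmd) == 0 or die "blastn failed: wait status $?\n";
```

    Passing the arguments as a list avoids the shell entirely, so file names with spaces or special characters need no quoting.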

    The above procedure is acceptable if you create, calculate or transform that hash in the Perl script. Just to make sure: if you just read the hash from a file, do not change it in any way, and then run blastn on it (which implies writing it back to a file, as I recommend above), then you are doing something wrong.

    Since you have a lot of RAM available, it is worth investigating either storing the data on a RAM disk (which you have to create first; in fact, all your data could go there, including temporary files) OR using memory-mapped files; perhaps read up on File::Map.

    bliako

      Hi, thanks for the reply.

      The blastn call is not related to the hash. The hash contains data from sequencing reads; the input files for blastn are very small.

      The hash is used constantly, which is why it needs to be a hash; other methods, like databases or writing to files, are too slow.

      And if I change the blastn system call to system("echo Hello"), I get the same problem, so it's not related to blastn.

      A RAM disk is probably too complicated for the users of the tool, mostly biologists to whom I need to explain what a terminal is :)
