PerlMonks  

Re^3: System call doesn't work when there is a large amount of data in a hash

by 1nickt (Canon)
on May 01, 2020 at 01:11 UTC


in reply to Re^2: System call doesn't work when there is a large amount of data in a hash
in thread System call doesn't work when there is a large amount of data in a hash

Hi again,

I'll just suggest once more that you let go of the idea that you must load all your data into an in-memory hash in order for your program to be fast. For one very fast approach please look at mce_map_f in MCE::Map (also by the learned marioroy) which is written especially for optimized parallel processing of huge files.
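
As a rough illustration of the mce_map_f suggestion, here is a minimal sketch. The temporary file, the tab-separated "id, sequence" record layout, and the worker count are all assumptions made up for the example, not details from the original poster's program.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MCE::Map;

# Hypothetical stand-in for a huge input file:
# tab-separated "id<TAB>sequence" lines.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "read$_\t", 'ACGT' x $_, "\n" for 1 .. 1000;
close $fh or die "close failed: $!";

# The worker count is an arbitrary choice for illustration.
MCE::Map->init( max_workers => 4 );

# Workers read chunks of the file in parallel; the block runs
# once per line with the line in $_, like a regular map.
my @lengths = mce_map_f {
    chomp;
    my ( $id, $seq ) = split /\t/, $_, 2;
    length $seq;
} $file;

MCE::Map->finish;

printf "%d records processed\n", scalar @lengths;
```

The results come back in input order, so the per-line work can be moved into the block without keeping the whole file in a hash first.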

(As an aside, have you profiled your code? I would think that Perl could load data from anywhere (file, database, whatever) faster than a shell call to an external analytical program would return ... or does your program not expect a response?)
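
Profiling just means measuring where the run time actually goes. A dedicated profiler such as Devel::NYTProf gives per-line detail; the sketch below is only a crude wall-clock comparison using the core Time::HiRes module. The hash size is arbitrary, and the external "analytical program" is replaced by a trivial child perl process, so the numbers say nothing about the real program.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference and report how long it took (wall clock).
sub time_it {
    my ( $label, $code ) = @_;
    my $t0 = [gettimeofday];
    $code->();
    my $elapsed = tv_interval($t0);
    printf "%-12s %.6f s\n", $label, $elapsed;
    return $elapsed;
}

# Stand-in for loading data into a hash (size chosen arbitrarily).
my %h;
my $load_time = time_it( 'load hash',
    sub { $h{$_} = $_ * 2 for 1 .. 100_000 } );

# Stand-in for the shell call: $^X is the running perl binary,
# used here in place of the real external program.
my $call_time = time_it( 'system call',
    sub { system( $^X, '-e', '1' ) == 0 or warn "system failed: $?" } );
```

Timing the real load step and the real system call this way would show which of the two dominates.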

As far as your finding that

"parallelisation of the code after loading the hashes ... turned out slowing down the process or impossible because it would duplicate the hash"
... please see MCE::Shared::Hash.
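
A minimal sketch of what that could look like, assuming a made-up data set (keys k1 to k1000) and four workers started with MCE::Hobo. The shared hash lives in one manager process, so the workers do not each hold a copy.

```perl
use strict;
use warnings;
use MCE::Hobo;
use MCE::Shared;

# One shared hash, held by the shared-manager process.
my $shared = MCE::Shared->hash();

# Hypothetical data: keys k1 .. k1000 with numeric values.
$shared->set( "k$_", $_ ) for 1 .. 1000;

# Four workers (arbitrary count), each summing every fourth key.
my @workers = map {
    my $id = $_;
    MCE::Hobo->create( sub {
        my $sum = 0;
        for ( my $i = $id; $i <= 1000; $i += 4 ) {
            $sum += $shared->get("k$i");
        }
        return $sum;
    } );
} 1 .. 4;

# Collect the partial sums from the workers.
my $total = 0;
for my $worker (@workers) {
    my ($partial) = $worker->join;
    $total += $partial;
}

print "total: $total\n";
```

Every get and set is a round trip to the manager process, so this trades lookup speed for memory. Whether that trade pays off for millions of lookups is something only a measurement on the real data can answer.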

Hope this helps!


The way forward always starts with a minimal test.

Replies are listed 'Best First'.
Re^4: System call doesn't work when there is a large amount of data in a hash
by Nicolasd (Acolyte) on May 01, 2020 at 11:37 UTC
    Hi,

    I think I tried MCE::Map a few years ago, but I will check to be sure. I have tried many methods, which is why I am convinced about the big hash, but I could be wrong of course, as there is much of Perl I don't know.
    Small differences in speed make a big difference because the script has to access the hash millions of times (I actually build 3 hashes), so some alternatives work fine at first sight but slow down a lot on large datasets.
    Similar software (in C++ or Python) usually needs even more memory than mine (although it uses a different, graph-based method, so it is hard to compare).

    "(As an aside, have you profiled your code? I would think that Perl could load data from anywhere (file, database, whatever) faster than a shell call to an external analytical program would return ... or does your program not expect a response?)"
    Sorry, I don't understand the question. Is this about the system call? And I guess I didn't profile the code, as I don't know what that means :)

    I think I tried this one (MCE::Shared::Hash) and it turned out too slow, but again I need to verify this. I will check if I can find the code; otherwise I will try it.
    Thanks
        I used that before, but I am not a big fan because it doesn't really show which parts consume the most time.
        Often parts that take the most time were not shown in the analysis.
