Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

This is a continuation of Re: Problem in RAM usage while threading the program.

Update 1: Added parallel chunking demonstration.
Update 2: Inline::C demonstrations can be found here.

Part Three create the Kyoto Cabinet DB

This is where the fun begins. The # after the file name is how to pass options. See the perldoc for more info. Notice the split line. There is no reason to split the entire sequence header line.

# https://www.perlmonks.org/?node_id=11110379 # usage: perl create_db.pl use strict; use warnings; use KyotoCabinet; unlink 'db.kch'; # construct the database object my $db = KyotoCabinet::DB->new(); # open the database if (! $db->open('db.kch#msiz=128m', $db->OWRITER | $db->OCREATE)) { die "open error (db): ", $db->error; } open my $fh, '<', 'NR.fasta' or die "open error: $!\n"; my $nrid = ''; my $seq = ''; while ( <$fh> ) { chomp; $_ =~ s/\r$//g; if ( /^>/ ) { $db->set($nrid, $seq) if $seq; $nrid = ( split /\s/, $_, 2 )[0]; $seq = ''; } else { $seq .= $_; } } $db->set($nrid, $seq) if $seq; close $fh; $db->close;

Part Four search DB serial code

One might like to measure the difference between running serially and parallel. This makes use of cursor in Kyoto Cabinet. The substr is for stripping the leading comma when displaying the value to STDOUT.

# https://www.perlmonks.org/?node_id=11110379 # usage: perl search_db.pl > Outfile.txt use strict; use warnings; use KyotoCabinet; # construct the database object my $db = KyotoCabinet::DB->new(); # open the database if (! $db->open('db.kch#msiz=128m', $db->OREADER | $db->ONOLOCK)) { die "open error (db): ", $db->error; } my $cur = $db->cursor; open my $fh, '<', 'peptides.txt' or die "open error: $!\n"; while ( my $pep = <$fh> ) { my $ids = ''; chomp $pep; $cur->jump; # first record while ( my ($key, $val) = $cur->get(1) ) { $ids .= ",$key" if index($val, $pep) >= 0; } print "$pep\t", substr($ids, 1), "\n" if $ids; } close $fh; $cur->disable; $db->close;

Part Five search DB parallel code

Here it is. This is the parallel code. It's practically the serial code above wrapped into MCE. Each worker opens a handle to the DB. This is done inside the user_begin block. Likewise, closing is done inside the user_end block. MCE::Relay is helpful for ensuring workers write serially to STDOUT. Thus, no reason to have workers send output to the manager process or store in a shared hash (i.e. MCE::Shared).

# https://www.perlmonks.org/?node_id=11110379 # usage: perl search_db_mce.pl > Outfile.txt use strict; use warnings; use KyotoCabinet; use MCE; # construct the database object my $db = KyotoCabinet::DB->new(); my $cur; my $mce = MCE->new( max_workers => MCE::Util::get_ncpu(), chunk_size => 1, init_relay => 1, user_begin => sub { # open the database if (! $db->open('db.kch#msiz=128m', $db->OREADER | $db->ONOLOC +K)) { die "open error (db): ", $db->error; } $cur = $db->cursor; }, user_end => sub { # close the database $cur->disable; $db->close; }, user_func => sub { my $pep = $_; chomp $pep; my $ids = ''; $cur->jump; # first record while ( my ($key, $val) = $cur->get(1) ) { $ids .= ",$key" if index($val, $pep) >= 0; } # output serially, one worker at a time MCE::relay { print "$pep\t", substr($ids, 1), "\n" if $ids; }; } ); $mce->process('peptides.txt'); $mce->shutdown;

Part Six search DB parallel code - chunking

Chunking helps reduce time by another 25% ~ 30% for this use case. This is beneficial on machines with many cores.

# https://www.perlmonks.org/?node_id=11110379 # usage: perl search_db_chunk.pl > Outfile.txt use strict; use warnings; use KyotoCabinet; use MCE; # construct the database object my $db = KyotoCabinet::DB->new(); my $cur; my $mce = MCE->new( max_workers => MCE::Util::get_ncpu(), chunk_size => 7, init_relay => 1, user_begin => sub { # open the database if (! $db->open('db.kch#msiz=128m', $db->OREADER | $db->ONOLOC +K)) { die "open error (db): ", $db->error; } $cur = $db->cursor; }, user_end => sub { # close the database $cur->disable; $db->close; }, user_func => sub { my ( $mce, $chunk_ref, $chunk_id ) = @_; my %ret; chomp @$chunk_ref; $cur->jump; # first record while ( my ($key, $val) = $cur->get(1) ) { for ( @$chunk_ref ) { $ret{$_} .= ",$key" if index($val, $_) >= 0; } } # output serially, one worker at a time MCE::relay { for ( @$chunk_ref ) { print "$_\t", substr($ret{$_}, 1), "\n" if $ret{$_}; } }; } ); $mce->process('peptides.txt'); $mce->shutdown;

Outfile.txt

GAAGGACTGGGACCA >NR_000001,>NR_006611 AGGCTGCGGCAGGAC >NR_062102 GTGAGCCGGGCAGAG >NR_089584 AGGGGGGGTTGCTGA >NR_036454,>NR_068535,>NR_097889 CTGACATGCGGCGCA >NR_087289 GTGCATCGATGGCCG >NR_005535 GGGGTCAAGCGAACC >NR_076289,>NR_087856 TGAGACGGCGAACCT >NR_064242 AGCGACAAAGGAAAC >NR_045865 AGGTGCAACCATGGA >NR_046602,>NR_056869 GAGTAAACCGCGCGA >NR_093455 AACGACTGAACAGCG >NR_070693 ACGCGTAATTCGATA >NR_080086 GATGAGCGGAGCACT >NR_070118 CGTAGCGAAACCGAG >NR_092384,>NR_098291 GGGGGGGAGGTCCGA >NR_021671,>NR_036907,>NR_080961 AGGGAGGGGGGTTGT >NR_026207 ATGGGGCAGACGCGA >NR_072314

Regards, Mario


In reply to Re^2: Problem in RAM usage while threading the program by marioroy
in thread Problem in RAM usage while threading the program by beherasan

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-03-28 16:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found