Re: About text file parsing -- MCE

by Discipulus (Canon)
on Aug 29, 2018 at 07:27 UTC ( [id://1221293]=note )


in reply to About text file parsing

Hello dideod.yang,

If your file is huge, line-by-line processing will be slow with any variation of the algorithm. But you can throw more CPUs at it with, hopefully, better results. Parallel programming is not so easy to implement correctly in Perl, but a gentle monk, marioroy, spent a lot of time and energy to help us by producing MCE, and it seems that the second example in its documentation can be easily modified to suit your needs.

The example uses MCE::Loop to work on a file in chunks. Pay attention to the OS-dependent implementations inside the mce_loop_f call below and choose the appropriate one for your OS:

# from MCE docs: https://metacpan.org/pod/MCE
use MCE::Loop;

MCE::Loop::init {
    max_workers => 8, use_slurpio => 1
};

my $pattern  = 'something';
my $hugefile = 'very_huge.file';

my @result = mce_loop_f {
    my ($mce, $slurp_ref, $chunk_id) = @_;

    # Quickly determine if a match is found.
    # Process the slurped chunk only if true.
    if ($$slurp_ref =~ /$pattern/m) {
        my @matches;

        # NOTE: keep only ONE of the two loops below. With both in
        # place every matching line is pushed onto @matches twice.

        # The following is fast on Unix, but performance degrades
        # drastically on Windows beyond 4 workers.
        open my $MEM_FH, '<', $slurp_ref;
        binmode $MEM_FH, ':raw';
        while (<$MEM_FH>) { push @matches, $_ if (/$pattern/); }
        close $MEM_FH;

        # Therefore, use the following construction on Windows.
        while ( $$slurp_ref =~ /([^\n]+\n)/mg ) {
            my $line = $1; # save $1 to not lose the value
            push @matches, $line if ($line =~ /$pattern/);
        }

        # Gather matched lines.
        MCE->gather(@matches);
    }
} $hugefile;

print join('', @result);
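To make the "choose the appropriate one for your OS" step concrete, here is a minimal sketch in core Perl only (no MCE needed) that isolates the two line-scanning constructions and picks one by checking $^O. The sub names matches_via_filehandle and matches_via_regex and the sample chunk are my own assumptions for illustration, not part of the MCE documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: read the slurped chunk through an in-memory
# filehandle (the construction the docs describe as fast on Unix).
sub matches_via_filehandle {
    my ($slurp_ref, $pattern) = @_;
    my @matches;
    open my $mem_fh, '<', $slurp_ref
        or die "cannot open in-memory filehandle: $!";
    binmode $mem_fh, ':raw';
    while (my $line = <$mem_fh>) {
        push @matches, $line if $line =~ /$pattern/;
    }
    close $mem_fh;
    return @matches;
}

# Hypothetical helper: walk the chunk with a global regex match
# (the construction the docs suggest on Windows).
sub matches_via_regex {
    my ($slurp_ref, $pattern) = @_;
    my @matches;
    while ($$slurp_ref =~ /([^\n]+\n)/mg) {
        my $line = $1;    # save $1 before the next match overwrites it
        push @matches, $line if $line =~ /$pattern/;
    }
    return @matches;
}

# Pick the scanner once, based on the running OS.
my $scan = $^O eq 'MSWin32' ? \&matches_via_regex : \&matches_via_filehandle;

# Sample data standing in for one slurped chunk (an assumption).
my $chunk = "alpha\nsomething here\nbeta\nsomething else\n";
print $scan->(\$chunk, 'something');
```

Inside the mce_loop_f block you would call the chosen sub on $slurp_ref and pass the result to MCE->gather. One difference to be aware of: the regex variant only captures non-empty lines that end with a newline, so a final line lacking a trailing newline is seen by the filehandle variant but not by the regex one.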

L*

UPDATE: you may also be interested in some other techniques you can find in my library.

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
