Re^2: About text file parsing

That's cool, tybalt89. Every day I learn something new about Perl.

I ran it serially and in parallel with "test.txt" containing 50 million lines. I saw no slowness with Perl v5.20 and higher.
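The post doesn't show how test.txt was built. For anyone who wants to reproduce the timings, here is a minimal, hypothetical generator. The line format ("sample <value>", "good <value>", plus unrelated lines) is an assumption inferred from the regexes in the scripts below, not the file actually used for the benchmark.

```perl
#!/usr/bin/env perl
# Hypothetical test-data generator. ASSUMPTION: lines look like
# "<tag> <value> <more>", where <tag> is sample, good or something else.
use strict;
use warnings;

my $lines = shift // 1_000;        # pass 50000000 to match the benchmark size
my $file  = shift // 'test.txt';   # same name the scripts open

open my $out_fh, '>', $file or die "open error: $!";

my @tags = qw(sample good other);
for my $i (1 .. $lines) {          # foreach over a range does not build a list
    my $tag = $tags[ $i % 3 ];     # rotate through the tags
    print $out_fh "$tag value$i extra\n";
}

close $out_fh or die "close error: $!";
```

Run it as `perl gen_test.pl 50000000` (the script name gen_test.pl is hypothetical) to get a 50-million-line input.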

Serial

use strict;
use warnings;

open my $input_fh,  '<', 'test.txt'   or die "open error: $!";
open my $sample_fh, '>', 'sample.txt' or die "open error: $!";
open my $good_fh,   '>', 'good.txt'   or die "open error: $!";

# tybalt89's technique running serially
# see https://www.perlmonks.org/?node_id=1221387

local $/ = \2e6; # read fixed-size records of 2e6 bytes; go bigger if memory allows

while (<$input_fh>) { # read big chunk
    # the chunk usually ends mid-line, so read the rest of that line
    $_ .= do { local $/ = "\n"; <$input_fh> // '' };

    # with /gm, ^ matches at every line start; list context returns all captures
    print $sample_fh join("\n", /^sample\s+(\S+)/gm), "\n";
    print $good_fh   join("\n", /^good\s+(\S+)/gm  ), "\n";
}

close $input_fh;
close $sample_fh;
close $good_fh;
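To see why the second read inside the loop matters, here is a small self-contained sketch of the same idea on an in-memory file. The record size of 16 bytes and the sample data are hypothetical, chosen only so that records split lines; the sketch checks that every chunk ends on a newline and no values are lost.

```perl
use strict;
use warnings;

# Hypothetical sample data in the "prefix value" format the regexes expect.
my $data = join '', map { my $tag = $_ % 2 ? 'sample' : 'good'; "$tag v$_\n" } 1 .. 10;

open my $fh, '<', \$data or die "open error: $!";   # in-memory filehandle

my (@chunks, @sample, @good);
{
    local $/ = \16;   # fixed-size records of 16 bytes, small on purpose
    while (<$fh>) {
        # A 16-byte record usually stops mid-line; read up to the next
        # newline so the chunk holds only whole lines.
        $_ .= do { local $/ = "\n"; <$fh> // '' };
        push @chunks, $_;
        push @sample, /^sample\s+(\S+)/gm;
        push @good,   /^good\s+(\S+)/gm;
    }
}
close $fh;

print scalar(@chunks), " chunks\n";
print "sample: @sample\n";
print "good:   @good\n";
```

This prints "5 chunks", then "sample: v1 v3 v5 v7 v9" and "good:   v2 v4 v6 v8 v10". Without the `do { local $/ = "\n"; ... }` step, values such as v2 would be cut in half across two records and the `^good` match would miss them.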

Parallel

use strict;
use warnings;

use MCE;

open my $sample_fh, '>', 'sample.txt' or die "open error: $!";
open my $good_fh,   '>', 'good.txt'   or die "open error: $!";

# tybalt89's technique running parallel
# see https://www.perlmonks.org/?node_id=1221387

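MCE->new(
    chunk_size  => '1m',        # about 1 MB of whole lines per chunk
    max_workers => 4,
    use_slurpio => 1,           # pass each chunk to user_func as a scalar reference
    input_data  => 'test.txt',
    user_func   => sub {
        my ( $mce, $slurp_ref, $chunk_id ) = @_;
        local $_ = ${ $slurp_ref };   # copy the chunk into $_ for the regexes

        # MCE->print serializes output through the manager process; chunks
        # finish in any order, so output order can differ from input order.
        MCE->print($sample_fh, join("\n", /^sample\s+(\S+)/gm), "\n");
        MCE->print($good_fh,   join("\n", /^good\s+(\S+)/gm  ), "\n");
    }
)->run;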

close $sample_fh;
close $good_fh;

Demo

$ time /opt/perl-5.26.1/bin/perl demo_serial.pl

real    0m15.662s
user    0m15.025s
sys     0m0.607s

$ time /opt/perl-5.26.1/bin/perl demo_parallel.pl

real    0m4.042s
user    0m15.617s
sys     0m0.345s
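Reading the timings: the user (CPU) times are nearly equal (15.0 s vs 15.6 s), so both versions do about the same total work and the parallel run spreads it over 4 workers. The wall-clock speedup and per-worker efficiency work out to:

```latex
S = \frac{T_\text{serial}}{T_\text{parallel}}
  = \frac{15.662\,\text{s}}{4.042\,\text{s}} \approx 3.87,
\qquad
E = \frac{S}{4} \approx 0.97
```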

Regards, Mario
