PerlMonks  

Re^4: Rosetta Code: Long List is Long -- Parallel

by marioroy (Prior)
on Dec 09, 2022 at 08:09 UTC ( [id://11148680]=note )


in reply to Re^3: Rosetta Code: Long List is Long
in thread Rosetta Code: Long List is Long

I also ran it in parallel using MCE: 7 workers, each one reading every input file but keeping only the lines whose first character falls within its assigned range.
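To make the partitioning concrete, here is a minimal core-Perl sketch (no MCE) of how a first-character range selects the lines one worker would handle. The seven ranges are the ones used in this post; the sample lines and the helper name `lines_for_range` are made up for illustration.

```perl
use strict;
use warnings;

# The seven character ranges; each one is used as a regex character class.
my @ranges = ( 'a-d', 'e-h', 'i-l', 'm-p', 'q-t', 'u-x', 'y-z' );

# Hypothetical sample input in "word<TAB>count" form.
my @lines = ( "apple\t3", "zebra\t1", "mango\t7", "echo\t2", "apple\t4" );

# Hypothetical helper: return the lines a worker with this range would keep.
sub lines_for_range {
    my ( $range, @in ) = @_;
    return grep { /^[${range}]/ } @in;
}

# Partition the sample lines by range, the way the workers split the input.
my %by_range;
for my $range (@ranges) {
    $by_range{$range} = [ lines_for_range( $range, @lines ) ];
}

for my $range (@ranges) {
    printf "%s: %d line(s)\n", $range, scalar @{ $by_range{$range} };
}
```

Because the ranges do not overlap, every occurrence of a given word lands in the same worker, so each worker can total its own words without talking to the others.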

#!/usr/bin/env perl
# https://perlmonks.org/?node_id=11148669

use warnings;
use strict;

use Judy::HS qw/ Set Get Free /;
use Sort::Packed 'sort_packed';
use MCE;

my $DATA_TEMPLATE    = 'nZ10';
my $DATA_SIZE        = 12;
my $COUNT_SIZE_BYTES = 2;
my $COUNT_SIZE_BITS  = 16;
my $COUNT_MAX        = int( 2 ** $COUNT_SIZE_BITS - 1 );

@ARGV or die "usage: $0 file...\n";
my @llil_files = @ARGV;

for (@llil_files) {
    die "Cannot open '$_'" unless -r "$_";
}

# MCE gather and parallel routines.

my $DATA = '';

sub gather_routine {
    $DATA .= $_[0];
}

sub parallel_routine {
    my $char_range = $_;
    my ( $data, $current, $judy ) = ( '', 0 );
    for my $fname (@llil_files) {
        open( my $fh, '<', $fname ) or die $!;
        while ( <$fh> ) {
            if (/^[${char_range}]/) {
                chomp;
                my ( $word, $count ) = split /\t/;
                ( undef, my $val ) = Get( $judy, $word );
                if ( defined $val ) {
                    vec( $data, $val * $DATA_SIZE / $COUNT_SIZE_BYTES,
                        $COUNT_SIZE_BITS ) -= $count
                }
                else {
                    $data .= pack $DATA_TEMPLATE, $COUNT_MAX - $count, $word;
                    Set( $judy, $word, $current );
                    $current ++
                }
            }
        }
        close $fh;
    }
    Free( $judy );
    MCE->gather( $data );
}

# Run parallel using MCE.

warn "my_test start\n";
my $tstart1 = time;

MCE->new(
    input_data  => ['a-d','e-h','i-l','m-p','q-t','u-x','y-z'],
    max_workers => 7,
    chunk_size  => 1,
    posix_exit  => 1,
    gather      => \&gather_routine,
    user_func   => \&parallel_routine,
    use_threads => 0,
)->run(1);

my $tend1 = time;
warn "get_properties : ", $tend1 - $tstart1, " secs\n";

my $tstart2 = time;
sort_packed "C$DATA_SIZE", $DATA;

$| = 0; # enable output buffering
while ( $DATA ) {
    my ( $count, $word ) = unpack $DATA_TEMPLATE, substr $DATA, 0, $DATA_SIZE, '';
    printf "%s\t%d\n", $word, $COUNT_MAX - $count
}

my $tend2 = time;
warn "sort + output : ", $tend2 - $tstart2, " secs\n";
warn "total : ", $tend2 - $tstart1, " secs\n";

__END__

$ time perl mce_judyhs.pl big1.txt big2.txt big3.txt >out3.txt
my_test start
get_properties : 5 secs
sort + output : 5 secs
total : 10 secs

real 0m9.794s
user 0m35.719s
sys 0m0.257s
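The packed-record idea in the script above can be tried in isolation. The sketch below is a simplified, core-Perl illustration, not the script itself: a plain hash stands in for Judy::HS, the built-in `sort` over 12-byte slices stands in for `sort_packed`, and the input pairs are invented. It keeps the same `nZ10` layout and the same inverted count (`$COUNT_MAX - $count`), so that an ascending bytewise sort yields counts in descending order with ties broken by word.

```perl
use strict;
use warnings;

my $TEMPLATE  = 'nZ10';    # 16-bit big-endian count + 10-byte NUL-padded word
my $REC_SIZE  = 12;        # bytes per packed record
my $COUNT_MAX = 0xFFFF;    # largest value that fits in the count field

# Hypothetical sample input as [ word, count ] pairs.
my @input = ( [ foo => 3 ], [ bar => 5 ], [ foo => 2 ], [ baz => 7 ] );

my $data = '';             # all records, concatenated
my %index;                 # word => record number (stand-in for Judy::HS)

for my $pair (@input) {
    my ( $word, $count ) = @$pair;
    if ( defined( my $rec = $index{$word} ) ) {
        # Seen before: lower the stored inverted count in place.
        # vec() with 16 bits addresses big-endian 16-bit elements,
        # so record $rec starts at element $rec * $REC_SIZE / 2.
        vec( $data, $rec * $REC_SIZE / 2, 16 ) -= $count;
    }
    else {
        # First sighting: remember the record number, append a new record.
        $index{$word} = length($data) / $REC_SIZE;
        $data .= pack $TEMPLATE, $COUNT_MAX - $count, $word;
    }
}

# Core-Perl stand-in for sort_packed: cut into records, sort bytewise.
my @records = sort { $a cmp $b } unpack "(a$REC_SIZE)*", $data;

my @result;
for my $rec (@records) {
    my ( $inverted, $word ) = unpack $TEMPLATE, $rec;
    push @result, join "\t", $word, $COUNT_MAX - $inverted;
}
print "$_\n" for @result;
```

With this sample input the records come out as baz 7, bar 5, foo 5. Note that the 16-bit count field limits a word's total to 65535, and `Z10` leaves room for at most 9 characters plus the terminating NUL.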
