PerlMonks  

Re^5: Rosetta Code: Long List is Long (llil5.ijs vs llil4vec.cpp)

by marioroy (Prior)
on Jan 18, 2023 at 16:47 UTC ( [id://11149672]=note )


in reply to Re^4: Rosetta Code: Long List is Long (outgunned?!)
in thread Rosetta Code: Long List is Long

What a delight for our Anonymonk friend to come back. Thanks to you, we tried parallel :).

... but files can be unequal sizes, or just one huge single file. I think serious solution would probe inside to find newlines at approx. addresses, then pass chunks coords to workers to parse in parallel.

Chuma mentions 2,064 input files in the initial "Long list is long" thread. With that many files, processing the list of files in parallel suits this use case well. Back in 2014, I wrote utilities that support both chunking and list modes: mce_grep and egrep.pl, selected via --chunk-level={auto|file|list}.
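For readers who want to see the chunking idea from the quote spelled out, here is a minimal Python sketch, not the mce_grep implementation. It probes the file at roughly equal offsets, moves each probe forward to the next newline, and hands the resulting byte ranges to workers. The names chunk_offsets, count_chunk and count_file are made up for illustration, the input is assumed to be "word<TAB>count" lines, and a thread pool is used only to keep the example short (a process pool would be the usual choice for CPU-bound parsing in Python).

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def chunk_offsets(path, num_chunks):
    """Return (start, end) byte ranges, each ending just after a newline."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    bounds = [0]
    with open(path, "rb") as fh:
        for i in range(1, num_chunks):
            target = size * i // num_chunks   # approximate address
            if target <= bounds[-1]:
                continue
            fh.seek(target)
            fh.readline()                     # skip to the end of the current line
            pos = fh.tell()
            if bounds[-1] < pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def count_chunk(path, start, end):
    """Parse one byte range of 'word<TAB>count' lines and sum counts per word."""
    counts = Counter()
    with open(path, "rb") as fh:
        fh.seek(start)
        data = fh.read(end - start)
    for line in data.splitlines():
        if not line:
            continue
        word, _, num = line.partition(b"\t")
        counts[word] += int(num)
    return counts

def count_file(path, workers=4):
    """Split one file into newline-aligned chunks and merge the partial counts."""
    total = Counter()
    ranges = chunk_offsets(path, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda r: count_chunk(path, *r), ranges):
            total.update(part)
    return total
```

Because every range ends on a line boundary, no worker ever sees half a record, and the same function handles one huge file or many files of unequal size.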

llil5p.ijs

I took llil5.ijs and created a parallel version named llil5p.ijs, based on code-bits from your prior post. The number of threads can be specified via the NUM_THREADS environment variable.

$ diff -u llil5.ijs llil5p.ijs
--- llil5.ijs 2023-01-18 09:25:14.041515970 -0600
+++ llil5p.ijs 2023-01-18 09:25:58.889669110 -0600
@@ -9,6 +9,12 @@
 
 pattern =: 0 1
 
+nthrs =: 2!:5 'NUM_THREADS' NB. get_env NUM_THREADS
+{{
+    if. nthrs do. nthrs =: ".nthrs end. NB. string to integer conversion
+    for. i. nthrs do. 0 T. 0 end. NB. spin nthrs
+}} ''
+
 args =: 2 }. ARGV
 fn_out =: {: args
 fn_in =: }: args
@@ -44,7 +50,7 @@
 
 read_many_files =: {{
     'fnames pattern' =. y
-    ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames
+    ,&.>/"2 (-#pattern) ]\ ,;(read_file @:(; &pattern)) t.'' "0 fnames
 }}
 
 'words nums' =: read_many_files fn_in ; pattern
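For comparison with the J change above, list mode (one task per input file, worker count taken from NUM_THREADS) can be sketched in Python as follows. This is only an illustration of the pattern, not a translation of llil5p.ijs: worker_count is a made-up helper, falling back to a single worker when the variable is unset is my assumption, and the input is again assumed to be "word<TAB>count" lines. Note that Python threads will not speed up CPU-bound parsing the way J threads do.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def worker_count(default=1):
    """Read NUM_THREADS from the environment; fall back to `default` (assumption)."""
    raw = os.environ.get("NUM_THREADS", "").strip()
    return int(raw) if raw.isdigit() and int(raw) > 0 else default

def read_file(fname):
    """Sum the counts per word for one 'word<TAB>count' file."""
    counts = Counter()
    with open(fname, "rb") as fh:
        for line in fh:
            word, _, num = line.rstrip(b"\r\n").partition(b"\t")
            if word:
                counts[word] += int(num)
    return counts

def read_many_files(fnames):
    """Process the list of files with one task per file and merge the results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        for part in pool.map(read_file, fnames):
            total.update(part)
    return total
```

With thousands of similarly sized files, one task per file keeps all workers busy without any need to look inside the files for chunk boundaries.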

llil5tp.ijs

Next, I applied the turbo update to the parallel version and named it llil5tp.ijs.

$ diff -u llil5p.ijs llil5tp.ijs
--- llil5p.ijs 2023-01-18 09:25:58.889669110 -0600
+++ llil5tp.ijs 2023-01-18 09:26:01.553736512 -0600
@@ -21,6 +21,16 @@
 
 filter_CR =: #~ ~: & CR
 
+turbo_mode_ON =: {{
+    assert. 0 <: c =. 8 - {: $y
+    h =. (3 (3!:4) 16be2), ,|."1 [3 (3!:4)"0 (4:,#,1:,#) y
+    3!:2 h, ,y ,"1 _ c # ' '
+}}
+
+turbo_mode_OFF =: {{
+    (5& }. @: (_8& (]\)) @: (2& (3!:1))) &.> y
+}}
+
 read_file =: {{
     'fname pattern' =. y
 
@@ -56,6 +66,7 @@
 'words nums' =: read_many_files fn_in ; pattern
 
 t1 =: (6!:1) '' NB. time since engine start
+words =: turbo_mode_ON words
 idx =: i.~ words
 nums =: idx +//. nums
 
@@ -65,6 +76,7 @@
 nums =: ~. nums
 'words nums' =: (\: nums)& { &.:>"_1 words ; nums
 
+words =: turbo_mode_OFF words
 t2 =: (6!:1) '' NB. time since engine start
 text =: ; words (, @: (,"1 _))&.(>`a:)"_1 TAB ,. (": ,. nums) ,. LF
 
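As I read the diff, the turbo update pads every word to 8 bytes and reinterprets those bytes as a single integer, so that lookup and grouping work on integers instead of strings, and then converts back before output. A rough Python sketch of that idea follows. The names turbo_on, turbo_off and tally are mine, and the little-endian unsigned packing and the stripping of the space padding on the way back are assumptions rather than a description of what the J code does byte for byte.

```python
import struct

WIDTH = 8  # same limit as the assert in turbo_mode_ON

def turbo_on(words):
    """Pad each word to 8 bytes with spaces and reinterpret it as one 64-bit integer."""
    keys = []
    for w in words:
        if len(w) > WIDTH:
            raise ValueError("word longer than 8 bytes: %r" % (w,))
        keys.append(struct.unpack("<Q", w.ljust(WIDTH, b" "))[0])
    return keys

def turbo_off(keys):
    """Turn integer keys back into words (assumes words have no trailing spaces)."""
    return [struct.pack("<Q", k).rstrip(b" ") for k in keys]

def tally(words, nums):
    """Sum nums per word using integer keys, then sort by total, largest first."""
    totals = {}
    for key, n in zip(turbo_on(words), nums):
        totals[key] = totals.get(key, 0) + n
    keys = sorted(totals, key=totals.get, reverse=True)
    return list(zip(turbo_off(keys), (totals[k] for k in keys)))
```

The trick only applies when every key fits in 8 bytes, which is why the J version asserts on the width before converting.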
