Re^5: Rosetta Code: Long List is Long (llil5.ijs vs llil4vec.cpp)

What a delight to see our Anonymonk friend come back. Thanks to you, we tried going parallel :).

... but files can be unequal sizes, or just one huge single file. I think serious solution would probe inside to find newlines at approx. addresses, then pass chunks coords to workers to parse in parallel.

Chuma mentions 2,064 input files in the initial "Long list is long" thread. With that many files, processing the list of files in parallel suits this use case well. Back in 2014, I wrote two utilities that support both chunking and list modes, mce_grep and egrep.pl, via --chunk-level={auto|file|list}.
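For the single-huge-file case, the chunking idea quoted above can be sketched roughly as follows. This is a minimal Python illustration, not how mce_grep or egrep.pl work internally. The names chunk_ranges, count_chunk, and count_file are hypothetical, and the sketch assumes the LLiL input format of one word, a tab, and a count per line.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(path, n_chunks):
    """Return (start, end) byte ranges that each begin at the start of a line."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            target = size * i // n_chunks   # approximate split address
            if target <= bounds[-1]:
                continue                    # an earlier long line already covers it
            f.seek(target)
            f.readline()                    # advance to the next line start
            pos = f.tell()
            if pos >= size:
                break                       # no further line starts in the file
            bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def count_chunk(path, start, end):
    """Sum the counts per word for the lines in one byte range."""
    totals = Counter()
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    for line in data.splitlines():
        word, sep, num = line.partition(b"\t")
        if sep:                             # skip blank or malformed lines
            totals[word] += int(num)
    return totals


def count_file(path, n_workers=4):
    """Count one file by handing newline-aligned chunks to a pool of workers."""
    ranges = chunk_ranges(path, n_workers)
    totals = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(lambda r: count_chunk(path, *r), ranges):
            totals.update(part)
    return totals
```

Because every range starts right after a newline, no worker ever sees half a line. In standard CPython the thread pool mostly overlaps I/O; for real CPU parallelism one would swap in a process pool.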

llil5p.ijs

I took llil5.ijs and created a parallel version named llil5p.ijs, based on code-bits from your prior post. The number of threads can be specified via the NUM_THREADS environment variable.

$ diff -u llil5.ijs llil5p.ijs 
--- llil5.ijs    2023-01-18 09:25:14.041515970 -0600
+++ llil5p.ijs    2023-01-18 09:25:58.889669110 -0600
@@ -9,6 +9,12 @@
 
 pattern =: 0 1
 
+nthrs =: 2!:5 'NUM_THREADS'           NB. get_env NUM_THREADS
+{{
+  if. nthrs do. nthrs =: ".nthrs end. NB. string to integer conversion
+  for. i. nthrs do. 0 T. 0 end.       NB. spin nthrs
+}} ''
+
 args   =: 2 }. ARGV
 fn_out =: {: args
 fn_in  =: }: args
@@ -44,7 +50,7 @@
 read_many_files =: {{
   'fnames pattern' =. y
 
-  ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames
+  ,&.>/"2 (-#pattern) ]\ ,;(read_file @:(; &pattern)) t.'' "0 fnames
 }}
 
 'words nums' =: read_many_files fn_in ; pattern
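For readers who don't speak J, the change above amounts in outline to this: read NUM_THREADS, start that many worker threads, and run read_file on each input file as its own task. Here is a rough Python analogue, as a sketch under assumptions. The functions num_workers, read_file, and read_many_files below are hypothetical stand-ins, and falling back to one worker when NUM_THREADS is unset or invalid is a choice made for the sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def num_workers(default=1):
    """Read the worker count from NUM_THREADS, falling back to `default`."""
    raw = os.environ.get("NUM_THREADS", "")
    try:
        n = int(raw)
    except ValueError:
        return default
    return n if n > 0 else default


def read_file(path):
    """Parse one file of 'word<TAB>count' lines into (words, nums) lists."""
    words, nums = [], []
    with open(path, "rb") as f:
        for line in f:
            word, sep, num = line.rstrip(b"\r\n").partition(b"\t")
            if sep:                          # skip blank or malformed lines
                words.append(word)
                nums.append(int(num))
    return words, nums


def read_many_files(paths):
    """Read every file as its own task, then concatenate in input order."""
    words, nums = [], []
    with ThreadPoolExecutor(max_workers=num_workers()) as pool:
        for w, n in pool.map(read_file, paths):   # map keeps the input order
            words.extend(w)
            nums.extend(n)
    return words, nums
```

This is list mode: one task per file, which balances well when there are many files of similar size.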

llil5tp.ijs

Next, I applied the turbo update to the parallel version and named it llil5tp.ijs.

$ diff -u llil5p.ijs llil5tp.ijs 
--- llil5p.ijs    2023-01-18 09:25:58.889669110 -0600
+++ llil5tp.ijs    2023-01-18 09:26:01.553736512 -0600
@@ -21,6 +21,16 @@
 
 filter_CR =: #~ ~: & CR
 
+turbo_mode_ON =: {{
+  assert. 0 <: c =. 8 - {: $y
+  h =. (3 (3!:4) 16be2), ,|."1 [3 (3!:4)"0 (4:,#,1:,#) y
+  3!:2 h, ,y ,"1 _ c # ' '
+}}
+
+turbo_mode_OFF =: {{
+  (5& }. @: (_8& (]\)) @: (2& (3!:1))) &.> y
+}}
+
 read_file =: {{
   'fname pattern' =. y
 
@@ -56,6 +66,7 @@
 'words nums' =: read_many_files fn_in ; pattern
 
 t1 =: (6!:1) ''         NB. time since engine start
+words =: turbo_mode_ON words
 
 idx   =: i.~ words
 nums  =: idx +//. nums
@@ -65,6 +76,7 @@
 nums  =: ~. nums
 'words nums' =: (\: nums)& { &.:>"_1 words ; nums
 
+words =: turbo_mode_OFF words
 t2 =: (6!:1) ''         NB. time since engine start
 
 text =: ; words (, @: (,"1 _))&.(>`a:)"_1 TAB ,. (": ,. nums) ,. LF
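As I read the turbo helpers above, the idea is to pad each word to 8 bytes and treat it as a single 64-bit integer, so the grouping step works on integers instead of character rows. The assert limits this to words of at most 8 characters. Below is a small Python sketch of that idea, not a translation of the J code. The names pack_word, unpack_word, and sum_by_word are hypothetical, and the big-endian byte order is an assumption made for the sketch.

```python
def pack_word(word, width=8):
    """Pad a byte string with spaces to `width` bytes and read it as one integer."""
    if len(word) > width:
        raise ValueError("word longer than %d bytes" % width)
    return int.from_bytes(word.ljust(width, b" "), "big")


def unpack_word(value, width=8):
    """Turn a packed integer back into its space-padded byte string."""
    return value.to_bytes(width, "big")


def sum_by_word(words, nums):
    """Group counts by packed integer key, then unpack the keys for output."""
    totals = {}
    for word, num in zip(words, nums):
        key = pack_word(word)
        totals[key] = totals.get(key, 0) + num
    # Strip the padding again so callers get the original words back.
    return {unpack_word(k).rstrip(b" "): v for k, v in totals.items()}
```

With big-endian packing, comparing two packed keys as integers gives the same order as comparing the padded words byte by byte, so sorting on the integers also sorts the words.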
