Re^4: Rosetta Code: Long List is Long (outgunned?!)

E-hm... hello? Are we still playing?

Long thread is long. I should have known better than to even start mumbling "paralle..li...", given the massive barrage of fire that ensued immediately after :). I hope I chose the correct version to test, and my dated PC (its number of workers in particular) is a poor workbench, but:

$ time ./llil3vec_11149482 big1.txt big2.txt big3.txt >vec6.tmp
llil3vec (fixed string length=6) start
get_properties      CPU time : 1.80036 secs
emplace set sort    CPU time : 0.815786 secs
write stdout        CPU time : 1.39233 secs
total               CPU time : 4.00856 secs
total        wall clock time : 4 secs

real    0m4.464s
user    0m3.921s
sys     0m0.445s

$ time ./llil3vec_11149482_omp big1.txt big2.txt big3.txt >vec6.tmp
llil3vec (fixed string length=6) start
get_properties      CPU time : 2.06675 secs
emplace set sort    CPU time : 0.94937 secs
write stdout        CPU time : 1.40311 secs
total               CPU time : 4.41929 secs
total        wall clock time : 4 secs

real    0m3.861s
user    0m4.356s
sys     0m0.493s

----------------------------------------------

Then I sent my workers into retirement to plant or pick flowers or something, i.e. I (temporarily) reverted to single-threaded code, walked around (snow, no flowers), and made a few changes. Here is a comparison of the previous and new versions:

$ time ../j903/bin/jconsole llil4.ijs big1.txt big2.txt big3.txt out_j.txt
Read and parse input:    1.6121
Classify, sum, sort:     2.23621
Format and write output: 1.36701
Total time:              5.21532

real    0m5.220s
user    0m3.934s
sys     0m1.195s

$ time ../j903/bin/jconsole llil5.ijs big1.txt big2.txt big3.txt out_j.txt
Read and parse input:    1.40811
Classify, sum, sort:     1.80736
Format and write output: 0.373946
Total time:              3.58941

real    0m3.594s
user    0m2.505s
sys     0m0.991s

$ diff vec6.tmp out_j.txt 
$

New script:

NB. -----------------------------------------------------------
NB. --- This file is "llil5.ijs"
NB. --- Run as e.g.:
NB.
NB. jconsole.exe llil5.ijs big1.txt big2.txt big3.txt out.txt
NB.
NB. --- (NOTE: last arg is output filename, file is overwritten)
NB. -----------------------------------------------------------

pattern =: 0 1

args   =: 2 }. ARGV
fn_out =: {: args
fn_in  =: }: args

filter_CR =: #~ ~: & CR

read_file =: {{
  'fname pattern' =. y

  text =. TAB, filter_CR fread fname
  text =. TAB (I. text = LF) } text

  selectors =. I. text = TAB 

  width  =. # pattern
  height =. width <. @ %~ # selectors

  append_diffs =. }: , 2& (-~/\)
  shuffle_dims =. (1 0 3 & |:) @ ((2, height, width, 1) & $)

  selectors =. append_diffs selectors
  selectors =. shuffle_dims selectors

  literal  =. < @: (}."1) @: (];. 0)        & text "_1
  numeric  =. < @: (0&".) @: (; @: (<;. 0)) & text "_1
  extract  =. pattern & {
  using    =. 1 & \
  or_maybe =. `

  ,(extract literal or_maybe numeric) using selectors
}}

read_many_files =: {{
  'fnames pattern' =. y

  ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames
}}

'words nums' =: read_many_files fn_in ; pattern

t1 =: (6!:1) ''         NB. time since engine start

idx   =: i.~ words
nums  =: idx +//. nums
idx   =: nums </. ~. idx
words =: (/:~ @: { &words)&.> idx
erase < 'idx'
nums  =: ~. nums
'words nums' =: (\: nums)& { &.:>"_1 words ; nums

t2 =: (6!:1) ''         NB. time since engine start

text =: ; words (, @: (,"1 _))&.(>`a:)"_1 TAB ,. (": ,. nums) ,. LF
erase 'words' ; 'nums'
text =: (#~ ~: & ' ') text
text fwrite fn_out
erase < 'text'

t3 =: (6!:1) ''         NB. time since engine start

echo 'Read and parse input:    ' , ": t1
echo 'Classify, sum, sort:     ' , ": t2 - t1
echo 'Format and write output: ' , ": t3 - t2
echo 'Total time:              ' , ": t3
 exit 0
echo ''
echo 'Finished. Waiting for a key...'
stdin ''
exit 0
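For readers who do not speak J, the "Classify, sum, sort" stage can be sketched in Python. This is only an illustration of the task, not a translation of llil5.ijs: it assumes tab-separated word/count lines and the output order that the empty diff against llil3vec suggests (count descending, then word ascending). The names `llil` and `sample` are made up for the example.

```python
from collections import defaultdict


def llil(lines):
    """Sum the counts per word, then order by count (descending), word (ascending)."""
    totals = defaultdict(int)
    for line in lines:
        # Each line is assumed to look like "word<TAB>count".
        word, count = line.rstrip("\r\n").split("\t")
        totals[word] += int(count)
    # Negating the count sorts counts descending while words stay ascending.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample input, standing in for the big*.txt files.
sample = ["foo\t3\n", "bar\t5\n", "foo\t2\n", "baz\t1\n"]
result = llil(sample)
for word, total in result:
    print(f"{word}\t{total}")
```

A dictionary plus one sort is the simplest way to express this; the J script reaches the same ordering by grouping words per count and sorting inside each group.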

----------------------------------------------

I don't know the C++ "tools" chosen above ("modules" or whatever they are called) at all. Is capping the length at "6" in the code just a matter of convenience, so that any longer value, like "12" or "25", could be hard-coded instead (with the obvious other fixes)? I mean, would sorting suffer no catastrophic (cubic, etc.) slow-down beyond some threshold, one that would force commenting out the define and using the alternative set of "tools"? Or perhaps input would be slower if words of unequal length have to be cut?

Anyway, here's output if the define is commented-out:

$ time ./llil3vec_11149482_no6 big1.txt big2.txt big3.txt >vec6.tmp
llil3vec start
get_properties      CPU time : 3.19387 secs
emplace set sort    CPU time : 0.996694 secs
write stdout        CPU time : 1.32918 secs
total               CPU time : 5.5198 secs
total        wall clock time : 6 secs

real    0m6.088s
user    0m5.294s
sys     0m0.701s

$ time ./llil3vec_11149482_no6_omp big1.txt big2.txt big3.txt >vec6.tmp
llil3vec start
get_properties      CPU time : 3.99891 secs
emplace set sort    CPU time : 1.13424 secs
write stdout        CPU time : 1.41112 secs
total               CPU time : 6.54723 secs
total        wall clock time : 4 secs

real    0m4.952s
user    0m6.207s
sys     0m0.842s

Should my time be compared to these? :) (Blimey, my solution doesn't have to compete when participants are capped selectively (/grumpy_on around here)). Or I can use a powerful, magic, secret turbo mode:

turbo_mode_ON =: {{
  assert. 0 <: c =. 8 - {: $y
  h =. (3 (3!:4) 16be2), ,|."1 [3 (3!:4)"0 (4:,#,1:,#) y
  3!:2 h, ,y ,"1 _ c # ' '
}}

turbo_mode_OFF =: {{
  (5& }. @: (_8& (]\)) @: (2& (3!:1))) &.> y
}}

Inject these definitions, plus these two lines, immediately after t1 =: and immediately before t2 =: respectively:

words =: turbo_mode_ON words
words =: turbo_mode_OFF words

Aha:

$ time ../j903/bin/jconsole llil5.ijs big1.txt big2.txt big3.txt out_j.txt
Read and parse input:    1.40766
Classify, sum, sort:     1.24098
Format and write output: 0.455868
Total time:              3.1045

real    0m3.109s
user    0m1.815s
sys     0m1.210s

(and no cutting to pieces of pre-defined equal length was used ...yet) :)
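Read loosely, the turbo trick appears to pad each word with spaces to 8 bytes and reinterpret those bytes as one integer, so that classifying and sorting work on integers instead of strings. The Python sketch below only illustrates that general idea under stated assumptions: words are at most 8 bytes, contain no trailing spaces, and use ASCII characters above the space character (e.g. lowercase letters). `pack8` and `unpack8` are hypothetical names, not part of the J script.

```python
def pack8(word: bytes) -> int:
    """Pad a short word with spaces to 8 bytes and read it as a big-endian integer."""
    if len(word) > 8:
        raise ValueError("word longer than 8 bytes")
    return int.from_bytes(word.ljust(8, b" "), "big")


def unpack8(key: int) -> bytes:
    """Reverse pack8: back to 8 bytes, then drop the space padding."""
    return key.to_bytes(8, "big").rstrip(b" ")


words = [b"pear", b"fig", b"apple", b"figs"]
# Big-endian keys compare in the same order as the space-padded byte strings.
keys = sorted(pack8(w) for w in words)
restored = [unpack8(k) for k in keys]
print(restored)
```

Big-endian byte order is what keeps integer order equal to lexicographic order here; with little-endian keys, grouping equal words would still work but sorting would not.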

----------------------------------------------

I can revert to parallel reading/parsing anytime, with the effect shown in the parent node. As implemented, it was kind of passive, but files can be of unequal sizes, or there can be just one huge file. I think a serious solution would probe inside to find newlines at approximate addresses, then pass chunk coordinates to workers to parse in parallel.
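The probing idea could look roughly like the Python sketch below: guess evenly spaced offsets, move each guess forward to the next newline, and hand the resulting (start, end) pairs to workers. It is a sketch only, working on an in-memory buffer; `chunk_bounds` and the sample data are invented for the illustration.

```python
def chunk_bounds(buf: bytes, nchunks: int):
    """Return (start, end) pairs covering buf, each cut placed just after a newline."""
    size = len(buf)
    bounds = []
    start = 0
    for i in range(1, nchunks + 1):
        if start >= size:
            break
        if i == nchunks:
            end = size  # the last chunk takes whatever is left
        else:
            # Approximate address of the cut, never before the current start.
            guess = max(start, size * i // nchunks)
            nl = buf.find(b"\n", guess)
            end = size if nl == -1 else nl + 1
        bounds.append((start, end))
        start = end
    return bounds


sample = b"aa\t1\nbbb\t22\nc\t3\ndddd\t4\n"
bounds = chunk_bounds(sample, 3)
for s, e in bounds:
    print(s, e, sample[s:e])
```

Because every cut lands right after a newline, each worker sees only whole lines and no record is split between two chunks.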

The puny 2-worker attempt to sort, in the parent node, was just a kind of #pragma omp parallel sections... thing with 2 sections; there's no use sending bus-loads of workers and expecting quiet fans. There's some hope for "parallelizable primitives" in release (not beta) 9.04 or later. Maybe that's a long time to wait. Or, if I could write code to merge 2 sorted arrays faster than the built-in primitive sorts either of the halves, then, bingo, I'd have a fast multi-threaded merge sort. But no success yet: the built-in sorts one large array faster, single-threaded.
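The shape of that two-worker merge sort, sketched in Python for illustration only (this is not the J code from the parent node, and `merge_sorted` and `two_worker_sort` are hypothetical names): sort each half in its own worker, then merge the two sorted halves in one linear pass.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_sorted(a, b):
    """Merge two already sorted lists into one sorted list in a single pass."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])  # ties take from a first, keeping the merge stable
            i += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def two_worker_sort(items):
    """Sort the two halves in separate workers, then merge them."""
    mid = len(items) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left, right = pool.map(sorted, (items[:mid], items[mid:]))
    return merge_sorted(left, right)


data = [5, 3, 9, 1, 4, 8, 2, 7]
print(two_worker_sort(data))
```

In CPython the threads mostly take turns because of the interpreter lock, and the hand-written merge is far slower than the built-in sort, so this shows the structure rather than a speed-up; the scheme pays off only when the merge is cheaper than sorting a half.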



Re^5: Rosetta Code: Long List is Long (llil5.ijs vs llil4vec.cpp)
by marioroy (Prior) on Jan 18, 2023 at 16:47 UTC

What a delight for our Anonymonk friend to come back. Thanks to you, we tried parallel :).

... but files can be of unequal sizes, or there can be just one huge file. I think a serious solution would probe inside to find newlines at approximate addresses, then pass chunk coordinates to workers to parse in parallel.

Chuma mentions 2,064 input files in the initial "Long list is long" thread. With that many files, processing the list of files in parallel suits this use case well. Back in 2014, I wrote utilities that support both chunking and list modes: mce_grep and egrep.pl, via --chunk-level={auto|file|list}.

llil5p.ijs

I took llil5.ijs and created a parallel version named llil5p.ijs, based on code-bits from your prior post. The number of threads can be specified via the NUM_THREADS environment variable.

$ diff -u llil5.ijs llil5p.ijs 
--- llil5.ijs    2023-01-18 09:25:14.041515970 -0600
+++ llil5p.ijs    2023-01-18 09:25:58.889669110 -0600
@@ -9,6 +9,12 @@
 
 pattern =: 0 1
 
+nthrs =: 2!:5 'NUM_THREADS'           NB. get_env NUM_THREADS
+{{
+  if. nthrs do. nthrs =: ".nthrs end. NB. string to integer conversion
+  for. i. nthrs do. 0 T. 0 end.       NB. spin nthrs
+}} ''
+
 args   =: 2 }. ARGV
 fn_out =: {: args
 fn_in  =: }: args
@@ -44,7 +50,7 @@
 read_many_files =: {{
   'fnames pattern' =. y
 
-  ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames
+  ,&.>/"2 (-#pattern) ]\ ,;(read_file @:(; &pattern)) t.'' "0 fnames
 }}
 
 'words nums' =: read_many_files fn_in ; pattern
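For comparison, the same list-mode idea (one task per input file, worker count taken from NUM_THREADS) can be sketched in Python. This is an illustration under assumptions, not a port of llil5p.ijs: input lines are assumed to be tab-separated word/count pairs, and `count_file` and `count_files` are hypothetical helper names.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_file(path):
    """Parse one file of "word<TAB>count" lines into per-word totals."""
    totals = Counter()
    with open(path, encoding="ascii") as fh:
        for line in fh:
            if not line.strip():
                continue  # ignore blank lines
            word, count = line.rstrip("\r\n").split("\t")
            totals[word] += int(count)
    return totals


def count_files(paths, nthreads=None):
    """Process the files as a list of independent tasks and merge the totals."""
    if nthreads is None:
        # Same environment variable as in the diff above; defaults to one worker.
        nthreads = int(os.environ.get("NUM_THREADS", "1"))
    merged = Counter()
    with ThreadPoolExecutor(max_workers=max(1, nthreads)) as pool:
        for partial in pool.map(count_file, paths):
            merged.update(partial)  # Counter.update adds counts
    return merged
```

It would be called as, e.g., `count_files(["big1.txt", "big2.txt", "big3.txt"])`. List mode balances well when there are many files of similar size; a few files of very different sizes would favor chunking instead.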

llil5tp.ijs

Next, I applied the turbo update to the parallel version and named it llil5tp.ijs.

$ diff -u llil5p.ijs llil5tp.ijs 
--- llil5p.ijs    2023-01-18 09:25:58.889669110 -0600
+++ llil5tp.ijs    2023-01-18 09:26:01.553736512 -0600
@@ -21,6 +21,16 @@
 
 filter_CR =: #~ ~: & CR
 
+turbo_mode_ON =: {{
+  assert. 0 <: c =. 8 - {: $y
+  h =. (3 (3!:4) 16be2), ,|."1 [3 (3!:4)"0 (4:,#,1:,#) y
+  3!:2 h, ,y ,"1 _ c # ' '
+}}
+
+turbo_mode_OFF =: {{
+  (5& }. @: (_8& (]\)) @: (2& (3!:1))) &.> y
+}}
+
 read_file =: {{
   'fname pattern' =. y
 
@@ -56,6 +66,7 @@
 'words nums' =: read_many_files fn_in ; pattern
 
 t1 =: (6!:1) ''         NB. time since engine start
+words =: turbo_mode_ON words
 
 idx   =: i.~ words
 nums  =: idx +//. nums
@@ -65,6 +76,7 @@
 nums  =: ~. nums
 'words nums' =: (\: nums)& { &.:>"_1 words ; nums
 
+words =: turbo_mode_OFF words
 t2 =: (6!:1) ''         NB. time since engine start
 
 text =: ; words (, @: (,"1 _))&.(>`a:)"_1 TAB ,. (": ,. nums) ,. LF
