PerlMonks  

Re^4: Rosetta Code: Long List is Long (outgunned?!)

by Anonymous Monk
on Jan 17, 2023 at 18:44 UTC ( [id://11149651]=note )


in reply to Re^3: Rosetta Code: Long List is Long (faster - vec)(faster++, and now parallel)
in thread Rosetta Code: Long List is Long

E-hm... hello? Are we still playing?

Long thread is long. I should have known better than to even start mumbling "paralle..li...", given the massive barrage of fire that ensued immediately after :). I hope I chose the correct version to test, and my dated PC (its number of workers in particular) is a poor workbench, but:

    $ time ./llil3vec_11149482 big1.txt big2.txt big3.txt >vec6.tmp
    llil3vec (fixed string length=6) start
    get_properties CPU time : 1.80036 secs
    emplace set sort CPU time : 0.815786 secs
    write stdout CPU time : 1.39233 secs
    total CPU time : 4.00856 secs
    total wall clock time : 4 secs

    real 0m4.464s
    user 0m3.921s
    sys 0m0.445s

    $ time ./llil3vec_11149482_omp big1.txt big2.txt big3.txt >vec6.tmp
    llil3vec (fixed string length=6) start
    get_properties CPU time : 2.06675 secs
    emplace set sort CPU time : 0.94937 secs
    write stdout CPU time : 1.40311 secs
    total CPU time : 4.41929 secs
    total wall clock time : 4 secs

    real 0m3.861s
    user 0m4.356s
    sys 0m0.493s

----------------------------------------------

Then I sent my workers into retirement to plant or pick flowers or something, i.e. (temporarily) reverted to single-threaded code, walked around (snow, no flowers), and made a few changes. Here's a comparison of the previous and new versions:

    $ time ../j903/bin/jconsole llil4.ijs big1.txt big2.txt big3.txt out_j.txt
    Read and parse input: 1.6121
    Classify, sum, sort: 2.23621
    Format and write output: 1.36701
    Total time: 5.21532

    real 0m5.220s
    user 0m3.934s
    sys 0m1.195s

    $ time ../j903/bin/jconsole llil5.ijs big1.txt big2.txt big3.txt out_j.txt
    Read and parse input: 1.40811
    Classify, sum, sort: 1.80736
    Format and write output: 0.373946
    Total time: 3.58941

    real 0m3.594s
    user 0m2.505s
    sys 0m0.991s

    $ diff vec6.tmp out_j.txt
    $

New script:

    NB. -----------------------------------------------------------
    NB. --- This file is "llil5.ijs"
    NB. --- Run as e.g.:
    NB.
    NB.   jconsole.exe llil5.ijs big1.txt big2.txt big3.txt out.txt
    NB.
    NB. --- (NOTE: last arg is output filename, file is overwritten)
    NB. -----------------------------------------------------------

    pattern =: 0 1

    args =: 2 }. ARGV
    fn_out =: {: args
    fn_in =: }: args

    filter_CR =: #~ ~: & CR

    read_file =: {{
        'fname pattern' =. y

        text =. TAB, filter_CR fread fname
        text =. TAB (I. text = LF) } text
        selectors =. I. text = TAB
        width =. # pattern
        height =. width <. @ %~ # selectors
        append_diffs =. }: , 2& (-~/\)
        shuffle_dims =. (1 0 3 & |:) @ ((2, height, width, 1) & $)
        selectors =. append_diffs selectors
        selectors =. shuffle_dims selectors

        literal =. < @: (}."1) @: (];. 0) & text "_1
        numeric =. < @: (0&".) @: (; @: (<;. 0)) & text "_1
        extract =. pattern & {
        using =. 1 & \
        or_maybe =. `

        ,(extract literal or_maybe numeric) using selectors
    }}

    read_many_files =: {{
        'fnames pattern' =. y
        ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames
    }}

    'words nums' =: read_many_files fn_in ; pattern

    t1 =: (6!:1) ''  NB. time since engine start

    idx =: i.~ words
    nums =: idx +//. nums
    idx =: nums </. ~. idx
    words =: (/:~ @: { &words)&.> idx
    erase < 'idx'
    nums =: ~. nums
    'words nums' =: (\: nums)& { &.:>"_1 words ; nums

    t2 =: (6!:1) ''  NB. time since engine start

    text =: ; words (, @: (,"1 _))&.(>`a:)"_1 TAB ,. (": ,. nums) ,. LF
    erase 'words' ; 'nums'
    text =: (#~ ~: & ' ') text
    text fwrite fn_out
    erase < 'text'

    t3 =: (6!:1) ''  NB. time since engine start

    echo 'Read and parse input: ' , ": t1
    echo 'Classify, sum, sort: ' , ": t2 - t1
    echo 'Format and write output: ' , ": t3 - t2
    echo 'Total time: ' , ": t3

    exit 0

    echo ''
    echo 'Finished. Waiting for a key...'
    stdin ''
    exit 0
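For readers who don't read J: the part timed as "Classify, sum, sort" groups identical words, sums their counts, and orders the result by count (descending) and then by word (ascending). Below is a minimal C++ sketch of that step only. It is not a translation of the J code; the name `summarize` and the `WordCount` type are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using WordCount = std::pair<std::string, std::int64_t>;

// Sum the counts of identical words, then order by count (descending)
// and, for equal counts, by word (ascending).
std::vector<WordCount> summarize(const std::vector<WordCount>& input) {
    std::unordered_map<std::string, std::int64_t> totals;
    for (const auto& item : input) {
        totals[item.first] += item.second;
    }
    std::vector<WordCount> result(totals.begin(), totals.end());
    std::sort(result.begin(), result.end(),
              [](const WordCount& a, const WordCount& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    return result;
}
```

The J script reaches the same ordering by a different route (group by summed count, sort words inside each group, then order the groups), which is where most of its speed tricks live.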

----------------------------------------------

I don't know the C++ "tools" chosen above ("modules" or whatever they are called) at all. Is capping the length at "6" in the code just a matter of convenience, so that any longer value, like "12" or "25", could be hard-coded instead (with the obvious other fixes)? I mean, would no catastrophic (cubic, etc.) slow-down hit the sorting past some threshold, forcing one to comment out the define and use the alternative set of "tools"? Or perhaps input would be slower if cutting into words of unequal length is expected?
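To make the question concrete, here is a sketch of what a fixed-width key with a hard-coded length can look like in C++. It is an assumption about the general technique, not the code of llil3vec; `FixedKey` and its members are hypothetical names. The width is a compile-time parameter, and a comparison is a byte-wise compare over N bytes, so its cost grows only linearly with N.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical fixed-width key: N bytes, zero-padded on the right.
// N is a compile-time constant, so 6, 12 or 25 differ only in size.
template <std::size_t N>
struct FixedKey {
    std::array<char, N> bytes{};

    explicit FixedKey(const std::string& word) {
        assert(word.size() <= N);  // longer words do not fit
        std::memcpy(bytes.data(), word.data(), word.size());
    }

    // Byte-wise comparison; cost grows linearly with N.
    bool operator<(const FixedKey& other) const {
        return std::memcmp(bytes.data(), other.bytes.data(), N) < 0;
    }
    bool operator==(const FixedKey& other) const {
        return std::memcmp(bytes.data(), other.bytes.data(), N) == 0;
    }

    // Recover the word by stopping at the first padding byte.
    std::string str() const {
        std::size_t len = 0;
        while (len < N && bytes[len] != '\0') ++len;
        return std::string(bytes.data(), len);
    }
};
```

With a layout like this, `FixedKey<12>` or `FixedKey<25>` would sort with the same algorithm as `FixedKey<6>`; what changes is memory per key and bytes touched per comparison. Whether the real program behaves this way is for its author to confirm.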

Anyway, here's output if the define is commented-out:

    $ time ./llil3vec_11149482_no6 big1.txt big2.txt big3.txt >vec6.tmp
    llil3vec start
    get_properties CPU time : 3.19387 secs
    emplace set sort CPU time : 0.996694 secs
    write stdout CPU time : 1.32918 secs
    total CPU time : 5.5198 secs
    total wall clock time : 6 secs

    real 0m6.088s
    user 0m5.294s
    sys 0m0.701s

    $ time ./llil3vec_11149482_no6_omp big1.txt big2.txt big3.txt >vec6.tmp
    llil3vec start
    get_properties CPU time : 3.99891 secs
    emplace set sort CPU time : 1.13424 secs
    write stdout CPU time : 1.41112 secs
    total CPU time : 6.54723 secs
    total wall clock time : 4 secs

    real 0m4.952s
    user 0m6.207s
    sys 0m0.842s

Should my time be compared to them? :) (Blimey, my solution doesn't have to compete when participants are capped selectively (/grumpy_on around here)). Or I can use powerful magic secret turbo mode:

    turbo_mode_ON =: {{
        assert. 0 <: c =. 8 - {: $y
        h =. (3 (3!:4) 16be2), ,|."1 [3 (3!:4)"0 (4:,#,1:,#) y
        3!:2 h, ,y ,"1 _ c # ' '
    }}

    turbo_mode_OFF =: {{
        (5& }. @: (_8& (]\)) @: (2& (3!:1))) &.> y
    }}

Inject these definitions, plus these two lines, immediately after t1 =: and immediately before t2 =: respectively:

    words =: turbo_mode_ON words
    words =: turbo_mode_OFF words

Aha:

    $ time ../j903/bin/jconsole llil5.ijs big1.txt big2.txt big3.txt out_j.txt
    Read and parse input: 1.40766
    Classify, sum, sort: 1.24098
    Format and write output: 0.455868
    Total time: 3.1045

    real 0m3.109s
    user 0m1.815s
    sys 0m1.210s

(and no cutting to pieces of pre-defined equal length was used ...yet) :)
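One way to read turbo_mode_ON (an interpretation, not something stated above): each word is padded with spaces to 8 bytes and the padded bytes are handed to J as one 64-bit integer per word, so grouping and sorting compare machine words rather than strings. The same idea sketched in C++, with the hypothetical helpers `pack8` and `unpack8`; the first byte goes into the most significant position so that integer order matches byte-wise string order:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Pack a word of at most 8 bytes into one 64-bit integer, first byte
// in the most significant position, padded on the right with spaces.
std::uint64_t pack8(const std::string& word) {
    assert(word.size() <= 8);
    std::uint64_t key = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        unsigned char c = i < word.size()
                              ? static_cast<unsigned char>(word[i])
                              : static_cast<unsigned char>(' ');
        key = (key << 8) | c;
    }
    return key;
}

// Inverse of pack8: recover the bytes and drop the space padding.
std::string unpack8(std::uint64_t key) {
    std::string word(8, ' ');
    for (int i = 7; i >= 0; --i) {
        word[i] = static_cast<char>(key & 0xFF);
        key >>= 8;
    }
    while (!word.empty() && word.back() == ' ') word.pop_back();
    return word;
}
```

This only round-trips words without trailing spaces and only preserves ordering for bytes that sort at or above the space character, which holds for the lowercase test data assumed here.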

----------------------------------------------

I can revert to parallel reading/parsing anytime, with the effect shown in the parent node. As implemented, it was kind of passive; but files can be unequal sizes, or just one huge single file. I think serious solution would probe inside to find newlines at approx. addresses, then pass chunks coords to workers to parse in parallel.
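The "probe for newlines at approximate addresses" step could look like the following C++ sketch. It assumes the whole input is already in memory as one string; the function name `chunk_at_newlines` is made up, and handing the resulting ranges to workers is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Split `text` into at most `parts` [begin, end) ranges of roughly equal
// size, moving each cut forward to just past the next '\n' so that no
// line is shared between two ranges.
std::vector<std::pair<std::size_t, std::size_t>>
chunk_at_newlines(const std::string& text, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t size = text.size();
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= parts && begin < size; ++i) {
        std::size_t end = size;
        if (i < parts) {
            std::size_t guess = size / parts * i;     // approximate address
            if (guess < begin) guess = begin;
            std::size_t nl = text.find('\n', guess);  // probe for a newline
            end = (nl == std::string::npos) ? size : nl + 1;
        }
        if (end > begin) {
            chunks.emplace_back(begin, end);
            begin = end;
        }
    }
    return chunks;
}
```

Each range can then be parsed independently, because every range starts at the beginning of a line. With very long lines or tiny inputs fewer than `parts` ranges come back.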

The puny 2-worker attempt to sort, in the parent node, was just a kind of #pragma omp parallel sections... thing with 2 sections; there is no use sending bus-loads of workers and expecting quiet fans. There's some hope for "parallelizable primitives" in release (not beta) 9.04 or later. That may be a long wait. Or, if I could write code that merges 2 sorted arrays faster than the built-in primitive sorts either of the halves, then, bingo, I'd have a fast multi-threaded merge sort. But no success yet: the built-in sorts one large array faster, single-threaded.
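Seen from the C++ side, the "2 sections" idea looks roughly like the sketch below: sort each half in its own OpenMP section, then merge. It illustrates the shape only (the function name `two_section_sort` is made up); it is not the parent node's J code, and it says nothing about whether the merge is cheap enough to pay off. Compiled without OpenMP enabled (e.g. without -fopenmp), the pragmas are ignored and the halves are sorted one after the other.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort the two halves independently (in parallel when OpenMP is
// enabled, sequentially otherwise), then merge them in place.
void two_section_sort(std::vector<long>& data) {
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(data.size() / 2);
    #pragma omp parallel sections
    {
        #pragma omp section
        std::sort(data.begin(), data.begin() + mid);
        #pragma omp section
        std::sort(data.begin() + mid, data.end());
    }
    std::inplace_merge(data.begin(), data.begin() + mid, data.end());
}
```

The final merge runs on one thread, so the whole thing is only a win if that merge plus one half-sort beats a single full sort.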

Replies are listed 'Best First'.
Re^5: Rosetta Code: Long List is Long (llil5.ijs vs llil4vec.cpp)
by marioroy (Prior) on Jan 18, 2023 at 16:47 UTC

    What a delight for our Anonymonk friend to come back. Thanks to you, we tried parallel :).

    ... but files can be unequal sizes, or just one huge single file. I think serious solution would probe inside to find newlines at approx. addresses, then pass chunks coords to workers to parse in parallel.

    Chuma mentions 2,064 input files in the initial "Long list is long" thread. With that many files, processing the list of files in parallel suits this use case. Back in 2014, I wrote utilities that support both chunking and list modes: mce_grep and egrep.pl, via --chunk-level={auto|file|list}.

    llil5p.ijs

    I took llil5.ijs and created a parallel version named llil5p.ijs, based on code-bits from your prior post. The number of threads can be specified via the NUM_THREADS environment variable.

        $ diff -u llil5.ijs llil5p.ijs
        --- llil5.ijs 2023-01-18 09:25:14.041515970 -0600
        +++ llil5p.ijs 2023-01-18 09:25:58.889669110 -0600
        @@ -9,6 +9,12 @@
         
         pattern =: 0 1
         
        +nthrs =: 2!:5 'NUM_THREADS' NB. get_env NUM_THREADS
        +{{
        +    if. nthrs do. nthrs =: ".nthrs end. NB. string to integer conversion
        +    for. i. nthrs do. 0 T. 0 end. NB. spin nthrs
        +}} ''
        +
         args =: 2 }. ARGV
         fn_out =: {: args
         fn_in =: }: args
        @@ -44,7 +50,7 @@
         
         read_many_files =: {{
             'fnames pattern' =. y
        -    ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames
        +    ,&.>/"2 (-#pattern) ]\ ,;(read_file @:(; &pattern)) t.'' "0 fnames
         }}
         
         'words nums' =: read_many_files fn_in ; pattern

    llil5tp.ijs

    Next, I applied the turbo update to the parallel version and named it llil5tp.ijs.

        $ diff -u llil5p.ijs llil5tp.ijs
        --- llil5p.ijs 2023-01-18 09:25:58.889669110 -0600
        +++ llil5tp.ijs 2023-01-18 09:26:01.553736512 -0600
        @@ -21,6 +21,16 @@
         
         filter_CR =: #~ ~: & CR
         
        +turbo_mode_ON =: {{
        +    assert. 0 <: c =. 8 - {: $y
        +    h =. (3 (3!:4) 16be2), ,|."1 [3 (3!:4)"0 (4:,#,1:,#) y
        +    3!:2 h, ,y ,"1 _ c # ' '
        +}}
        +
        +turbo_mode_OFF =: {{
        +    (5& }. @: (_8& (]\)) @: (2& (3!:1))) &.> y
        +}}
        +
         read_file =: {{
             'fname pattern' =. y
         
        @@ -56,6 +66,7 @@
         'words nums' =: read_many_files fn_in ; pattern
         
         t1 =: (6!:1) '' NB. time since engine start
        +words =: turbo_mode_ON words
         
         idx =: i.~ words
         nums =: idx +//. nums
        @@ -65,6 +76,7 @@
         nums =: ~. nums
         'words nums' =: (\: nums)& { &.:>"_1 words ; nums
         
        +words =: turbo_mode_OFF words
         t2 =: (6!:1) '' NB. time since engine start
         
         text =: ; words (, @: (,"1 _))&.(>`a:)"_1 TAB ,. (": ,. nums) ,. LF
