PerlMonks  


Short version: the J script runs in ~5.7 sec using ~650 MB and produces exactly the same output, which makes it the fastest of all solutions so far.

(The caveat: "words" are supposed to be not too different in length. They are temporarily padded to the length of the longest one during the program run (i.e. not padded at all with the current test sample). Expect deterioration if, say, their lengths differ by two or more orders of magnitude, especially if just a few are very long. I didn't check.)
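
To make the caveat concrete, here is a minimal sketch in Python (not the J used in the script) of how padding every word to the longest one inflates storage. The helper names `padded_bytes` and `raw_bytes` and the sample words are made up for illustration.

```python
def padded_bytes(words):
    """Bytes needed when every word is padded to the longest one."""
    longest = max(len(w) for w in words)
    return longest * len(words)


def raw_bytes(words):
    """Bytes needed when each word keeps its own length."""
    return sum(len(w) for w in words)


similar = ["alpha", "bravo", "tango"]   # all the same length: no padding
skewed = similar + ["x" * 1000]         # one very long word: heavy padding

print(padded_bytes(similar), raw_bytes(similar))
print(padded_bytes(skewed), raw_bytes(skewed))
```

With equal-length words the two figures coincide; a single very long word makes every other row pay for its length.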

###################### (Prose and previous results, can be skipped)

I was shocked to notice that this thread is almost a month old already. While I'm in no hurry and have been pursuing what follows at leisure and only rarely (kind of "for dessert"), it's better to publish this at long last, be it the "final" optimized version or not (I'm sure it can be improved a lot), before the thread goes stone cold and those who participated have to make an effort to read their own code because of the time elapsed.

As a frame of reference, here is an assortment of results from previous solutions, on my hardware:

llil2grt.pl (see 11148713) (fastest "pure core-only Perl"), Windows:

llil2grt start
get_properties : 16 secs
sort + output : 31 secs
total : 47 secs
Peak working set (memory): 2,744,200K

Same code, data and PC, on Linux (I didn't investigate why there is such a difference):

llil2grt start
get_properties : 11 secs
sort + output : 20 secs
total : 31 secs
2,152,848 Kbytes of RAM were used

I assume that Judy (see 11148585) is the best in both speed and memory among non-parallel Perl solutions, with the same caveat as at the very top: "words" are temporarily padded to a fixed length, here to 10 bytes:

my_Judy_test start
get_properties : 13 secs
sort + output : 7 secs
total : 20 secs
349,124 Kbytes of RAM were used

Being a lazy bum, I didn't compile the C++ solutions (nor do I code in C++). Here is a copy-paste from 11148969; I assume it is the best result so far, among C++ and all others. (On my PC, I expect the time to be worse.)

llil2grt start
get_properties CPU time : 4.252 secs
emplace set sort CPU time : 1.282 secs
write stdout CPU time : 1.716 secs
total CPU time : 7.254 secs
total wall clock time : 7 secs
Memory use (Windows Private Bytes): 1,626,728K

###################### (End of prose and previous results)

The code below prints the following message and then pauses for a key after it finishes (Ctrl-D on Linux or Ctrl-Z + Enter on Windows, as usual). The RAM usage was taken from Windows Task Manager during that pause, to be on par with how it was measured for Perl:

Read and parse input: 1.636
Classify, sum, sort: 2.206
Format and write output: 1.895
Total time: 5.737

Finished. Waiting for a key...
Peak working set (memory): 657,792K

The injection of CR into output lines only matters on Windows (actually, it is not required at all); it is there to ensure there is no difference from the Perl output later. The "magic constant" 3 for the number width can be anything, and is only used for an intermediate step.
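
The intermediate step can be pictured with a small Python sketch (not the J used in the script): pad words and numbers into fixed-width columns, glue them together, then strip the pad character from the whole text. The function name `format_lines` is hypothetical. The sketch assumes that no word contains the pad character, and relies on Python's `rjust`, which never truncates; whether a too-small width is equally harmless in J is not checked here.

```python
PAD_CHAR = " "


def format_lines(words, nums, num_width):
    """Build fixed-width rows, join them, then drop every pad character."""
    word_width = max(len(w) for w in words)
    rows = [
        w.ljust(word_width, PAD_CHAR) + "\t" + str(n).rjust(num_width, PAD_CHAR) + "\n"
        for w, n in zip(words, nums)
    ]
    # Assumption: PAD_CHAR never occurs inside a word, so removing it is safe.
    return "".join(rows).replace(PAD_CHAR, "")


narrow = format_lines(["foxtrot", "alpha"], [12, 3], 3)
wide = format_lines(["foxtrot", "alpha"], [12, 3], 8)
print(narrow == wide)
print(repr(narrow))
```

Because the padding is removed at the end, the chosen number width does not show up in the final text.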

I had to make this code slightly less readable than it was during development by somewhat aggressively re-using the same variable names for words and nums over and over again, as the data are processed and modified while the script progresses. They had different "self-explanatory" names at each stage, but because the arrays are huge, it's better to overwrite a variable immediately on successive assignments to conserve memory. "Erasing" throw-away helper arrays (similar to undef in Perl) serves the same purpose.

Actually, during development I was playing with this toy dataset; here are the original data and the result:

   text =: noun define
tango	1
charlie	2
bravo	1
foxtrot	4
alpha	3
foxtrot	1
bravo	1
foxtrot	7
)

    NB. Do work here...
    
    ] text
foxtrot	12
alpha	3
bravo	2
charlie	2
tango	1
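
For readers who do not read J, the transformation on the toy dataset can be reproduced with a short Python sketch. This is a plain dictionary-based version of the task (sum per word, then order by count descending and word ascending), not a translation of the J script; the name `llil` is just a label for the example.

```python
from collections import defaultdict

TOY = (
    "tango\t1\ncharlie\t2\nbravo\t1\nfoxtrot\t4\n"
    "alpha\t3\nfoxtrot\t1\nbravo\t1\nfoxtrot\t7\n"
)


def llil(text):
    """Sum the counts per word, then sort by count (desc) and word (asc)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        word, num = line.split("\t")
        totals[word] += int(num)
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(f"{word}\t{num}\n" for word, num in ordered)


print(llil(TOY), end="")
```

Running it on the toy input gives the same five lines as the J session above.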

The script:

NB. -----------------------------------------------------------
NB. --- This file is "llil.ijs"
NB. --- Run as e.g.:
NB.
NB.   jconsole.exe llil.ijs big1.txt big2.txt big3.txt out.txt
NB.
NB. --- (NOTE: last arg is output filename, file is overwritten)
NB. -----------------------------------------------------------

args =: 2 }. ARGV
fn_out =: {: args
fn_in =: }: args

NUM_LENGTH =: 3
PAD_CHAR =: ' '

make_sel =: [: (1 2 0& |:) @ ,: ([ ,. ] - [: , [)
sort_some =: ([: /:~ {)`[`] }

text =: , freads " 0 fn_in
lf_pos =: I. text = LF
tab_pos =: I. text = TAB
words =: ((0 , >: }: lf_pos) make_sel tab_pos) ];.0 text
nums =: 0&". (tab_pos make_sel lf_pos) ; @: (<;.0) text
erase 'text' ; 'lf_pos' ; 'tab_pos'

t1 =: (6!:1) '' NB. time since engine start

nums =: words +//. nums
words =: ~. words
'words nums' =: (\: nums)& { &.:>"_1 words ; nums
starts =: I. ~: nums
ranges =: starts ,. (}. starts , # nums) - starts
count =: # starts

sort_words =: monad define
  'ranges words' =. y
  range =. ({. + i. @ {:) {. ranges
  (}. ranges) ; range sort_some words
)

words =: > {: sort_words ^: count ranges ; words
erase 'starts' ; 'ranges'

t2 =: (6!:1) '' NB. time since engine start

nums =: (- NUM_LENGTH) ]\ NUM_LENGTH ": nums
text =: , words ,. TAB ,. (nums ,. CR) ,. LF
erase 'words' ; 'nums'
text =: (#~ ~: & PAD_CHAR) text
text fwrite fn_out
erase < 'text'

t3 =: (6!:1) '' NB. time since engine start

echo 'Read and parse input: ' , ": t1
echo 'Classify, sum, sort: ' , ": t2 - t1
echo 'Format and write output: ' , ": t3 - t2
echo 'Total time: ' , ": t3
echo ''
echo 'Finished. Waiting for a key...'
stdin ''
exit 0
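
The "Classify, sum, sort" stage orders the data in two steps: first by count, descending, then by word within each run of equal counts. Below is a rough Python sketch of that idea, as I read it from the script; it is not a line-by-line translation, and the function name `sort_runs` is hypothetical. It takes the already summed, unique words and their counts.

```python
def sort_runs(words, nums):
    """Order by count descending, then sort words inside each equal-count run."""
    # Step 1: stable sort of indices by descending count.
    order = sorted(range(len(nums)), key=lambda i: -nums[i])
    words = [words[i] for i in order]
    nums = [nums[i] for i in order]

    # Step 2: find where each run of equal counts starts and ends.
    starts = [i for i in range(len(nums)) if i == 0 or nums[i] != nums[i - 1]]
    ends = starts[1:] + [len(nums)]

    # Step 3: sort the words inside each run, leaving the counts in place.
    for start, end in zip(starts, ends):
        words[start:end] = sorted(words[start:end])
    return words, nums


words, nums = sort_runs(
    ["tango", "charlie", "bravo", "foxtrot", "alpha"], [1, 2, 2, 12, 3]
)
print(list(zip(words, nums)))
```

Only the runs with more than one word change in step 3; in the toy data that is the pair with count 2.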

In reply to Re: Rosetta Code: Long List is Long (faster) by Anonymous Monk
in thread Rosetta Code: Long List is Long by eyepopslikeamosquito
