Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re^3: Rosetta Code: Long List is Long -- Python

by parv (Parson)
on Dec 10, 2022 at 04:51 UTC ( [id://11148702]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Rosetta Code: Long List is Long -- Python
in thread Rosetta Code: Long List is Long

In short, could you do another Python, Pyston* run with sort function replaced with ...

def sort_native( cat_count ): once = sorted( cat_count.keys() ) return sorted( once, key = lambda k: cat_count[ k ], reverse = True + )

...?

In long, before eyepopslikeamosquito posted ...

sort { $href->{$b} <=> $href->{$a} } sort keys %{$href}

... I had missed to notice the sorting order. I decided to do the same for Python (native) version instead of using functools.cmp_to_key function. I also realized that I was doing the sorting other way (sorting keys by value, followed by sorting by key), so was not getting the expected output (which made me to use cmp_to_key instead).

Replacing sort_via_cmp_to_key with ...

def sort_native( cat_count :dict[ str, int ] ) ->list[ str ]: """ Returns: A `list` of sorted keys by decreasing order of values & increasi +ng order of keys. Args: cat_count: A `dict` with string key & integer value. """ once = sorted( cat_count.keys() ) return sorted( once, key = lambda k: cat_count[ k ], reverse = True + )

... reduces the sort time by ~10 times (~11-16 s; also produces the expected output) in my run environment.

time passes, so slowly🎶 ... Putting the complete program here (~28-35 s) ...

#!/usr/local/bin/python3.10 # Source: https://perlmonks.org/index.pl?node_id=11148702 # # This is one Python implementation based on the problem specification + ... # # Rosetta Code: Long List is Long, 20221130, # by eyepopslikeamosquito # https://perlmonks.org/?node_id=11148465 import sys from collections import defaultdict from hashlib import sha256 from pathlib import PosixPath from timeit import default_timer # Takes ~0.3 s. def verify( output :str, # The digest of the output produced from eyepopslikeamosqu +ito's # program, "gen-llil.pl", after generating 3 input files. +Update it # as needed. expected_sum :str = '70a1644743e9b9d8d73094ed1826527f27a7f3f131c3f28a63aaeb +85e1af8fef' ) ->None: """ Prints a message about output being different than expected. Args: output : Stringified output of sorted input. expected_sum: SHA-256 digest sum in hexadecimal of expected ASCI +I output. """ # Need to encode the resulting string with encoding of "ascii" to m +atch up # of the input. # Also make sure that each category-count pair ends with a newline. sum = sha256( output.encode( encoding = 'ascii' ) ).hexdigest() if sum != expected_sum: sys.stderr.write( f"OUTPUT is DIFFERENT!\n {sum}\n" ) return # Takes ~7-11 s. def sort_val_desc_key_asc( cat_count :dict[ str, int ] ) ->list[ str ] +: """ Returns: A `list` of sorted keys by decreasing order of values & increasi +ng order of keys. Args: cat_count: A `dict` with string key & integer value. """ once = sorted( cat_count.keys() ) return sorted( once, key = lambda k: cat_count[ k ], reverse = True + ) # Takes ~12-16 s. def collect( data_list :list ) ->dict[ str, int ]: """ Returns: A `dict` with category as key & total count as value. Args: data_list: list of file paths as `pathlib.PosixPath` objects. Side effect: Updates `time_stat` with the time taken to collect the data. """ cat_count = defaultdict( lambda: 0 ) delimiter = '\t' open_prop = { 'mode' : 'rt', 'encoding' : 'ascii', 'newline' : '\n' } for path in data_list: with path.open( **open_prop ) as fh: for line in fh: category, number = line.split( delimiter, 1 ) cat_count[ category ] += int( number ) return cat_count # Collect file paths. if sys.argv[1:]: data_list = [ PosixPath( p ) for p in sys.argv[1:] ] else: sys.exit( 'Give a file list with data to process' ) start = default_timer() # Process. cat_count = collect( data_list ) end_collect = default_timer() # Sort. sorted_key = sort_val_desc_key_asc( cat_count ) end_sort = default_timer() # Format; take ~7 s. stringified = ''.join( f'{k}\t{cat_count[ k ]}\n' for k in sorted_key +) end_stringification = default_timer() # Either verify or print to verifiy outside of the program. # Verification is slightly slower than dumping string on standard outp +ut. verify_or_print = 'NOT verify' if verify_or_print == 'verify': output_label = 'sha256 verification' verify( stringified ) else: output_label = 'output' print( stringified, end = '' ) end = default_timer() # Print time taken. stat = { 'collect' : end_collect - start, 'sort' : end_sort - end_collect, 'stringification' : end_stringification - end_sort, output_label : end - end_stringification, # ~28-36 s. 'total' : end - start } max_label_width = max( len( k ) for k in stat.keys() ) # decimal_place = 1 max_time_width = max( len( f'{t:0.{decimal_place}f}' ) for t in stat.v +alues() ) time_format = f'{max_time_width}.{decimal_place}f' # out_format = '{label:{label_pad}}: {time:{time_format}} s\n' # sys.stderr.write( f'# {__file__}\n' ) for step, time in stat.items(): mess = out_format.format( label = step, label_pad = max_label_width +, time = time, time_format = time_format ) sys.stderr.write( mess )
# Time.
# parv-20221210-two-sorts.py collect : 15.9 s sort : 7.8 s stringification: 7.5 s output : 0.1 s total : 31.3 s

Replies are listed 'Best First'.
Re^4: Rosetta Code: Long List is Long -- Python
by marioroy (Prior) on Dec 10, 2022 at 07:05 UTC
    > In short, could you do another Python, Pyston* run with sort function replaced with ...

    First, I made the following modification to resolve "TypeError: 'type' object is not subscriptable" using Pyston 2.3.5.

    $ diff llil1.py llil2.py 53c53 < def collect( data_list :list ) ->dict[ str, int ]: --- > def collect( data_list :list ) ->dict: 87c87 < def process( cat_count :dict[ str, int ] ) ->Generator[ str, None, N +one ]: --- > def process( cat_count :dict ) ->Generator: 110c110 < def sort_native( cat_count :dict[ str, int ] ) ->list[ str ]: --- > def sort_native( cat_count :dict ) ->list:

    Python 3.10.7:

    # I have pyston_lite_autoload installed already. It can be disabled. $ DISABLE_PYSTON=1 python3 llil2.py big1.txt big2.txt big3.txt > out1. +txt collect time : 5.1 s sort+format time : 3.6 s total processing time: 8.7 s output time : 3.6 s total time : 12.2 s

    Python with Pyston-lite loaded automatically:

    # Install pyston_lite_autoload to ~/.local/lib/python3.10/site-package +s/. $ python3 -m pip install --user pyston_lite_autoload $ python3 llil2.py big1.txt big2.txt big3.txt > out2.txt collect time : 4.3 s sort+format time : 3.5 s total processing time: 7.9 s output time : 3.3 s total time : 11.2 s

    Pyston 2.3.5:

    $ ./pyston_2.3.5/bin/pyston3 llil2.py big1.txt big2.txt big3.txt > out +3.txt collect time : 3.7 s sort+format time : 3.4 s total processing time: 7.1 s output time : 3.0 s total time : 10.1 s

    Pyston makes it all the more fun.

      Neat, sorting is now ~4-5 times faster there. Thanks for the re-run.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11148702]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (6)
As of 2024-03-29 14:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found