Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^3: How can I do a numeric sort on a substring? [Benchmark]

by kcott (Archbishop)
on Jun 27, 2021 at 02:41 UTC ( [id://11134361]=note: print w/replies, xml ) Need Help??


in reply to Re^2: How can I do a numeric sort on a substring? [Benchmark]
in thread How can I do a numeric sort on a substring?

G'day Rolf,

"you are only testing with 10 elements in your @unordered array."

The code only contains one comment which I added for the express purpose of heading off such feedback:

# Extend @unordered for improved benchmarking push @unordered, map "a-$_", shuffle 0..10000;
"use at least n >> 1000 elements for a real benchmark"

Indeed. I used 10,011 elements "for improved benchmarking".

The preamble tests were to check that all subroutines returned identical results. Each of the preamble tests were only run once and did not involve any benchmarking; they were intended as a sanity check (processing stopped here if --dry_run was used). The data used for this had 10 elements, which I considered sufficient for these tests, and was indicated by:

Unordered data (for preamble tests):

— Ken

Replies are listed 'Best First'.
Re^4: How can I do a numeric sort on a substring? [Benchmark]
by LanX (Saint) on Jun 27, 2021 at 11:56 UTC
    Hi Ken

    > I added for the express purpose of heading off such feedback

    Oops, sorry I missed that. :/

    I can't dig deeper now, but some suggestions, to solve the miracle

    • it's still logarithmic, maybe try 1e6 elements instead
    • Perl is propagating the caller's context to the returning statement, maybe you should check if benchmark is using list context too?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      > Perl is propagating the caller's context to the returning statement, maybe you should check if benchmark is using list context too?

      my suspicion was justified, the benchmarks are in void context, that's why simple sorts are just doing nothing. ( and nothing is fast ;)

      I took your code and forced all subs to operate in list context, by prepending @ordered = in the first line.

      That's the result with 10000 elements (you can also adjust $max for more or less elements)

      Perl & OS: v5.32.1 on MSWin32 Unordered data (for preamble tests): a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1 Preamble tests: grt_pack_expr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 grt_pack_expr_q: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anchored: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anch_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anch_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_no_index: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 map_cat_substr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 map_cat_substr_len: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_pack: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_regex: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_regex_anchored: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_substr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 Legend: GRTpe: grt_pack_expr GRTpeq: grt_pack_expr_q STr: st_regex STra: st_regex_anchored STraen: st_regex_anch_expr_ni STran: st_regex_anch_ni STren: st_regex_expr_ni STrn: st_regex_no_index mcs: map_cat_substr mcsl: map_cat_substr_len sp: sort_pack sr: sort_regex sra: sort_regex_anchored ss: sort_substr Benchmarks: Note: Unordered data extended with 'map "a-$_", shuffle 0..10000' Rate sra sr sp STra STr STren STraen STran STrn ss + mcs mcsl GRTpeq GRTpe sra 3.34/s -- -2% -73% -81% -82% -82% -82% -82% -82% -85% +-92% -92% -93% -95% sr 3.40/s 2% -- -72% -80% -81% -82% -82% -82% -82% -85% +-92% -92% -92% -95% sp 12.3/s 270% 263% -- -28% -32% -34% -34% -35% -35% -44% +-71% -72% -72% -82% STra 17.2/s 416% 406% 39% -- -5% -8% -8% -9% -10% -22% +-59% -60% -61% -75% STr 18.2/s 445% 435% 47% 6% -- -2% -3% -4% -4% -18% +-57% -58% -59% -73% STren 18.6/s 459% 448% 51% 8% 2% -- -0% -1% -2% -16% +-56% -57% -58% -73% STraen 18.7/s 461% 451% 52% 9% 3% 0% -- -1% -2% -16% +-55% -57% -58% -72% STran 18.9/s 467% 456% 53% 10% 4% 1% 1% -- -1% -15% +-55% -56% -58% -72% STrn 19.0/s 471% 460% 54% 11% 5% 2% 2% 1% -- -14% +-55% -56% -57% -72% ss 22.2/s 565% 552% 80% 29% 22% 19% 18% 17% 16% -- +-47% -49% -50% -67% mcs 42.0/s 1159% 1136% 240% 144% 131% 125% 124% 122% 121% 89% + -- -3% -6% -38% mcsl 43.3/s 1199% 1174% 251% 152% 138% 132% 131% 129% 128% 95% + 3% -- -3% -36% GRTpeq 44.7/s 1239% 1214% 262% 160% 146% 140% 138% 136% 135% 101% + 6% 3% -- -34% GRTpe 67.9/s 1937% 1898% 451% 295% 273% 264% 263% 259% 257% 206% + 62% 57% 52% --

      here the code

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        ++ Many thanks for tracking down the problem. Much appreciated.

        The results are now more in line with what I would have expected. I see that Perl's string handling function, substr, outstrips the regex solutions: I have been recommending, for a very long time, that string functions be chosen over regexes (where they provide equivalent functionality).

        I should probably add some ST routines (e.g. STss, STmcs) to see how they fare; for instance, would GRTpe be faster than STss. I'm currently at $work, so I can't do that now; I'll look into it this evening (i.e. ~8-10hrs hence).

        — Ken

        Out of curiosity I added some subs using Sort::Key. sort_key_natural is the natsort function from Sort::Key::Natural while sort_key_integer uses the ikeysort function from Sort::Key in tandem with substr.

        The natsort approach is not particularly fast, but this is perhaps to be expected given it is a general purpose function (as are the unanchored regex approaches). I guess the integer key approach is faster as it takes advantage of direct string operations when building the keys, and then whatever optimisations Sort::Key uses internally.

        I assume the differences in the order of the other approaches compared with Lanx's is due to the code being run on Strawberry perl 5.28. It would be interesting to know how the Sort::Key approaches go under a more recent Perl.

        Edit: And now I look at the source code for Sort::Key::Natural, it is uses a regex approach to divide the string and pad out the numeric sections, so it is not surprising that it is slower than the other regex based approaches here. https://metacpan.org/dist/Sort-Key/source/lib/Sort/Key/Natural.pm#L34.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11134361]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (2)
As of 2024-04-26 03:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found