Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Sorry for delay with reply/update, first I got distracted with "let's try another optimization", then with other/unrelated things. I hope this topic is still warm :).

My "caching" PDL solution (I'd prefer "looking-up" to "caching" or "memoization" in this case) is slow because, if vector is processed as a whole, cache hits happen less frequently i.e. after quite a few more useless steps. To illustrate with sequence from original challenge:

23 → 70 → 35 → 106 → 53 → 160 → 80 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

Number 70 reaches 35 in only one step but 35 itself is yet very far from completion. Simplifying, if 1..70 range was divided into e.g. 2 chunks processed independently one after another, then it would alleviate the problem.

This "chunking" is very easy to add, just wrap the main loop into one external loop (labelled "CHUNKS"), which updates chunk boundaries and sets "views" into piddles, while internal loop (labelled "ITERATIONS") does exactly the same as in previous version. The $current piddle can now be of chunk size. The very first chunk includes dummy 0th element (size is CHUNK + 1), the final one can theoretically be as short as 1 single element.

For 1e7 numbers, the computation time drops by factor of 3, from ~65 to ~21 seconds, and doesn't change much (variations stay at "noise" level) for chunk sizes ~2e4 .. ~2e5.

Next, the idea was, that as soon as chunk is completed, e.g. chunk 11..20 with MAX = 100 and CHUNK = 10, it should be extremely cheap/easy to immediately get Collatz $lengths for even numbers in range 22..40, also marking these $seqs elements as BAD/completed. Though it would be almost as easily achieved "in due time", we don't need to consider/apply any masks at this early/immediate stage. Further, then it follows, we can simply set, in bulk and at the very beginning, each even $seqs element outside 1st chunk as BAD, even if CLs are unknown yet.

2 fragments marked with double '##' do exactly what previous paragraph says. They can be freely disabled for experiments; though the speed-up is very modest, just ~0.5s.

Further "ideas"/considerations didn't result in palpable improvements (nor were pursued systematically).

If it's cheap to shift-left indexes and increment CLs for each chunk as we go, what if we also populate and keep sparse (even only) CL LUT in range MAX..2MAX? Well, this range starts being populated relatively late, and if there was benefit of occasional additional cache hits, it seems to be cancelled out. No gain, no loss. The MAXLEN was introduced to check this.

I tried to add masks/slices/bads for $current, so that e.g. the line $current ++; applies to valid subset only, especially since now all elements at even positions are dead weight from the very start, in chunks 2+. Well, unconditional "en masse" cheap operation appears to be faster, regardless of uselessly tackling "dead weight".

I also considered variable chunk sizes (such as "begin with very short", etc.), but gave up.

use strict; use warnings; use feature 'say'; use PDL; use List::Util; BEGIN { *_min = \&List::Util::min; # collision *_max = \&List::Util::max } # with PDL use constant MAX => 1e7; use constant TOP => _min( 20, MAX ); use constant CHUNK => _min( 8e4, MAX ); # but keep it even use constant MAXLEN => MAX * 1; # ?? # x(1..2) use Time::HiRes 'time'; my $t = time; my $seqs = sequence( longlong, 1 + MAX ); $seqs-> setbadat( 0 ); $seqs-> setbadat( 1 ); $seqs-> badvalue( 2 ); $seqs-> slice([ CHUNK + 2, MAX, 2]) .= 2 ## if CHUNK + 2 <= MAX; ## my $lengths = ones( short, 1 + MAXLEN ); $lengths-> inplace-> setvaltobad( 1 ); $lengths-> set( 1, 1 ); $lengths-> set( 2, 2 ); $lengths-> set( 4, 3 ); CHUNKS: for ( my $from = my $to = 0; $to != MAX; $from = $to + 1 ) { $to = _min( $from + CHUNK, MAX ); # "_c" is for "chunk" my $seqs_c = $seqs-> slice([ $from, $to ]); my $lengths_c = $lengths-> slice([ $from, $to ]); my $current = zeroes( short, nelem( $seqs_c )); ITERATIONS: while ( any $seqs_c-> isgood ) { my ( $seqs_c_odd, $current_odd_masked ) = where( $seqs_c, $current, $seqs_c & 1 ); $current_odd_masked ++; $current ++; ( $seqs_c_odd *= 3 ) ++; $seqs_c >>= 1; my ( $seqs_cap, $lengths_cap, $current_cap ) = where( $seqs_c, $lengths_c, $current, $seqs_c <= MAXLEN ); my $lut = $lengths-> index( $seqs_cap ); # "_f" is for "finished" my ( $seqs_f, $lengths_f, $lut_f, $current_f ) = where( $seqs_cap, $lengths_cap, $lut, $current_cap, $lut-> isgood ); $lengths_f .= $lut_f + $current_f; $seqs_f .= 2; # i.e. BAD } # "_e" is for "at even positions, ahead" ## ## # my $from_e = _max( $from * 2, $to ) + 2; # bug ## my $from_e = $from == 0 ? $to + 2 : $from * 2; # fixed ## my $to_e = _min( $to * 2, MAXLEN ); ## ## ( $lengths-> slice([ $from_e, $to_e, 2 ]) ## .= $lengths-> slice([ $from_e / 2, $to_e / 2 ])) ++ ## if $from_e <= MAXLEN ## } # same finale $lengths-> badflag( 0 ); $lengths = $lengths-> slice([ 1, MAX ]); my $sorted_i = $lengths-> qsorti; my $sorted = $lengths-> index( $sorted_i ); my $value = $sorted-> at( MAX - TOP ); my $pos = vsearch_insert_leftmost( $value, $sorted ); my $top_i = $sorted_i-> slice([ MAX - 1, $pos ]); ( my $result = $lengths -> index( $top_i ) -> longlong -> bitnot -> cat( $top_i + 1 ) -> transpose -> qsortvec -> slice([], [ 0, TOP - 1 ]) )-> slice([ 0 ], []) -> inplace -> bitnot; say $result; say time - $t; __END__

Edit (bug fix). :(( With "dummy 0th element prepended", my intention for chunks was e.g. "0-10,11-20,21-30, ..., i.e. 1st is one longer. Then I fooled with "better presentation", from "infinite" while loop, to while loop with explicit condition, to for loop, with slightly different formulas for from,to,from_e,to_e, and messed up. Some even lengths aren't calculated, as seen with MAX and CHUNK e.g. 10 and 4. Easiest fix now is to leave chunks all equal (CHUNK+1); one LOC fixed (see "fixed"), above. Sorry.


In reply to Re^3: Optimizing with Caching vs. Parallelizing (MCE::Map) (PDL: faster) by vr
in thread Optimizing with Caching vs. Parallelizing (MCE::Map) by 1nickt

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others chanting in the Monastery: (3)
    As of 2020-10-22 07:39 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      My favourite web site is:












      Results (225 votes). Check out past polls.

      Notices?