Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Hi vr,

Lots of PDL goodness. Wow! I do not know how this will perform unless taking a moment or two and give it a try. So here is a parallel version based on your 2nd example.

use strict; use warnings; use feature 'say'; use PDL; use MCE::Flow; use MCE::Candy; use Time::HiRes 'time'; my $t = time; # PDL Quick Reference # https://www.perlmonks.org/?node_id=1214437 sub collatz_seq { my ( $chunk_id, $seq_beg, $seq_end ) = @_; my $max = $seq_end - $seq_beg + 2; my $top = $max < 20 ? $max : 20; my $seqs = pdl( longlong, 1, $seq_beg..$seq_end ); $seqs-> setbadat( 0 ); $seqs-> badvalue( 2 ); my $lengths = 1 + ones( short, $max ); $lengths-> set( 0, 1 ); while ( any my $good_mask = $seqs-> isgood ) { my ( $seqs_odd, $lengths_odd_masked ) = where( $seqs, $lengths, $seqs & 1 ); $lengths_odd_masked ++; $lengths-> where( $good_mask ) ++; ( $seqs_odd *= 3 ) ++; $seqs >>= 1; } my $sorted_i = $lengths-> qsorti; my $sorted = $lengths-> index( $sorted_i ); my $value = $sorted-> at( $max - $top ); my $pos = vsearch_insert_leftmost( $value, $sorted ); my $top_i = $sorted_i-> slice([ $max - 1 , $pos ]); ( my $result = $lengths -> index( $top_i ) -> longlong -> bitnot -> cat( $top_i + 1 ) -> transpose -> qsortvec -> slice([], [ 0, $top - 1 ]) )-> slice([ 0 ], []) -> inplace -> bitnot; # From PDL to Perl: [ 0 1 ] becomes [ 1, 0 ], my $str = $result->string; $str =~ s/(\d+)\s+(\d+)(.*)/$2,$1$3,/g; my $ret = eval $str; $_->[0] = $_->[0] + $seq_beg - 2 for @$ret; MCE->gather( $chunk_id, @$ret ); } my $size = shift || 1e6; $size = 1e6 if $size < 1e6; # minimum $size = 1e9 if $size > 1e9; # maximum my $chunk_size = $size >> 5; my @seqs; mce_flow_s { max_workers => MCE::Util::get_ncpu(), chunk_size => $chunk_size > 100000 ? 100000 : $chunk_size, bounds_only => 1, gather => MCE::Candy::out_iter_array(\@seqs), }, sub { my ( $mce, $chunk_ref, $chunk_id ) = @_; collatz_seq( $chunk_id, @{ $chunk_ref } ); }, 2, $size; MCE::Flow->finish; @seqs = ( sort { $b->[1] <=> $a->[1]} @seqs )[ 0..19 ]; printf "Collatz(%5d) has sequence length of %3d steps\n", @$_ for @seqs; say {*STDERR} time - $t;

Results:

I was expecting for mce_pdl2 using 1 core to be closer to vr_pdl2 than vr_pdl3. Maybe benefitting from CPU L2/L3 cache. Chunking seems to be the reason why mce_pdl2 (non-caching) ran nearly as fast as vr_pdl3 (caching). Below, includes 1 core testing for 1e7 with various chunk sizes.

1e7 testing vr_pdl3: 46.328s cache vr_pdl2: 1m16.058s non-cache mce_pdl2: 1m15.583s chunk_size => $size mce_pdl2: 1m03.709s chunk_size => $size >> 1 mce_pdl2: 52.352s chunk_size => $size >> 2 mce_pdl2: 51.323s chunk_size => $size >> 3 mce_pdl2: 49.396s chunk_size => $size >> 4 mce_pdl2: 48.195s chunk_size => $size >> 5 mce_pdl2: 48.369s chunk_size => $size >> 6 mce_pdl2: 48.501s chunk_size => $size >> 7 chunk_size => 300000 mce_pdl2: 48.195s 1 core mce_pdl2: 25.311s 2 cores mce_pdl2: 14.085s 4 cores mce_pdl2: 7.650s 8 cores mce_pdl2: 4.517s 16 cores mce_pdl2: 3.721s 32 cores chunk_size => 100000 mce_pdl2: 48.395s 1 core mce_pdl2: 25.402s 2 cores mce_pdl2: 13.163s 4 cores mce_pdl2: 6.860s 8 cores mce_pdl2: 3.850s 16 cores mce_pdl2: 2.347s 32 cores Output Collatz(8400511) has sequence length of 686 steps Collatz(8865705) has sequence length of 668 steps Collatz(6649279) has sequence length of 665 steps Collatz(9973919) has sequence length of 663 steps Collatz(6674175) has sequence length of 621 steps Collatz(7332399) has sequence length of 616 steps Collatz(7532665) has sequence length of 616 steps Collatz(5649499) has sequence length of 613 steps Collatz(8474249) has sequence length of 611 steps Collatz(6355687) has sequence length of 608 steps Collatz(8847225) has sequence length of 606 steps Collatz(9533531) has sequence length of 606 steps Collatz(6635419) has sequence length of 603 steps Collatz(9953129) has sequence length of 601 steps Collatz(7464846) has sequence length of 598 steps Collatz(7464847) has sequence length of 598 steps Collatz(3732423) has sequence length of 597 steps Collatz(5598635) has sequence length of 595 steps Collatz(8397953) has sequence length of 593 steps Collatz(6298465) has sequence length of 590 steps
1e8 testing vr_pdl3: 9m44.667s cache vr_pdl2: 16m12.467s non-cache chunk_size => 300000 mce_pdl2: 9m06.078s 1 core mce_pdl2: 4m43.529s 2 cores mce_pdl2: 2m33.136s 4 cores mce_pdl2: 1m21.434s 8 cores mce_pdl2: 45.266s 16 cores mce_pdl2: 36.925s 32 cores chunk_size => 100000 mce_pdl2: 9m09.950s 1 core mce_pdl2: 4m39.677s 2 cores mce_pdl2: 2m24.230s 4 cores mce_pdl2: 1m13.353s 8 cores mce_pdl2: 37.923s 16 cores mce_pdl2: 20.099s 32 cores Output Collatz(63728127) has sequence length of 950 steps Collatz(95592191) has sequence length of 948 steps Collatz(96883183) has sequence length of 811 steps Collatz(86010015) has sequence length of 798 steps Collatz(98110761) has sequence length of 749 steps Collatz(73583070) has sequence length of 746 steps Collatz(73583071) has sequence length of 746 steps Collatz(36791535) has sequence length of 745 steps Collatz(55187303) has sequence length of 743 steps Collatz(56924955) has sequence length of 743 steps Collatz(82780955) has sequence length of 741 steps Collatz(85387433) has sequence length of 741 steps Collatz(63101607) has sequence length of 738 steps Collatz(64040575) has sequence length of 738 steps Collatz(93128574) has sequence length of 736 steps Collatz(93128575) has sequence length of 736 steps Collatz(94652411) has sequence length of 736 steps Collatz(96060863) has sequence length of 736 steps Collatz(46564287) has sequence length of 735 steps Collatz(69846431) has sequence length of 733 steps

Specifying 50000 for chunk size may run faster on 4/6/8 core machines.

8 cores mce_pdl2 1e8 16.660s chunk_size => 300000 mce_pdl2 1e8 10.471s chunk_size => 100000 mce_pdl2 1e8 9.773s chunk_size => 50000

Regards, Mario


In reply to Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map) (PDL fun) by marioroy
in thread Optimizing with Caching vs. Parallelizing (MCE::Map) by 1nickt

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others studying the Monastery: (4)
    As of 2020-10-22 06:56 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      My favourite web site is:












      Results (225 votes). Check out past polls.

      Notices?