Re^2: RFC: 100 PDL Exercises (ported from numpy)

by vr (Curate)
on May 08, 2018 at 16:58 UTC


in reply to Re: RFC: 100 PDL Exercises (ported from numpy)
in thread RFC: 100 PDL Exercises (ported from numpy)

Hi, bliako, thank you so much for the detailed answer -- my statistics skills were (hopefully) auto-vivified :). After following your links and code in earnest, I felt brave enough to run some experiments and write a comment, but in the process I discovered something strange ;).

----------

First, my impression is that the solutions to the exercises were supposed to be simple (as in, KISS). So, translating the Python solution to PDL almost verbatim, the answer to #100 could be:

use strict; use warnings;
use feature 'say';
use PDL;

my $n = 100;      # input sample size
my $m = 1000;     # number of bootstrap repeats
my $r = $n;       # re-sample size

my $x   = random $n;
my $idx = random $r, $m;
$idx *= $n;

say $x-> index( $idx ) -> avgover -> pctover( pdl 0.05, 0.95 );

__END__
[ 0.4608755 0.55562806]

Interestingly, PDL DWIMs for me here -- there is no need to floor an index to thread over a piddle (just as with Perl's array indices). I also stand corrected on "floor converts to Long in-place" -- it rounds in place, but the piddle stays Double.
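
For example, a minimal sketch of both points (the printed values are what I expect from the docs, not output quoted from the runs above):

use strict; use warnings;
use feature 'say';
use PDL;

my $x = sequence 10;            # 0 .. 9, stored as double
say $x-> index( pdl 3.9 );      # 3 -- the fractional index is simply truncated

my $idx = pdl 3.9;
$idx-> inplace-> floor;         # floors in place ...
say $idx;                       # 3
say $idx-> type;                # double -- ... but the type is unchanged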

This 'never explicitly loop in a vectorized language' answer unfortunately hides the ugly truth that for very large data we can end up with huge R x M matrices of random indices and equally huge (and equally unnecessary) matrices of all the re-samplings, and thus die with 'Out of memory!'.
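(For a sense of scale: with the sizes I use in the timing runs below, r = 30,000 and m = 10,000, the index matrix alone holds 3*10**8 doubles, roughly 2.4 GB, and the matrix of re-sampled values is just as big again.)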

I was experimenting with this and that (PDL's automatic parallelization, in particular), which I'm skipping now, because what comes next is something weird.

Consider this version of the above, which avoids both the 2-dimensional index matrix and the matrix of re-sampling results, but is still not parallel:

use strict; use warnings;
use feature 'say';
use Time::HiRes 'time';
use PDL;

srand( 123 );
my $time = time;

my $n = 30000;    # input sample size
my $m = 10000;    # number of bootstrap repeats
my $r = $n;       # re-sample size

my $x   = random $n;
my $avg = zeroes $m;

for ( 0 .. $m - 1 ) {
    my $idx = random $r;
    $idx *= $n;
    $avg-> set( $_, $x-> index( $idx )-> avg )
}

say $avg-> pctover( pdl 0.05, 0.95 );
say time - $time;

__END__
[0.49384165 0.49941814]
6.11959099769592

Next is a solution where I start to try to parallelize, but because of the selected parameters (a single thread) I expect not only no gain but, due to the overhead, something slower. And yet:

use strict; use warnings;
use feature 'say';
use Time::HiRes 'time';
use PDL;
use PDL::Parallel::threads qw/ share_pdls retrieve_pdls /;

srand( 123 );
my $time = time;

my $n = 30000;    # input sample size
my $m = 10000;    # number of bootstrap repeats
my $r = $n;       # re-sample size

my $x   = random $n;
my $avg = zeroes $m;

share_pdls x => $x, avg => $avg;

threads-> create( sub {
    my ( $x, $avg ) = retrieve_pdls qw/ x avg /;
    for ( 0 .. $m - 1 ) {
        my $idx = random $r;
        $idx *= $n;
        $avg-> set( $_, $x-> index( $idx )-> avg )
    }
});

$_-> join for threads-> list;

say $avg-> pctover( pdl 0.05, 0.95 );
say time - $time;

__END__
[0.49384165 0.49941814]
4.57857203483582

Why is that? :) I tried inserting

use PDL::Parallel::threads qw/ share_pdls retrieve_pdls /;
share_pdls x => $x, avg => $avg;
( $x, $avg ) = retrieve_pdls qw/ x avg /;

into the no-threads solution (does retrieve_pdls set any flags that speed things up? Nope.)

$ perl -v

This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)

$ perl -MPDL -E 'say $PDL::VERSION'
2.019

Re^3: RFC: 100 PDL Exercises (ported from numpy)
by bliako (Monsignor) on May 12, 2018 at 13:57 UTC

    vr, your code is superior to the long code I posted!

    If I may add: using oddpctover() might be preferred because it does not interpolate when there is no data at the exact percentile position.
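
    For instance, a quick sketch of the difference (the exact element oddpctover picks follows its nearest-rank rule, but it is always a value present in the data):

    use strict; use warnings;
    use feature 'say';
    use PDL;

    my $p = pdl( 1, 2, 3, 4 );
    say $p-> pctover( pdl 0.5 );      # 2.5 -- interpolated between 2 and 3
    say $p-> oddpctover( pdl 0.5 );   # an actual element of $p, no interpolation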

    Regarding the time difference when running with and without "use threads", I have discovered that avg() is the culprit. If you use $x-> index( $idx )-> at(0) rather than $x-> index( $idx )-> avg, the performance is the same (which also rules out the indexing as a possible cause).

      Regarding the time difference when running with and without "use threads", I have discovered that avg() is the culprit.

      Hm-m, that's not what I'm observing here. Setting $m = 30_000 and replacing avg with at(0), these are the results of 3 runs without a thread and in a thread:

      without a thread: 14.9562268257141 14.9582891464233 14.8853561878204
      in a thread:      11.9686307907104 12.0527169704437 12.0850310325623

      And then, replacing all 3 lines of the loop block with just a simple

      my $y = $x-> index( sequence $n )-> at( 0 );

      without a thread: 7.74871516227722 7.80155396461487 7.71721601486206
      in a thread:      4.92977499961853 4.87044596672058 4.87968802452087

      So I'd say there is something strange going on with index. I'm not appealing to anyone for an investigation :) -- the observable speed difference may well depend on hardware, so let it be a murky PDL mystery.

      And another entertaining (but maybe dangerous) bit of PDL trivia: above I sketched a parallelization solution with threads and random. Except, if there were more than 1 worker thread, it wouldn't work as expected. The random documentation says Perl's srand can be used to seed it, so one may assume that random relies on Perl's RNG, including that it is (automatically) seeded in each thread that is started. Consider:

      use strict; use warnings;
      use feature 'say';
      use threads;
      use PDL;

      PDL::no_clone_skip_warning;
      srand;

      async sub{ say random( 1 )-> at( 0 )} for 1 .. 5;
      $_-> join for threads-> list;

      say rand;

      __END__
      0.851411183904023
      0.851411183904023
      0.851411183904023
      0.851411183904023
      0.851411183904023
      0.851411183904023

      That's some randomness. Try say rand instead. So one must explicitly call srand at the start of each thread if using threads and PDL's random.
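
      A sketch of that fix (assuming, as the docs say, that Perl's srand is what seeds PDL's random): re-using the snippet above, but seeding explicitly, and distinctly, inside each thread:

      use strict; use warnings;
      use feature 'say';
      use threads;
      use PDL;

      PDL::no_clone_skip_warning;

      async sub {
          srand( 123 + threads-> tid );   # re-seed inside each thread; any distinct seeds will do
          say random( 1 )-> at( 0 );
      } for 1 .. 5;

      $_-> join for threads-> list;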

Re^3: RFC: 100 PDL Exercises (ported from numpy)
by marioroy (Prior) on Sep 03, 2019 at 05:15 UTC

    Hi, vr

    Tonight I came across your post and modified your demonstration to run with 4 threads.

    # https://www.perlmonks.org/?node_id=1214227

    use strict; use warnings;
    use feature 'say';

    use PDL;
    use PDL::Parallel::threads qw(retrieve_pdls);

    use threads;
    use MCE::Shared;
    use Time::HiRes 'time';

    srand( 123 );

    my $time = time;

    my $n = 30000;    # input sample size
    my $m = 10000;    # number of bootstrap repeats
    my $r = $n;       # re-sample size

    my $x = random( $n );     $x->share_as('x');
    my $avg = zeroes( $m );   $avg->share_as('avg');
    my $seq = MCE::Shared->sequence( 0, $m - 1 );

    sub parallel_task {
        srand;
        my ( $x, $avg ) = retrieve_pdls('x', 'avg');

        while ( defined ( my $seq_n = $seq->next() ) ) {
            my $idx = random $r;
            $idx *= $n;
            $avg->set( $seq_n, $x->index( $idx )->avg );
        }
    }

    threads->create( \&parallel_task ) for 1 .. 4;

    # ... do other stuff ...

    $_->join() for threads->list();

    say $avg->pctover( pdl 0.05, 0.95 );
    say time - $time, ' seconds';

    __END__

    # Output

    [0.49395242 0.49936752]
    1.28744792938232 seconds

    Afterwards, I re-validated PDL with MCE and released 1.847. The effort is mainly for folks running a Perl that lacks threads support. Here it is: PDL and MCE::Shared running similarly.

    # https://www.perlmonks.org/?node_id=1214227

    use strict; use warnings;
    use feature 'say';

    use PDL;   # must load PDL before MCE::Shared

    use MCE::Hobo;
    use MCE::Shared 1.847;
    use Time::HiRes 'time';

    srand( 123 );

    my $time = time;

    my $n = 30000;    # input sample size
    my $m = 10000;    # number of bootstrap repeats
    my $r = $n;       # re-sample size

    # On Windows, the non-shared piddle ($x) is unblessed in threads.
    # Therefore, constructing the piddle inside the worker.
    # UNIX platforms benefit from copy-on-write. Thus, one copy.

    my $x   = ( $^O eq 'MSWin32' ) ? undef : random( $n );
    my $avg = MCE::Shared->pdl_zeroes( $m );
    my $seq = MCE::Shared->sequence( 0, $m - 1 );

    sub parallel_task {
        $x = random( $n ) unless ( defined $x );

        while ( defined ( my $seq_n = $seq->next() ) ) {
            my $idx = random $r;
            $idx *= $n;

            # $avg is a shared piddle which resides inside the shared-
            # manager process or thread. The piddle is accessible via the
            # OO interface only.

            $avg->set( $seq_n, $x->index( $idx )->avg );
        }
    }

    MCE::Hobo->create( \&parallel_task ) for 1 .. 4;

    # ... do other stuff ...

    MCE::Hobo->wait_all();

    # MCE sets the seed of the base generator uniquely between workers.
    # Unfortunately, it requires running with one worker for predictable
    # results (i.e. no guarantee in the order which worker computes the
    # next input chunk).

    say $avg->pctover( pdl 0.05, 0.95 );
    say time - $time, ' seconds';

    __END__

    # Output

    [0.49387191 0.49937053]
    1.29038286209106 seconds

    Regards, Mario

      Here is the same thing using MCE. Workers obtain the next sequence number without involving the manager process, which is why it runs faster. I had to think about that when I saw the run time.

      # https://www.perlmonks.org/?node_id=1214227

      use strict; use warnings;
      use feature 'say';

      use PDL;   # must load PDL before MCE::Shared

      use MCE 1.847;
      use MCE::Shared 1.847;
      use Time::HiRes 'time';

      srand( 123 );

      my $time = time;

      my $n = 30000;    # input sample size
      my $m = 10000;    # number of bootstrap repeats
      my $r = $n;       # re-sample size

      # On Windows, the non-shared piddle ($x) is unblessed in threads.
      # Therefore, constructing the piddle inside the worker.
      # UNIX platforms benefit from copy-on-write. Thus, one copy.

      my $x   = ( $^O eq 'MSWin32' ) ? undef : random( $n );
      my $avg = MCE::Shared->pdl_zeroes( $m );

      MCE->new(
          max_workers => 4,
          sequence    => [ 0, $m - 1 ],
          chunk_size  => 1,
          user_begin  => sub {
              $x = random( $n ) unless ( defined $x );
          },
          user_func   => sub {
              my $idx = random $r;
              $idx *= $n;

              # $avg is a shared piddle which resides inside the shared-
              # manager process or thread. The piddle is accessible via the
              # OO interface only.

              $avg->set( $_, $x->index( $idx )->avg );
          }
      )->run;

      # MCE sets the seed of the base generator uniquely between workers.
      # Unfortunately, it requires running with one worker for predictable
      # results (i.e. no guarantee in the order which worker computes the
      # next input chunk).

      say $avg->pctover( pdl 0.05, 0.95 );
      say time - $time, ' seconds';

      __END__

      # Output

      [0.49387106 0.4993768]
      1.09556317329407 seconds

      Thank you, vr. I had no idea that PDL random is not unique between threads. MCE already sets the seed of the base generator, but did not do so for workers spawned as threads. This is resolved in MCE 1.847.

      Regards, Mario

        A reply from the future! Since 2.062, PDL's random numbers (using the random function) can be generated using pthreads, because the new srand creates one seed per CPU.
Re^3: RFC: 100 PDL Exercises (ported from numpy)
by etj (Deacon) on May 01, 2022 at 16:50 UTC
    This 'never explicitly loop in a vectorized language' answer unfortunately hides the ugly truth that for very large data we can end up with huge R x M matrices of random indices and equally huge (and equally unnecessary) matrices of all the re-samplings, and thus die with 'Out of memory!'.
    This can be a real issue! Keeping such index-sets around for exactly the right amount of time will help, which in turn might be assisted by making little functions that do individual operations on subsets (with a captured index-set ndarray that goes out of scope on finishing). Another thing that will help is the forthcoming loop fusion, discussed at https://github.com/PDLPorters/pdl/issues/349: non-slice operations will become lazy, and on evaluation will potentially be put together into a new, loop-fused operation.

    Something else that would help here, as also discussed on #349, is more generalised first-class "index operations". Ideas and contributions here, on the GitHub issue, on the PDL mailing lists, or any other means are most welcome!

    One other thought: because POSIX threads have a startup cost, for larger ndarrays the use of vectorised operations is the way to harness multiple cores for free (for operations that support this), which a Perl for-loop cannot achieve.
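
    For anyone curious how to turn that on, a minimal sketch using the knobs documented in PDL::ParallelCPU (it only applies to operations whose implementation is marked threadable, and to PDL builds with POSIX-thread support):

    use strict; use warnings;
    use feature 'say';
    use PDL;

    PDL::set_autopthread_targ( 4 );    # aim for up to 4 POSIX threads per operation
    PDL::set_autopthread_size( 1 );    # ... but only for sufficiently large ndarrays (see the docs for the unit)

    my $big = random( 3000, 3000 );
    say $big-> sumover-> avg;              # this may be split across cores automatically
    say PDL::get_autopthread_actual();     # how many pthreads the last operation actually used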
