

in reply to Re^4: Why should any one use/learn Perl 6?
in thread Why should any one use/learn Perl 6?

"hashes in Cuckoo are not thread safe in the sense that they might lose updates when being updated from multiple threads" Oh, dear. Sounds like a rather significant deficiency compared to Perl, when we are discussing concurrency.

Perl 6 has decided that it is not a good idea to sweep the inherent problems of updating a data structure like a hash from multiple threads at the same time under the carpet. Tieing a hash with a Perl interface so that all updates are done sequentially is not a good step towards making a fully functional, well-performing threaded program without bottlenecks. You, as a developer, need to be aware of the issues and adapt your program and/or the way you think about threaded programming.

Think about writing your solutions as a pipeline, or about using react whenever you want an event-driven model.
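
As a rough illustration of that idea (a sketch of my own, not code from the original post, reusing the 10 x 100_000 random-key workload from the examples further down this thread): worker threads send their picks over a Channel, and a single react block does all of the hash updates.

# Sketch only: workers never touch %hash directly, they send keys over
# a Channel and the react block performs every update.
constant RANGE = 10_000;
my %hash;
my $updates = Channel.new;

my @workers = do for 1..10 {
    start {
        $updates.send( (^RANGE).pick ) for ^100_000;
    }
}

# close the Channel once all workers are done, so the react block can end
start { await @workers; $updates.close }

react {
    whenever $updates -> $key {
        %hash{$key}++;   # only the react block ever updates %hash
    }
}

say %hash.values.sum;   # 1000000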

In that sense, Perl 5 ithreads makes you a lazy programmer with everything being thread-local by default.

"Note that the Perl 5 solution to updating shared data structures requires tieing and locking." Wrong. It does not.

If I look at the code of MCE::Shared and MCE::Shared::Scalar, I do see things like a sub TIESCALAR, and &MCE::Shared::Scalar::new being bound to said TIESCALAR. That to me implies tieing. Or am I wrong?

I'm negative about your project because it is still squatting on Perl's name, duh.

I'm glad to hear that it's only the name you object to now.

Re^6: Why should any one use/learn Perl 6?
by hippo (Bishop) on Jun 12, 2018 at 07:58 UTC
    In that sense, Perl 5 ithreads makes you a lazy programmer

    Of course it does. Perl always encourages the three virtues.

Re^6: Why should any one use/learn Perl 6?
by jeffenstein (Hermit) on Jun 12, 2018 at 17:25 UTC

    Perl 6 has decided that it is not a good idea to sweep the inherent problems of updating a data structure like a hash from multiple threads at the same time under the carpet. Tieing a hash with a Perl interface so that all updates are done sequentially is not a good step towards making a fully functional, well-performing threaded program without bottlenecks. You, as a developer, need to be aware of the issues and adapt your program and/or the way you think about threaded programming.

    Think about writing your solutions as a pipeline, or about using react whenever you want an event-driven model.

    As I understand this, there are two points you are making: shared hashes in Perl 6 are faster but not thread safe, and hashes are a bad choice of data structure for parallel programming.

    In cases where shared data is a better choice (no judgement here), isn't this a step backwards towards pthreads? In Perl 5 you're sure that at least the underlying data structure can't be corrupted (each thread has its own interpreter), but in Perl 6 you need to take specific steps to ensure that it is not corrupted (threads share the same interpreter). I would think that with Perl 6, even with locking, you would still have better performance here, since you aren't copying data between interpreters.
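
    For concreteness, one such specific step (a minimal sketch of my own, not code from the thread) would be to serialize every update with a Lock:

    # Sketch only: each update to the shared hash is done while holding
    # the Lock, so no two threads modify %hash at the same time.
    my %hash;
    my $lock = Lock.new;

    await do for 1..10 {
        start {
            for ^100_000 {
                $lock.protect: { ++%hash{ (^10_000).pick } }
            }
        }
    }

    say %hash.values.sum;   # 1000000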

    Surely there is some middle ground between the Perl 5 threading model, pthreads, and the Python/Ruby GIL?

    I'm sure there is some nuance that I'm missing here. You have a lot more experience with threading than me, so maybe I'm just oversimplifying it and missing the point?

    Update: I found this blog post that discusses it. Still digesting it.

      Thank you for reminding me about that excellent blog post by Jonathan. It was a bit ranty, but it also was a direct result of similar questions I was asking at that time. :-)

      At the moment, MoarVM can crash when more than one thread is adding a key to a hash at the same time. This is a known issue, and how best to solve it is still being debated.

      So, to work around the possibility of crashes, one should make sure that the hash already has all of the possible keys before starting the parallel run. Here is your code, rewritten to be more idiomatic Perl 6:

      constant RANGE = 10_000;

      my %hash = ^RANGE Z=> 0 xx RANGE;

      await do for 1..10 {
          start {
              %hash{ (^RANGE).pick }++ for ^100_000;
          }
      }

      say "Seen { %hash.values.grep(* > 0).elems } keys";
      # Seen 10000 keys
      say "Average value (~100 if threadsafe): { %hash.values.sum / %hash.elems }";
      # Average value (~100 if threadsafe): 91.7143

      I think the constant RANGE = 10_000 is rather self-explanatory. The next line may be somewhat harder to grasp: it fills the hash %hash with a list of Pairs generated by zipping (Z) a range (^RANGE, which is short for 0 .. RANGE - 1) with a sequence of 10_000 zeroes (0 xx RANGE) using the fat-comma operator (=>).
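
      To make that initialization a little more concrete, here is the same construct on a tiny scale (my own illustration, not part of the original post):

      # the zip produces one Pair per key, each with the value 0
      my %h = ^3 Z=> 0 xx 3;
      say %h;   # {0 => 0, 1 => 0, 2 => 0}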

      Then we execute %hash{ (^RANGE).pick }++ for ^100_000 in 10 different threads. The (^RANGE).pick picks a random value from the range of 0 .. RANGE - 1.

      The results are then shown by directly interpolating code inside a string: you can use curly braces for that.

      You can use the .sum method to get a sum of values, and .elems directly on the hash to find out the number of elements.

      I haven't been able to get this version to crash; however, it is still not threadsafe, for the reasons that Jonathan explains so well in his blog post.

      If one uses the map/reduce idiom, the code would look like this:

      constant RANGE = 10_000;

      my %hash is Bag = await do for 1..10 {
          start {
              my %h;
              %h{ (^RANGE).pick }++ for ^100_000;
              %h;
          }
      }

      say "Seen { %hash.values.grep(* > 0).elems } keys";
      # Seen 10000 keys
      say "Average value (~100 if threadsafe): { %hash.values.sum / %hash.elems }";
      # Average value (~100 if threadsafe): 100

      You will note that now each thread has its own hash that can get updated without any problems. The result is a list of 10 hashes that are merged into a single hash with Bag semantics. (see Sets, Bags and Mixes for more information on Bags).

      A Bag is basically an object hash (so the keys are not necessarily strings) that only accepts positive integers as values. Initialization of a Bag accepts and merges anything that looks like a Pair or a list of Pairs (which is basically what a hash is in that context).
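
      A tiny illustration of that merging behaviour (my own example, not from the original post):

      # values for the same key are added when the Pair lists are merged
      my %b is Bag = { a => 1, b => 2 }, { a => 3 };
      say %b<a>;   # 4
      say %b<b>;   # 2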

      So that would be the idiom to use safely. Hope this made sense :-)

        At the moment, MoarVM can crash ... So, to work around the possibility of crashes, one should make sure that the hash already has all of the possible keys before starting the parallel run. Here is your code, rewritten to be more idiomatic Perl 6

        This is rather amusing, if sadly unsurprising. Having been sold Perl6 on the basis that it has all the benefits of using a bytecode VM, we now see that the VM tail is wagging the Perl6 dog.

        Surely the sane advice should be: if a particular VM is unstable then stop using it and use another VM instead. There's no need to rewrite/refactor already working code just because one VM has bugs.

Re^6: Why should any one use/learn Perl 6?
by marioroy (Prior) on Jun 13, 2018 at 11:27 UTC

    Greetings liz. MCE::Shared provides two interfaces, OO and TIE. The OO interface does not involve TIE.

      Greetings marioroy.

      Indeed, I stand corrected. However, I think that 1nickt's point, that it does not do any locking and still solves the issue that Perl 6 has when updating a container from multiple threads at the same time, is not true. From the MCE::Shared documentation:

      # Locking is necessary when multiple workers update the same
      # element. The reason is that it may involve 2 trips to the
      # shared-manager process: fetch and store in this case.

      $mutex->enter( sub { $cnt += 1 } );

      This implies to me that MCE::Shared suffers from exactly the same issues that Perl 6 has, and which Jonathan so aptly describes in his blog post. Or am I missing something?

        Greetings liz,

        A mutex is necessary whenever the Perl-like behavior, via the TIE interface, involves 2 IPC calls { FETCH and STORE }.

        use strict;
        use warnings;
        use feature 'say';

        use MCE::Hobo;
        use MCE::Mutex;
        use MCE::Shared;

        my $mutex = MCE::Mutex->new();

        tie my $var, 'MCE::Shared', 0;

        sub task {
            for ( 1 .. 2000 ) {
                $mutex->enter(sub {
                    $var += 1;   # FETCH, STORE
                    $var += 4;   # Ditto
                });
            }
        }

        MCE::Hobo->create(\&task) for 1 .. 4;
        MCE::Hobo->waitall;

        say $var;   # 40000

        A mutex is not necessary via the OO interface. MCE::Shared::{ Array, Hash, and Scalar } include sugar methods.

        use strict;
        use warnings;
        use feature 'say';

        use MCE::Hobo;
        use MCE::Shared;

        my $var = MCE::Shared->scalar(0);

        sub task {
            for ( 1 .. 2000 ) {
                $var->incr;
                $var->incrby(4);
            }
        }

        MCE::Hobo->create(\&task) for 1 .. 4;
        MCE::Hobo->waitall;

        say $var->get;   # 40000

        MCE::Shared works with threads and other parallel modules. Below, the same thing using threads.

        use strict;
        use warnings;
        use feature 'say';

        use threads;
        use MCE::Shared;

        my $var = MCE::Shared->scalar(0);

        sub task {
            for ( 1 .. 2000 ) {
                $var->incr;
                $var->incrby(4);
            }
        }

        threads->create(\&task) for 1 .. 4;
        $_->join for threads->list;

        say $var->get;   # 40000

        One may also customize the shared class, if desired.

        use strict;
        use warnings;

        package My::Scalar;

        use MCE::Shared::Scalar;
        use base 'MCE::Shared::Scalar';

        sub set_if_max {
            # my ( $self, $value ) = @_;
            ${ $_[0] } = $_[1] if ( $_[1] > ${ $_[0] } );
            1;
        }

        1;

        package main;

        use feature 'say';
        use MCE::Shared;

        my $var = MCE::Shared->share( { module => 'My::Scalar' }, 0 );

        $var->set_if_max(42);
        $var->set_if_max(11);

        say $var->get;   # 42

        Regards, Mario