
Re^6: Why should any one use/learn Perl 6?

by jeffenstein (Hermit)
on Jun 12, 2018 at 17:25 UTC


in reply to Re^5: Why should any one use/learn Perl 6?
in thread Why should any one use/learn Perl 6?

Perl 6 has decided that it is not a good idea to sweep the inherent problems of updating a data structure like a hash from multiple threads at the same time under the carpet. Tying a hash with a Perl interface to make sure that all updates are done sequentially is not a good step towards making a fully functional and well-performing threaded program without any bottlenecks. You, as a developer, need to be aware of the issues, and make adaptations to your program and/or the way you think about threaded programming.

Think about writing your solutions as a pipeline, or using react whenever using an event driven model.

As I understand this, there are two points you are making: shared hashes in Perl 6 are faster but not thread safe, and hashes are a bad choice of data structure for parallel programming.

In cases where shared data is a better choice (no judgement here), isn't this a step backwards towards pthreads? In Perl 5 you're sure that at least the underlying data structure can't be corrupted (each thread has its own interpreter), but in Perl 6 you need to take specific steps to ensure that it is not corrupted (threads share the same interpreter). I would think that with Perl 6, even with locking, you would still have better performance here since you aren't copying data between interpreters.
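
For what it's worth, here is a minimal sketch of what I mean by "with locking". It assumes the built-in Lock class and its protect method are the right tool for this; the key range and loop counts are made up for illustration:

```perl6
my %shared;
my $lock = Lock.new;

await do for 1..10 {
    start {
        for ^10_000 {
            my $key = (^1_000).pick;
            # Only one thread at a time may run the block passed to protect,
            # so each read-modify-write of the shared hash is serialized.
            $lock.protect: { %shared{$key}++ };
        }
    }
}

# 10 threads x 10_000 increments each
say %shared.values.sum;   # 100000
```

Every update goes through the same lock, so this trades parallelism for safety, which is presumably the bottleneck being warned about.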

Surely there is some middle ground between the Perl 5 threading model, pthreads, and the Python/Ruby GIL?

I'm sure there is some nuance that I'm missing here. You have a lot more experience with threading than me, so maybe I'm just oversimplifying it and missing the point?

Update: I found this blog post that discusses it. Still digesting it.

Update^2: Ok, I'm more confused now. :(

In the blog post, I think I understand the reasoning: he wants to make subtle logic errors into hard failures:

... What achieving safety at the micro level will most certainly achieve, however, is increasing the time it takes for the programmer to discover the real problems in their program. If anything, we want such inevitably unreliable programs to reliably fail, not reliably pretend to work.

Ok, fair enough. I don't necessarily agree with it, but I'm certainly not on Jonathan's level, so my opinion doesn't count for much here. So, zeroing in on the "...unreliable programs to reliably fail...", I wrote this very bad snippet that does stuff very wrong:

    #!/usr/bin/env perl6
    #
    my %shared_hash;

    await do for 1..10 -> $thread {
        start {
            for 1..100_000 -> $loop {
                my $key = floor(rand * 10_000);
                my $ref := %shared_hash<$key>;
                if !$ref {
                    $ref = 0;
                }
                %shared_hash{$key} = $ref + 1;
            }
        }
    }

    say "shared_hash has ", %shared_hash.keys.elems, " keys";

    # Don't know how to do this easily. Is there an average function somewhere?
    my $sum = 0;
    for %shared_hash.values -> $v {
        $sum += $v;
    }
    my $ave = $sum / %shared_hash.keys.elems;
    say "Average value (~10 if threadsafe): ", $ave;

As expected, I get the wrong results from the hash contents:

    $ perl6 killit.p6
    shared_hash has 10001 keys
    Average value (~10 if threadsafe): 0.999900

And about every fifth run, I get this not-so-helpful error message:

    $ perl6 killit.p6
    *** Error in `/home/jeff/perl6/bin/moar': double free or corruption (!prev): 0x00002ad5b0147070 ***
    ======= Backtrace: =========
    /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x2ad5a8b3c7e5]
    /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x2ad5a8b4537a]
    /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x2ad5a8b4953c]
    //home/jeff/perl6/lib/libmoar.so(+0x2266af)[0x2ad5a838b6af]
    [0x2ad5aa6f1ce8]
    ======= Memory map: ========
    00400000-00402000 r-xp 00000000 fc:09 888145    /home/jeff/perl6/bin/moar
    00602000-00603000 r--p 00002000 fc:09 888145    /home/jeff/perl6/bin/moar
    00603000-00604000 rw-p 00003000 fc:09 888145    /home/jeff/perl6/bin/moar
    00a4f000-039f1000 rw-p 00000000 00:00 0         [heap]
    ...

I'm completely out of my depth here, since I've never really written threaded code, but to me this seems to not be accomplishing the stated goal. It's not preventing the logic error by failing reliably, and the error when it does fail is completely ambiguous. I would probably have guessed that it's a bug in moar rather than a bug in my code.

I'm still going to try to learn the language, but I don't think it will ever become my primary language. Most of the code I write is glue anyway, so I don't think I'm the target audience.

Replies are listed 'Best First'.
Re^7: Why should any one use/learn Perl 6?
by liz (Monsignor) on Jun 12, 2018 at 21:54 UTC

    Thank you for reminding me about that excellent blog post by Jonathan. It was a bit ranty, but it also was a direct result of similar questions I was asking at that time. :-)

    At the moment, MoarVM has a potential for crashing when more than one thread is adding a key to a hash at the same time. This is a known issue, and how to solve it is still being debated.

    So, to work around the possibility of crashes, one should make sure that the hash already has all of the possible keys before starting to do the parallel run. I've rewritten your code to be more idiomatic Perl 6:

        constant RANGE = 10_000;
        my %hash = ^RANGE Z=> 0 xx RANGE;
        await do for 1..10 {
            start {
                %hash{ (^RANGE).pick }++ for ^100_000;
            }
        }
        say "Seen { %hash.values.grep(* > 0).elems } keys";
        # Seen 10000 keys
        say "Average value (~100 if threadsafe): { %hash.values.sum / %hash.elems }";
        # Average value (~100 if threadsafe): 91.7143

    I think the constant RANGE = 10_000 is rather self-explanatory. The next line may be somewhat harder to grasp: it fills the hash %hash with a list of Pairs generated by zipping (Z) a range (^RANGE, which is short for 0 .. RANGE - 1) with a sequence of 10_000 zeroes (0 xx RANGE) using the fat-comma operator (=>).
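
    To see that line on a smaller scale, here is a sketch with a range of 3 instead of 10_000 (the variable name %small is just for illustration):

    ```perl6
    # ^3 is 0, 1, 2; 0 xx 3 is three zeroes; Z=> pairs them up one by one
    my %small = ^3 Z=> 0 xx 3;

    say %small.elems;   # 3
    say %small<1>;      # 0
    ```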

    Then we execute %hash{ (^RANGE).pick }++ for ^100_000 in 10 different threads. The (^RANGE).pick picks a random value from the range of 0 .. RANGE - 1.

    The results are then shown by directly interpolating code inside a string: you can use curly braces for that.

    You can use the .sum method to get a sum of values, and .elems directly on the hash to find out the number of elements.
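
    As a tiny illustration of those pieces together (the hash contents here are made up, not taken from the run above):

    ```perl6
    my %scores = a => 1, b => 3;

    # Code inside curly braces in a double-quoted string is evaluated
    # and its result interpolated.
    my $avg = %scores.values.sum / %scores.elems;
    say "Average: { %scores.values.sum / %scores.elems }";   # Average: 2
    ```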

    I haven't been able to get this version to crash: however, it is still not threadsafe for the reasons that Jonathan explains so well in his blog post.

    If one uses the map/reduce idiom, the code would look like this:

        constant RANGE = 10_000;
        my %hash is Bag = await do for 1..10 {
            start {
                my %h;
                %h{ ^RANGE .pick }++ for ^100_000;
                %h;
            }
        }
        say "Seen { %hash.values.grep(* > 0).elems } keys";
        # Seen 10000 keys
        say "Average value (~100 if threadsafe): { %hash.values.sum / %hash.elems }";
        # Average value (~100 if threadsafe): 100

    You will note that now each thread has its own hash that can get updated without any problems. The result is a list of 10 hashes that are merged into a single hash with Bag semantics. (see Sets, Bags and Mixes for more information on Bags).

    A Bag is basically an object hash (so the keys are not necessarily strings) that only accepts positive integers as values. Initialization of a Bag accepts and merges anything that looks like a Pair or a list of Pairs (which is basically what a hash is in that context).
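
    To show just the merge step on its own, here is a small sketch. Note that it is a different spelling from the is Bag initialization above: it coerces each hash with .Bag and adds them with the bag-addition operator (+). The two partial hashes are made up and stand in for the per-thread results:

    ```perl6
    # Two per-thread result hashes, as they might come back from start { ... }
    my @partial = { a => 1, b => 2 }, { a => 3 };

    # Coerce each hash to a Bag (values become counts), then add them up.
    my $merged = [(+)] @partial.map(*.Bag);

    say $merged<a>;   # count for key 'a' across both partial results
    say $merged<b>;   # count for key 'b'
    ```

    Counts for a key that appears in more than one partial result are summed, which is the behaviour the averages above rely on.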

    So that would be the idiom to use safely. Hope this made sense :-)

      At the moment, MoarVM has a potential for crashing ... So, to work around the possibility of crashes, one should make sure that the hash already has all of the possible keys before starting to do the parallel run. I've rewritten your code to be more idiomatic Perl 6

      This is rather amusing, if sadly unsurprising. Having been sold Perl6 on the basis that it has all the benefits of using a bytecode VM, we now see that the VM tail is wagging the Perl6 dog.

      Surely the sane advice should be: if a particular VM is unstable then stop using it and use another VM instead. There's no need to rewrite/refactor already working code just because one VM has bugs.
