Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map) (traps for the unwary)

by vr (Curate)
on Apr 20, 2020 at 10:00 UTC ( #11115825=note: print w/replies, xml ) Need Help??

in reply to Re: Optimizing with Caching vs. Parallelizing (MCE::Map)
in thread Optimizing with Caching vs. Parallelizing (MCE::Map)

Many thanks for providing this wonderful challenge! If all goes at the pace it's rolling now, the fun may well continue into the month of May! Therefore, a warning: some people are at danger of gaining (or loosing) more than what they have bargained for!:)

I decided to ask people to output the top 20, because that presents an interesting mini-challenge by itself

Well said, dear rjt, well said. But it seems it were YOU, who fell into the trap that you so cunningly crafted for poor innocent learners! The top 20 out of million Collatz sequences has an insidious property: there are six "445" lengths in 1e6, but only four in top-20 (so, all six are in top-22). Which to extract? Aha!

Well, the challenge did not clearly state how to order numbers with the same Collatz lengths (CL). But I think it is reasonable to assume, that, since numbers with longer CL are rated better/closer to top, then smaller numbers among producing same CL are to be valued more. Like: "Look! This brave little number commendably creates as long CL as that huge number! And this undeservedly huge number has only managed to generate so puny CL! Loser!"

At least, there must be some consistency in arranging results, don't you agree? In other words, I think that if CL column descends, then numbers column must ascend (for equal CLs). See ordering for CL 450, too.

Well, dear Perl users, I happened to notice this, because in another my (unpublished) solution I had to endure great pain in arranging top-20 properly. Just how many of you, looking at output of (almost all) scripts in this and related 2 threads, have noticed, that top 20 are neatly ordered? And yet, this "natural" result comes at no cost at all, with Perl! Just appreciate what you so ungratefully consume!:)

To illustrate, rjt, here's, in parallel, output of your script and Laurent_R's with marioroy fixes:

Collatz(837799) has sequence length of 525 steps | 837799: 525 Collatz(626331) has sequence length of 509 steps | 626331: 509 Collatz(939497) has sequence length of 507 steps | 939497: 507 Collatz(704623) has sequence length of 504 steps | 704623: 504 Collatz(910107) has sequence length of 476 steps | 910107: 476 Collatz(927003) has sequence length of 476 steps | 927003: 476 Collatz(511935) has sequence length of 470 steps | 511935: 470 Collatz(767903) has sequence length of 468 steps | 767903: 468 Collatz(796095) has sequence length of 468 steps | 796095: 468 Collatz(970599) has sequence length of 458 steps | 970599: 458 Collatz(546681) has sequence length of 452 steps | 546681: 452 Collatz(820022) has sequence length of 450 steps | 818943: 450 Collatz(818943) has sequence length of 450 steps | 820022: 450 Collatz(820023) has sequence length of 450 steps | 820023: 450 Collatz(410011) has sequence length of 449 steps | 410011: 449 Collatz(615017) has sequence length of 447 steps | 615017: 447 Collatz(922526) has sequence length of 445 steps | 886953: 445 Collatz(922526) has sequence length of 445 steps | 906175: 445 Collatz(886953) has sequence length of 445 steps | 922524: 445 Collatz(906175) has sequence length of 445 steps | 922525: 445

But wait... What's that??? The 922526 number is listed twice, on the left?? Is this... can't believe... is this because of infamous What Every Computer Scientist Should Know About Floating-Point Arithmetic? Because someone:) decided to cut corners and resort to FP? Or is it for another reason? Didn't investigate yet. Just who would have thought that this would surface in so innocent task "calculate lengths of Collatz sequences". Wow! Great challenge!

Edit. No, of course it's not floating point issue. Algo is broken. CLs for odd numbers are cached, but for even numbers they are not. Some even numbers are never passed to &top (e.g. 922524), others are pumped into this subroutine several times. The 922526 hadn't been phased out from @top by odd numbers with longer CLs. With $top large enough, there are many even dupes.

  • Comment on Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map) (traps for the unwary)
  • Download Code

Replies are listed 'Best First'.
Re^3: Optimizing with Caching vs. Parallelizing (MCE::Map) (traps for the unwary)
by rjt (Curate) on Apr 20, 2020 at 19:41 UTC

    Ha! Good eye. I didn't even notice the doubled-up 922526. It's not a FP bug. Rather, it's a corner case thanks to how I combined the /2 optimization + memoization; certain even numbers get added to the p.queue twice. It can be fixed trivially by either removing the /2 optimization (simpler, ~5% penalty), or skipping seen numbers in the second call to top() (no measurable penalty, adds a variable).

    As to your interpretation of the "top 20 arrangement," I like your discussion! We try to keep the task descriptions only quasi-formal, to keep the challenge accessible to beginners, which is why you don't usually see these sorts of details specified like a requirements document. Meaning, many "minor" details are left to the discretion of the participants. The upshot of that is, if you submit a really weird interpretation, you'd probably net yourself a mildly amusing comment in my next review, at least. :-)

    use strict; use warnings; omitted for brevity.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11115825]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (6)
As of 2021-04-22 21:27 GMT
Find Nodes?
    Voting Booth?

    No recent polls found