Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.

Hi BrowserUk,

I agree that there really isn't a clean way to do this. I'm not really sure what your primary motivation is though, whether it's efficiency or readability.

One way to reform the problem though is to say that you don't want the range [0,M] inclusive, but actually [0,M+1) without including the end point. This way you can avoid having to treat the last or first segment as special case.

Here is how I would redo your code by creating an intermediate data structure to contain the points

#! perl -slw
use strict;

our $N //= 2;
our $M //= 1e6;

$M++; # No segment contains the upper end point.

my @points = map {int $M * $_ / $N} (0..$N);
my @ranges = map {[$points[$_], $points[$_+1]-1]} (0..$#points-1);

printf "%2d : from %7d to %7d (%7d)\n",
    $_, @{ $ranges[ $_ ] },
    $ranges[ $_ ][ 1 ] - $ranges[ $_ ][ 0 ] + 1
    for 0 .. $#ranges;

__END__
[download]

- Miller

Comment on Re: Split range 0 to M into N non-overlapping (roughly equal) ranges. Download Code

Replies are listed 'Best First'.

Re^2: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by BrowserUk (Patriarch) on Mar 13, 2011 at 18:46 UTC

I'm not really sure what your primary motivation is though, whether it's efficiency or readability.

As I said elsewhere, efficiency isn't too much of a consideration for the intended purpose, though given the size of the numbers typically involved it's obviously best to avoid full-iteration schemes:

@r= (); $M = 1e6; $N = 4; 
++$r[ $M % $N ] while $M--; 
print "@r";;
250000 250000 250000 250000
[download]

I've used the code I posted in the OP several times, and it works, in as much as it is good enough for the purpose of distributing M items of work across N threads relatively evenly. Within the constraints of the problem whereby it isn't worth doing for less than M > several 10s or 100s of thousands; and N will be in the 10s or low 100s for the foreseeable future. Small variability within the size of the ranges is insignificant.

But each time I have the need to do this anew, I encounter a pregnant pause in my coding. The solution will never 'flow from my fingers' And when I go off and look up how I did it last time--ie.look at the code in the OP--it takes me several minutes of staring at the code I know worked last time, and even a few quick trial runs, before I'm convinced that I understand how it works.

I am what I would call an 'instinctual' programmer; and each time, my instincts tell me the OP code is not right.

So, rather than the only alternatively (to efficiency) being "readability", I'd say my goal was 'understandability'. Not so much of the individual terms and constructs of the code--anything I've forgotten there is easy to look up, try, and and refresh my mind. It is the understandability of the algorithm that I favour.

All too often I see people promoting the idea of spreading code over 5x or 10x as many lines of code as I typically use for the same purpose, in the name of readability. Whilst this may make the individual steps of the implementation instantly recognisable to even the most casual Perler; I feel (strongly) that it has the detrimental affect of concealing the algorithm in the detail of the implementation. Your typical 'can't see the wood for the trees' scenario.

Just as an important idea can rarely be said better with 100 words when the right 10 will do; so an algorithm is rarely expressed better by using more terms than are required. The art is in finding the right 10 words. I'm not very good at that in English, but have my moments in code.

With regard to your implementation. The M+1 idea is quite interesting and your implementation more so. But, I would argue that you are still making a special case of the last element, in as much as running the second loop to $#points - 1 is effectively doing a pop on the results of the first loop. And that makes it necessary to code it as two loops rather than one; and requires the intermediate results array and prevents the chaining of the maps.

I'm now very convinced that javafan/mr_mischief' insight to defer the int of the step value is the right approach, even with the requirement for a FP fudge factor--something I've got used to using in the past for other algorithms. My instinct is that the next time I go to look at the implementation weeks or months from now, I'll understand what is going on there straight away. Indeed, I think it'll probably 'flow from my fingers' next time, even if that is months from now.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

[reply]
[d/l]
[select]


Do you know where your variables are?
	PerlMonks