http://qs321.pair.com?node_id=882283

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

My brain's not working this morning.

I want to integer divide N by M such that there is the minimal difference in the size of the M integer partitions.

Eg. 10 / 3 => 4, 3, 3 not 4,4,2.

I'm sure I've done this concisely in the past, but I'm damned if I remember how or when.

My best attempt so far is:

sub part{ my( $n, $m ) = @_; my @parts = (int( $n / $m )) x $m; $n -= $_ for @parts; my $i =0; $parts[ $i++ ]++ while $n--; return @parts; }

But it feels messy to me?

Update: And ikegami's purely numerical method is hands down winner:

Rate syp buk eli ike syp 8.36/s -- -42% -47% -74% buk 14.4/s 73% -- -8% -56% eli 15.8/s 89% 9% -- -52% ike 32.5/s 289% 125% 106% --

Thanks all.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: Near equal partitions.
by ikegami (Patriarch) on Jan 14, 2011 at 05:14 UTC

    Another approach

    sub part { my ($n, $m) = @_; my $q = int($n / $m); my $r = $n % $m; return ($q+1) x $r, ($q) x ($m-$r); }

      That list generation is lovely!

      Just one question. Is the Perl interpreter smart enough to know that int($n/$m) should use integer division rather than floating point division? i.e. would there be any difference in performance between int($n/$m) vs. ($n-$r)/$m?

      In more strongly typed languages, the type of the variables determines the meaning of the '/' operator (integer division vs. floating point division) so ($n-$r)/$m would be faster. However in Perl, the operator determines the type of the operands, so maybe there is no difference in performance between the two expressions in Perl? Or maybe ($n-$r)/$m is slower because the int(...) gives the Perl interpreter a clue to context and hence the opportunity to use integer division when it sees such a simple expression as int( token / token )?

        Just one question. Is the Perl interpreter smart enough to know that int($n/$m) should use integer division rather than floating point division?

        No.

        i.e. would there be any difference in performance betwee int($n/$m) vs. ($n-$r)/$m?

        Sorry, you asked to ask two questions. But to save you from having to ask to ask another question, I'll answer it anyway.

        Performance should be roughly the same (4 trivial ops in the former, 5 trivial ops in the latter). The latter is better from a numerical methods point of view, no possibility of floating point error.

Re: Near equal partitions.
by ELISHEVA (Prior) on Jan 14, 2011 at 04:58 UTC

    Yup. Your brain is definitely on fry :-) Think %.

    sub part { my ($n, $m) = @_; my $r = $n%$m; # <== my @parts = (($n-$r)/$m) x $m; # <== my $i=0; $parts[$i++]++ while $r--; return @parts; }

    Update: code sample to reflect outcome of Q&A with ikegami.

    Using ikegami's elegant list generation, but without risk of floating point rounding errors:

    sub part { my ($n, $m) = @_; my $r = $n%$m; my $q = ($n-$r)/$m; # avoid int($n/$m) to prevent fp rounding errors return ($q+1) x $r, ($q) x ($m-$r); }

      The efficiency of ikegami's solution appears to be due to the efficiency of its list generation. The second code sample above using ikegami's list generation + integer division is only marginally slower than ikegami's code. I presume this is due to the 4 vs. 5 ops for int($n/$m) vs. ($n-$r)/$m mentioned in ikegami's explanation of the performance differences between these two expression.

      # using Benchmark::cmpthese(-1, ...) on input (10,3) # Debian (Etch) Linux, Perl 5.8.8 Rate syp buk eli_1 eli_2 ike syp 146161/s -- -23% -50% -65% -66% buk 189150/s 29% -- -35% -54% -56% eli_1 289616/s 98% 53% -- -30% -32% eli_2 412559/s 182% 118% 42% -- -3% ike 425821/s 191% 125% 47% 3% --

      I also note that the comparison results for syp, buk, eli don't seem quite the same as BrowserUK's results. The iterations per second are much higher and the comparison % quite different.

      For example,in BrowserUK's results buk is 42% faster than syp, but here only 23%. In BrowserUK's benchmark, buk & eli_1 are roughly comparable (9% difference) and ike is 106% faster than eli. But here, eli_1 is 53% faster than buk and ike is only 47% faster than eli_1.

      Any thoughts on why? Differences in Perl implementations? Inputs? Number of iterations? Just noise due to OS or machine differences?

      Update: The results do seem dependent on input. Changing the input to (10000,3) makes no real difference in benchmarks, but changing the input to (10000, 9999) results in this benchmark. The ranking is still the same, but the results clearly break into two clusters based on list construction: (syp, buk,eli_1) and (ike,eli_2).

      Rate syp buk eli_1 eli_2 ike syp 63.9/s -- -70% -89% -100% -100% buk 212/s 231% -- -62% -100% -100% eli_1 560/s 776% 164% -- -99% -99% eli_2 61265/s 95828% 28846% 10847% -- -3% ike 63015/s 98569% 29673% 11160% 3% --

        It is probably because I'm using a 64-bit build? For reference, here is my benchmark code:

        #! perl -slw use strict; use Benchmark qw[ cmpthese ]; use POSIX qw(ceil); sub buk{ my( $n, $m ) = @_; my @parts = (int( $n / $m )) x $m; $n -= $_ for @parts; my $i =0; $parts[ $i++ ]++ while $n--; return @parts; } sub eli { my ($n, $m) = @_; my $r = $n%$m; # <== my @parts = (($n-$r)/$m) x $m; # <== my $i=0; $parts[$i++]++ while $r--; return @parts; } sub syp { my ($n, $m) = @_; my @parts; for(0 .. $m - 1) { $n -= $parts[$_] = ceil($n / $m); $m -= 1; } return @parts; } sub ike { my ($n, $m) = @_; my $q = int($n / $m); my $r = $n % $m; return ($q+1) x $r, ($q) x ($m-$r); } printf "buk: 97/$_ [%s]\n", join ' ', buk( 97, $_ ) for 2 .. 9; printf "eli: 97/$_ [%s]\n", join ' ', eli( 97, $_ ) for 2 .. 9; printf "syp: 97/$_ [%s]\n", join ' ', syp( 97, $_ ) for 2 .. 9; printf "ike: 97/$_ [%s]\n", join ' ', ike( 97, $_ ) for 2 .. 9; cmpthese -1, { buk => q[ for my $n ( 2 .. 100 ) { for my $m ( 3 .. 50 ) { my @n = buk( $n, $m ); } } ], eli => q[ for my $n ( 2 .. 100 ) { for my $m ( 3 .. 50 ) { my @n = eli( $n, $m ); } } ], syp => q[ for my $n ( 2 .. 100 ) { for my $m ( 3 .. 50 ) { my @n = syp( $n, $m ); } } ], ike => q[ for my $n ( 2 .. 100 ) { for my $m ( 3 .. 50 ) { my @n = ike( $n, $m ); } } ], }

        With regard to the integer/float division performance. Regardless of any raw machine-level differences in integer/double division--which these days are far less pronounced than previously--the cost of Perl math is always dominated by the number of Perl opcodes to perform the calculation, not the raw machine code performance.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Near equal partitions.
by syphilis (Archbishop) on Jan 14, 2011 at 05:14 UTC
    I'd probably do something like:
    sub part { use POSIX qw(ceil); my ($n, $m) = @_; my @parts; for(0 .. $m - 1) { $n -= $parts[$_] = ceil($n / $m); $m -= 1; } return @parts; }
    Cheers,
    Rob
Re: Near equal partitions.
by nif (Sexton) on Jan 14, 2011 at 16:55 UTC

    Sadly I was too late for the discussion...
    The following modification with loop in place of divide operation is almost as fast as the winner, i never thought that divide is so slow.

    sub part { my ($n, $m) = @_; my $q = 0; $q++ while ($n-=$m) > 0; return ($q+1) x ($n+$m), ($q) x (-$n); }
    Update: $n in the Benchmark Re^3: Near equal partitions. is only in the range 2..100, for bigger numbers divide performs match better then the loop.