Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

partition of an array

by mostvisited (Initiate)
on Mar 09, 2009 at 04:47 UTC ( [id://749201]=perlquestion: print w/replies, xml ) Need Help??

mostvisited has asked for the wisdom of the Perl Monks concerning the following question:

I need to divide an array of size n in two halves of size i and j such that |i-j| is 0 or 1 and the difference of the sums of two arrays is minimal.

e.g
for an array {9,1,1,1,1,1,1,1,1,1} the two partitions are
{9,1,1,1,1} and {1,1,1,1,1}

Replies are listed 'Best First'.
Re: partition of an array
by BrowserUk (Patriarch) on Mar 09, 2009 at 06:31 UTC

    You could try something like this:

    #! perl -slw use strict; use List::Util qw[ sum ]; our $N ||= 9; #my @data = ( 9,1,1,1,1,1,1,1,1,1 ); my @data = map{ int rand( $N ) } 1 .. $N; my @a = @data[ 0 .. $#data / 2 ]; my @b = @data[ $#data / 2 + 1 .. $#data ]; my $diff = abs( sum( @a ) - sum( @b ) ); print "$diff : [@a] [@b]"; OUTER: for my $ai ( 0 .. $#a ) { for my $bi ( 0 .. $#b ) { if( abs( sum( @a[ 0 .. $ai-1, $ai+1 .. $#a ], $b[ $bi ] ) - sum( @b[ 0 .. $bi-1, $bi+1 .. $#b ], $a[ $ai ] ) ) < $diff ) { my $temp = $a[ $ai ]; $a[ $ai ] = $b[ $bi ]; $b[ $bi ] = $temp; $diff = abs( sum( @a ) - sum( @b ) ); print "$diff : [@a] [@b]"; last OUTER if $diff == 0; } } }

    It could be coded more efficiently and I haven't yet convinced myself that it will always find the optimum solution, but it seems to get pretty close quite quickly even for quite large arrays:

    C:\test>junk -N=25 13 : [6 8 12 19 12 16 19 2 14 2 12 18 2] [3 23 24 7 11 7 16 10 6 6 12 +4] 7 : [3 8 12 19 12 16 19 2 14 2 12 18 2] [6 23 24 7 11 7 16 10 6 6 12 4 +] 3 : [3 6 12 19 12 16 19 2 14 2 12 18 2] [8 23 24 7 11 7 16 10 6 6 12 4 +] 1 : [3 4 12 19 12 16 19 2 14 2 12 18 2] [8 23 24 7 11 7 16 10 6 6 12 6 +] C:\test>junk -N=25 64 : [11 11 22 2 19 17 17 10 6 20 17 5 20] [21 21 0 12 4 3 8 22 0 6 13 + 3] 42 : [0 11 22 2 19 17 17 10 6 20 17 5 20] [21 21 11 12 4 3 8 22 0 6 13 + 3] 28 : [0 4 22 2 19 17 17 10 6 20 17 5 20] [21 21 11 12 11 3 8 22 0 6 13 + 3] 26 : [0 3 22 2 19 17 17 10 6 20 17 5 20] [21 21 11 12 11 4 8 22 0 6 13 + 3] 20 : [0 0 22 2 19 17 17 10 6 20 17 5 20] [21 21 11 12 11 4 8 22 3 6 13 + 3] 18 : [0 0 21 2 19 17 17 10 6 20 17 5 20] [22 21 11 12 11 4 8 22 3 6 13 + 3] 2 : [0 0 11 2 19 17 17 10 6 20 17 5 20] [22 21 21 12 11 4 8 22 3 6 13 +3] 0 : [0 0 12 2 19 17 17 10 6 20 17 5 20] [22 21 21 11 11 4 8 22 3 6 13 +3] C:\test>junk -N=25 80 : [10 9 23 4 14 22 0 17 19 9 16 24 24] [20 20 7 4 7 13 0 5 5 10 20 +0] 74 : [7 9 23 4 14 22 0 17 19 9 16 24 24] [20 20 10 4 7 13 0 5 5 10 20 +0] 68 : [4 9 23 4 14 22 0 17 19 9 16 24 24] [20 20 10 7 7 13 0 5 5 10 20 +0] 60 : [0 9 23 4 14 22 0 17 19 9 16 24 24] [20 20 10 7 7 13 4 5 5 10 20 +0] 56 : [0 7 23 4 14 22 0 17 19 9 16 24 24] [20 20 10 9 7 13 4 5 5 10 20 +0] 50 : [0 4 23 4 14 22 0 17 19 9 16 24 24] [20 20 10 9 7 13 7 5 5 10 20 +0] 42 : [0 0 23 4 14 22 0 17 19 9 16 24 24] [20 20 10 9 7 13 7 5 5 10 20 +4] 36 : [0 0 20 4 14 22 0 17 19 9 16 24 24] [23 20 10 9 7 13 7 5 5 10 20 +4] 16 : [0 0 10 4 14 22 0 17 19 9 16 24 24] [23 20 20 9 7 13 7 5 5 10 20 +4] 14 : [0 0 9 4 14 22 0 17 19 9 16 24 24] [23 20 20 10 7 13 7 5 5 10 20 +4] 10 : [0 0 7 4 14 22 0 17 19 9 16 24 24] [23 20 20 10 9 13 7 5 5 10 20 +4] 6 : [0 0 5 4 14 22 0 17 19 9 16 24 24] [23 20 20 10 9 13 7 7 5 10 20 4 +] 4 : [0 0 4 4 14 22 0 17 19 9 16 24 24] [23 20 20 10 9 13 7 7 5 10 20 5 +] 2 : [0 0 4 4 13 22 0 17 19 9 16 24 24] [23 20 20 10 9 14 7 7 5 10 20 5 +] 0 : [0 0 4 4 13 22 0 17 19 9 16 23 24] [24 20 20 10 9 14 7 7 5 10 20 5 +] C:\test>junk -N=25 43 : [22 10 14 1 2 1 16 8 19 20 17 24 21] [13 23 2 2 19 1 21 12 1 6 22 + 10] 25 : [13 10 14 1 2 1 16 8 19 20 17 24 21] [22 23 2 2 19 1 21 12 1 6 22 + 10] 3 : [2 10 14 1 2 1 16 8 19 20 17 24 21] [22 23 13 2 19 1 21 12 1 6 22 +10] 1 : [1 10 14 1 2 1 16 8 19 20 17 24 21] [22 23 13 2 19 2 21 12 1 6 22 +10]

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BrowserUk,
      I really like this approach, but it is flawed. Consider the input:
      7 7 7 1 0 0 0 -42 0 0 6 6 6
      There are 13 items so we will have one partition of 6 and one of 7. your code produces the following solution:
      46 : [7 7 7 1 0 0 0] [-42 0 0 6 6 6] 32 : [0 7 7 1 0 0 0] [-42 7 0 6 6 6] 18 : [0 0 7 1 0 0 0] [-42 7 7 6 6 6] 16 : [0 0 6 1 0 0 0] [-42 7 7 7 6 6]
      The first partition sums to 7 and the second to -9 with a difference of 16. You could move (not swap) the 6 in the first partion to the second partition and you would end up with:
      4 : [0 0 1 0 0 0] [-42 7 7 7 6 6 6]
      The first partition now sums to 1 and the second to -3 for a difference of 4. I don't know the difficulty in accounting for this situation but the problem is assuming the only balancing operation is swapping.

      Cheers - L~R

        Thanks for the edge case.

        A couple of possible fixes spring to mind:

        • Shuffling the input--not a guarentee.
        • Sorting the input--is there an equivalent edge case that would give non-optimum results starting from sorted ordring?
        • Splitting the input both ways--@a shorter than @b and vice versa--and finding the best solution from both datasets. Would that guarentee an optimum?

        Things to play with. Thanks.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      For the universe of positive integers, I sensed a simple solution lurking. I found one but I didn't like how it coded up: It was just pushing the largest onto a half, if it was forced onto the currently smaller half, put it in a protest queue in that half. When ready to add a number to the other half in the presence of a protest, you would instead pop the top of the queue to the other half. This works and is fairly efficient but it is messy in the control code.

      Today, it just struck me that you don't have to keep the halves abs( @left - @right) <= 1 as you create them; just make sure there is enough remaining to fill the other half.

      use List::Util qw{ sum }; sub L() { 0} # left sub R() { 1} # right sub remainder_halves { my $in = shift; my $ar ; @$ar = sort { $b <=> $a } @$in; die "bounds error" if @$ar && $$ar[-1] < 0; my @ans = ( [], [] ); # halves for answer no warnings 'uninitialized'; # summing empty arrays my ( $targ, $other ) = ( L, R ); my ( $halfsize) = int((@$ar+1)/2); while ( @$ar ) { while ( sum( @{$ans[$targ]}) <= sum( @{$ans[$other]}) && @{$ans[$targ]} < $halfsize ) { push @{$ans[$targ]}, shift @$ar; } ( $targ, $other ) = ( $other, $targ); push @{$ans[$targ]}, shift @$ar if @{$ans[$targ]} < $halfsize; } my $score = abs( sum( @{ $ans[L] } ) - sum( @{ $ans[R] } ) ); return $score, $ans[L], $ans[R]; }
      Be well,
      rir
Re: partition of an array
by ELISHEVA (Prior) on Mar 09, 2009 at 06:24 UTC

    On the matter of dividing up items so that the difference in sum is minimized, you might be interested in this thread Average Price Algorithm. It discusses averages rather than sums and allows for an arbitrary number of buckets, instead of just 2. However, some of the issues will be the same since an average is just a sum divided by the number of elements. There were several suggestions on that thread ranging choosing the best from a random selection of solutions to various techniques for using deviation from the mean. Using deviations from the mean lets one take advantage of the fact that anything that pushes bucket's sum up so that its average is above the average for all elements, must, if possible, be complemented by something below the average for all elements. If you read through the thread discussing that solution and study the counter examples, you will see that its success is sensitive to relative factorizations, but that issue should be easily resolvable for the special case of two buckets.

    On the matter of dividing the number of elements into two groups as evenly as possible, you will have to take into account that odd numbers can't be divided evenly by two. There are two ways to deal with this in Perl: one using integers and one using floating point:

    Option 1: use floating point division and strip away fractional portion using int:

    # scalar returns the number of elements in an array my $n = scalar(@aSomeNumbers); my $n1=int($n/2); my $n2=$n-$n1;

    Option 2: Use the mod operator % (see perlop) to determine whether the size of the array is odd or even. $n%2 returns the remainder of division by 2.

    # $n%2 will be 1 if $n is odd and 0 if $n is even. my $n=scalar(@aSomeNumbers); my ($n1, $n2); if ($n%2) { #if $n is even it is safe to divide by 2 $n1=$n2=$n/2; } else { #add one so we get an even number again and can divide safely by 2 $n1=($n+1)/2; $n2=$n-$n1; }

    Best, beth

Re: partition of an array
by irah (Pilgrim) on Mar 09, 2009 at 05:16 UTC

    Sample program as follow

    use strict; use warnings; use Data::Dumper; my (@string, $length, $half); my (@arr1, @arr2); @string = ( 9,1,6,3 ); @arr1 = splice(@string,0,scalar(@string)/2); print Dumper(\@arr1); print Dumper(\@string);
      this code would simply divide the array.. but it doesnt take care of the sum condition. As i have mentioned the difference of sum of elements of two new arrays should be minimal. The new partitions should be balanced also. How to maintain that property ?
Re: partition of an array
by f00li5h (Chaplain) on Mar 09, 2009 at 06:16 UTC

    sort it and stuff the biggest one into each half...

    #! /usr/bin/perl -wT use warnings; use strict; my @kitties = sort qw/ 1 2 3 4 5 9 10 /; my $i; my (@first, @second); while (@kitties){ push @first, shift @kitties; push @second, shift @kitties; } use Data::Dumper; print Dumper (\@first, \@second ); use List::Util qw[ sum ]; print "we have ", sum( @first ), " and ", sum( @second );

    Output

    $VAR1 = ['1','2','4','9']; $VAR2 = ['10','3','5',undef]; we have 16 and 18

    this puts undefs in if there is an uneven number of elements (as this code shows)

    @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;

      This approach is suboptimal for the following sample case:

      my @kitties = qw/1 1 1 2 2 2 2 4/;

      In this example, the greedy approach will fail, producing

      [4,2,2,1], [2,2,1,1]

      while the best solution would be

      [4,1,1,1], [2,2,2,2]

      But due to the homeworky nature of the problem I won't go into this further :)

        Does not.

        Output:

        $VAR1 = [ '1', '1', '2', '2' ]; $VAR2 = [ '1', '2', '2', '4' ]; we have 6 and 9

        You dropped the sort bit!

        ... some time passes ...

        Just snagging the biggest one and stuffing it in a sack works pretty well for the knapsack problem... and since this one was only 2 partitions it didn't do too bad ;)

        @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;

Re: partition of an array
by sundialsvc4 (Abbot) on Mar 09, 2009 at 14:01 UTC

    I am not directly acquainted with this problem, but it appears to me that the constraint, “such that |i-j| is 0 or 1,” is merely a formal way of saying, “in half.” Off the top of my head, I think that I can assert that there is only one point in any list where such a constraint would be true in a list with an even number of elements, and only two such points with an odd number.

    Therefore, “how is the ‘minimize’ constraint to be satisfied?” It would seem obvious to me that you must alter the order of the elements of the list in some way... a task that could be equally described in terms of “two separate lists.” With such a definition, one might solve the problem by sorting the master-list, then distributing the numbers taken from both the head and the tail of that sorted master-list into the two buckets.

    But, again referring to my previous post, I would first “study the literature” to find where someone else has effectively solved the problem for you already. “Heck, it's got to be in CPAN somewhere...”

      I think that I can assert that there is only one point in any list where such a constraint would be true in a list with an even number of elements

      Even without ordering, that's not true.

      (1 2 3 4 5 6): (1 2 3) (4 5 6) d=9 (1 2 4) (3 5 6) d=7 (1 2 5) (3 4 6) d=5 (1 2 6) (3 4 5) d=3 (1 3 4) (2 5 6) d=5 (1 3 5) (2 4 6) d=3 (1 3 6) (2 4 5) d=1 <-- (1 4 5) (2 3 6) d=1 <-- solutions (1 4 6) (2 3 5) d=1 <-- (1 5 6) (2 3 4) d=3
Re: partition of an array
by targetsmart (Curate) on Mar 09, 2009 at 05:03 UTC
    use splice to break up array.

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
      splice could be used to create the partition but how to get the kind of partition which satisfies the mentioned conditions. Actually i m not able to find any algorithm to implement it.
Re: partition of an array
by sundialsvc4 (Abbot) on Mar 09, 2009 at 13:53 UTC

    Can a really great book really be published by Springer-Verlag?

    Yes! ... if you can find it anymore. How To Solve It, by Zbignie Michalewicz. A recent re-publication (co-authored by David B. Fogel) is more readily available, ISBN 978-3540224945.

    This is a particularly-readable example of a concentrated book of heuristics. (Not just “algorithms,” and there is a difference.) It shows you how to approach many practical problems whose solutions are not immediately obvious.

Re: partition of an array
by rir (Vicar) on Mar 17, 2009 at 02:20 UTC
    When I realized that this wasn't going to be quick and easy, I decided that having a brute force implementation upon which to compare was necessary. Now, I think there is nothing better than to take a drink and a long walk.

    It doesn't seem that anyone has provided a correct solution. I believe this is one.

    Be well,
    rir

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://749201]
Approved by ikegami
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (4)
As of 2024-03-28 13:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found