Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re^2: Average Price Algorithm

by ELISHEVA (Prior)
on Feb 02, 2009 at 23:29 UTC ( [id://740848]=note: print w/replies, xml ) Need Help??


in reply to Re: Average Price Algorithm
in thread Average Price Algorithm

I was sufficiently frustrated by the discussion in the note above, that I decided to just go ahead and correct the algorithm for situations where there are same sized buckets. The code posted below solves Grandfather's "Nasty distribution" perfectly. It is still O(N) where N is the number of items to allocate.

The algorithm below is a hybrid of the one proposed by Limbic-Region and the one I originally proposed. Smallest buckets are filled first (as per Limbic-Region), but when we encounter N equal sized buckets we allocate items N (one per bucket) at a time until we can't. Then we go back to allocating items to buckets one by one. This prevents the first of N buckets from hogging all the "good" values.

I'd still very much appreciate it if someone could find an example or two that doesn't work with this algorithm. Though I'm not convinced that this problem is anything like NP complete, I'm sure the code I've posted has room for improvement.

use strict; use warnings; sub demoAllocation($$$); demoAllocation ("Distribution: all at mean" , {a=>1000,b=>2000,c=>3000} , { '6.0' => 6000 } ); demoAllocation ("Distribution: unskewed" , {a=>1000,b=>2000,c=>3000} , { '3.0' => 300, '4.0' => 600, '5.0' => 700, '5.5' => 900 , '6.0' => 1000 , '6.5' => 900, '7.0' => 700, '8.0' => 600, '9.0' => 300 } ); demoAllocation ("Distribution: skewed" , {a=>1000,b=>2000,c=>3000} , { '3.0' => 4000, '12.0' => 2000 } ); demoAllocation ("Distribution: skewed: Nasty (Relatively prime)" , {a=>3, b=>3} , {'1.0' => 4, '2.0' => 1, '4.0' => 1 } ); demoAllocation ("Distribution: skewed: Nasty (Grandfather)" , {a=>30, b=>30} , {'1.0' => 40, '2.0' => 10, '4.0' => 10 } ); demoAllocation ("Distribution: Original poster" , {A=>65, B=>12, C=>24, D=>19, E=>30} , {'93.8' => 5, '93.81' => 20, '93.82' => 10 , '93.83' => 15, '93.84' => 25, '93.85' => 5 , '93.87'=>20, '94.0' => 5, '94.1' => 35 , '94.2'=> 10 } ); demoAllocation ("Distribution: camelback: Nasty mark II (Grandfather)/2 " , { a => 15, b => 10, c => 5 } , {'1.0' => 15, '2.0' => 6, '4.0' => 3, '8.0' => 6 } ); demoAllocation ("Distribution: camelback: Nasty mark II (Grandfather)" , { a => 30, b => 20, c => 10 } , {'1.0' => 30, '2.0' => 12, '4.0' => 6, '8.0' => 12 } ); demoAllocation ("Distribution: flat: Limbic~Region" , { a => 3, b => 4, c => 2, d => 2 } , {'1.0' => 1, '2.0' => 1, '3.0' => 1, '4.0' => 1 , '5.0' => 1, '6.0' => 1, '7.0' => 1, '8.0' => 1 , '9.0' => 1, '10.0' => 1, '11.0' => 1 } ); #------------------------------------------------------------ sub demoAllocation($$$) { my ($sDescription, $hBuckets, $hFrequency) = @_; print "$sDescription\n"; my ($dAvg, $hAllocation) = allocate($hBuckets, $hFrequency); foreach my $sId (sort keys %$hAllocation) { my $hItems = $hAllocation->{$sId}; my $dSum = 0; my $iCount = 0; my ($dBucketAvg, $dDeviation); my $iBucketSize = $hBuckets->{$sId}; print "$sId:"; foreach my $dValue (sort keys %$hItems) { my $iFreq = $hItems->{$dValue}; printf "\t%s \@ \$%.2f\n", $iFreq, $dValue; $dSum += $dValue*$iFreq; $iCount += $iFreq; } $dBucketAvg = $dSum/$iCount; $dDeviation = $dBucketAvg - $dAvg; if ($iBucketSize != $iCount) { printf "\t**ERROR**: bucket size: %d, actual allocation: %d\n" , $iBucketSize, $iCount; } printf "\tcount(e/a): %d/%d, bucket avg: \$%.2f, " ."deviation: \$%.3f\n\n" , $iBucketSize, $iCount, $dBucketAvg, $dDeviation; } print "\n"; } #------------------------------------------------------------ sub allocate($$) { my ($hBuckets, $hFrequency) = @_; #calculate deviations from the mean my $dAvg=calcWeightedAvg($hFrequency); my ($iFreqAvg, $aAbove, $aBelow) = calcDeviations($hFrequency, $dAvg); #sort buckets by size: smallest first my $hBucketsBySize = groupBucketsBySize($hBuckets); my @aBuckets = map { [ $_, $hBucketsBySize->{$_} ] } sort keys %$hBucketsBySize; #my @aBuckets = sort { $hBuckets->{$a} <=> $hBuckets->{$b} # } keys %$hBuckets; #allocate items to buckets, smallest first my %hAllocations; my $iFirstAbove = 0; my $iFirstBelow = 0; foreach my $aSameSizeBuckets (@aBuckets) { my $iSize = $aSameSizeBuckets->[0]; my $aIds = $aSameSizeBuckets->[1]; #my $iSize = $hBuckets->{$sId}; fillBucket($iSize, \%hAllocations, $aIds , $dAvg, \$iFreqAvg , $aAbove, \$iFirstAbove , $aBelow, \$iFirstBelow); } return ($dAvg, \%hAllocations); } #------------------------------------------------------------ # SUPPORTING FUNCTIONS - alphabetical order #------------------------------------------------------------ sub calcDeviations($$) { my ($hFrequency, $dAvg) = @_; my @aAbove; my @aBelow; my $iFreqAvg = 0; #calculate deviations from mean while (my ($dValue,$iFreq) = each(%$hFrequency)) { if ($dValue == $dAvg) { $iFreqAvg+=$iFreq; next; } my $dDeviation = $dValue - $dAvg; if (0 < $dDeviation) { push @aAbove, [ $dDeviation, $dValue, $iFreq ]; } else { push @aBelow, [ -$dDeviation, $dValue, $iFreq ]; } } #sort with smallest deviations first return ( $iFreqAvg , [ sort { compareDeviations($a,$b) } @aAbove ] , [ sort { compareDeviations($a,$b) } @aBelow ] ); } #------------------------------------------------------------ sub compareDeviations($$) { my ($x, $y) = @_; return $x->[0] <=> $y->[0]; } #------------------------------------------------------------ sub calcWeightedAvg($) { my $hFrequency = shift @_; my $dSum=0; my $iCount=0; while (my ($dValue,$iFreq) = each(%$hFrequency)) { $dSum+=$dValue*$iFreq; $iCount+=$iFreq; } return $dSum/$iCount; } #------------------------------------------------------------ sub fillBucket($$$$$$$) { my ($iSize, $hAllocations, $aIds , $dAvg, $rFreqAvg , $aAbove, $rFirstAbove , $aBelow, $rFirstBelow) = @_; my $iFilled = 0; #Phase I - allocate to all buckets simultaneously #fillBucketOneByOne($iSize, $hAllocations, $aIds, \$iFilled # , $dAvg, $rFreqAvg # , $aAbove, $rFirstAbove # , $aBelow, $rFirstBelow); #$iSize -= $iFilled; #Phase II - allocate to each bucket individually foreach my $sId (@$aIds) { next unless $iSize; $iFilled = 0; fillBucketOneByOne($iSize , $hAllocations, [$sId], \$iFilled , $dAvg, $rFreqAvg , $aAbove, $rFirstAbove , $aBelow, $rFirstBelow); } } #------------------------------------------------------------ sub fillBucketOneByOne($$$$$$$$$) { my ($iNeeded, $hAllocations, $aIds, $rFilled , $dAvg, $rFreqAvg , $aAbove, $rFirstAbove , $aBelow, $rFirstBelow) = @_; my $iBucketCount = scalar(@$aIds); #take items that are at the mean, if we can if ($iNeeded*$iBucketCount <= $$rFreqAvg) { foreach (@$aIds) { $hAllocations->{$_}->{$dAvg} = $iNeeded; $$rFreqAvg-=$iNeeded; } $$rFilled = $iNeeded; return; } elsif ($iNeeded % $iBucketCount) { return; } while ($iBucketCount <= $$rFreqAvg) { foreach (@$aIds) { $hAllocations->{$_}->{$dAvg}++; $$rFreqAvg--; } $$rFilled++; $iNeeded--; } my $aUp = $aAbove->[$$rFirstAbove]; my $aDown = $aBelow->[$$rFirstBelow]; my $dNetDeviation = 0; #take whatever creates the smallest net deviation # [0] deviation # [1] value # [2] frequency while ($iNeeded > 0) { my $bUseUp = 0; if ($aUp) { if ($aDown) { my $dNetUp = $dNetDeviation + $aUp->[0]; my $dNetDown = $dNetDeviation - $aDown->[0]; if (abs($dNetUp) < abs($dNetDown)) { $bUseUp = 1; $dNetDeviation = $dNetUp; } else { $bUseUp = 0; $dNetDeviation = $dNetDown; } } else { $bUseUp = 1; } } elsif ($aDown) { $bUseUp = 0; } else { return; } if ($bUseUp) { #$hItems->{$aUp->[1]} ++; return if ($aUp->[2] % $iBucketCount); foreach (@$aIds) { $hAllocations->{$_}->{$aUp->[1]} ++; $aUp->[2]--; } $$rFirstAbove++ unless $aUp->[2]; $aUp = $aAbove->[$$rFirstAbove]; } else { #$hItems->{$aDown->[1]} ++; return if ($aDown->[2] % $iBucketCount); foreach (@$aIds) { $hAllocations->{$_}->{$aDown->[1]} ++; $aDown->[2]--; } $$rFirstBelow++ unless $aDown->[2]; $aDown = $aBelow->[$$rFirstBelow]; } $$rFilled++; $iNeeded--; } #return $hItems; } #------------------------------------------------------------ sub groupBucketsBySize($) { my $hBuckets = shift @_; my $hBucketsBySize = {}; while (my ($sId, $iSize) = each (%$hBuckets)) { my $aIds = $hBucketsBySize->{$iSize}; $aIds = $hBucketsBySize->{$iSize} = [] unless $aIds; push @$aIds, $sId; } return $hBucketsBySize; }

Best, beth

Update 1: fixed bug in code that was originally posted.

Update 2: (Feb 3, 8:00 UTC) fixed bucket count bug identified below by Grandfather and Limbic~Region. I also made the output scream **ERROR** if this kind of mistake happens again. This is only a bug fix, however. The bucket counts are now right but the distributions discussed below - "Nasty mark II (Grandfather)" and "Limbic~Region" are still suboptimal.

Replies are listed 'Best First'.
Re^3: Average Price Algorithm
by GrandFather (Saint) on Feb 03, 2009 at 01:13 UTC
    demoAllocation ( "Distribution: skewed: Nasty mark II (Grandfather)", {a => 30, b => 20, c => 10}, {'1.0' => 30, '2.0' => 12, '4.0' => 6, '8.0' => 12} );

    Prints:

    Distribution: skewed: Nasty mark II (Grandfather) a: 22 @ $1.00 8 @ $8.00 bucket avg: $2.87, deviation: $-0.033 b: 8 @ $1.00 6 @ $2.00 2 @ $4.00 4 @ $8.00 bucket avg: $3.00, deviation: $0.100 c: 6 @ $2.00 4 @ $4.00 bucket avg: $2.80, deviation: $-0.100

    However, allocating 1/2 of each parcel to a, 1/3 to b and 1/6 to c gives a perfect result.


    Perl's payment curve coincides with its learning curve.
Re^3: Average Price Algorithm
by ELISHEVA (Prior) on Feb 03, 2009 at 08:48 UTC
    Many thanks again to Grandfather and Limbic~Region! Both for the bug find and the sub-optimal examples.

    Both examples are food for thought. Limbic-Region's distribution is interesting because it is a good example of where looking ahead only to the next largest deviation does not yield an optimal result. The smallest sized buckets can only be optimally filled by ignoring 6.0 even though it is smack on the mean (a deviation of 0). Grandfather's "nasty mark II" shows clearly that least(greatest?) common denominators (i.e. 2) and not just individual bucket size or frequency of bucket size affect the result.

    Best, beth
Re^3: Average Price Algorithm
by Limbic~Region (Chancellor) on Feb 03, 2009 at 02:50 UTC
    ELISHEVA,
    demoAllocation ( "Distribution: Limbic~Region", {a => 3, b => 4, c => 2, d => 2}, {'1.0' => 1, '2.0' => 1, '3.0' => 1, '4.0' => 1, '5.0' => 1, '6.0' + => 1, '7.0' => 1, '8.0' => 1, '9.0' => 1, '10.0' => 1, '11.0' => 1 } ); __DATA__ Distribution: Limbic~Region a: 1 @ $2.00 1 @ $3.00 1 @ $9.00 bucket avg: $4.67, deviation: $-1.333 b: 1 @ $1.00 1 @ $10.00 1 @ $11.00 bucket avg: $7.33, deviation: $1.333 c: 1 @ $5.00 1 @ $6.00 1 @ $7.00 bucket avg: $6.00, deviation: $0.000 d: 1 @ $4.00 1 @ $8.00 bucket avg: $6.00, deviation: $0.000

    It is easy to show that this is not the best result. Consider a perfect distribution:

    • 1, 11, 6
    • 2, 10, 3, 9
    • 7, 5
    • 8, 4

    Cheers - L~R

      It is not just not a best result, it is an invalid result! b wanted 4 units but got 3 and c wanted 2 units but got 3.


      Perl's payment curve coincides with its learning curve.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://740848]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2024-03-28 20:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found