accurately rounding numbers for percentages

derekn has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: accurately rounding numbers for percentages by ysth (Canon) on Aug 03, 2009 at 00:20 UTC
This problem was a perl quiz of the week: see http://osdir.com/ml/lang.perl.qotw.quiz-of-the-week/2002-12/msg00000.html. For individual solutions submitted, see: http://osdir.com/ml/lang.perl.qotw.discuss/2002-11/threads.html and http://osdir.com/ml/lang.perl.qotw.discuss/2002-12/threads.html.
Re: accurately rounding numbers for percentages by Trimbach (Curate) on Aug 02, 2009 at 22:24 UTC
What you're asking is not possible. Any time you round a number you introduce error; how much depends on how coarsely you round. Add enough errors together and your total can end up off from the "expected" total (in this case 100%). The only way around this is to round the individual entries to whole numbers for display but, when calculating the total, add the unrounded entries rather than the rounded ones, and then round the result for display if you want. Gary Blackburn Trained Killer
Re^2: accurately rounding numbers for percentages by derekn (Initiate) on Aug 02, 2009 at 22:53 UTC
So I'm gonna have to live with "37%, 23%, 9%, 16%, 16%" (rounded values) equalling 101%, even though it SHOULD equal 100%?
Re^3: accurately rounding numbers for percentages by ww (Archbishop) on Aug 02, 2009 at 23:45 UTC
That's not what Trimbach said, by a long shot. If you add the UNrounded ~~numbers~~ percentages, they should total 100% (except that some values, 100/6 for example, have no finite decimal expansion and so are themselves rounded at whatever decimal length you use). But for cases such as I infer yours is, a quite standard and commonly accepted practice is to include the disclaimer "Totals may not equal 100% because of rounding." Update: For clarity (in light of OP's next reply), `s/numbers/percentages/` at strikeout above.
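To make the distinction concrete, here is a minimal sketch of the "round for display, total the unrounded values" approach. The unrounded percentages are made up for illustration (chosen so that they round to the 37, 23, 9, 16, 16 quoted above); they are not the original poster's data.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical unrounded percentages; assumed for illustration only.
my @exact = (36.7, 22.6, 8.7, 15.7, 16.3);

# Round each entry for display only.
my @shown = map { sprintf('%.0f', $_) } @exact;

# Total the unrounded values, then round that total once.
my $total_shown = sprintf('%.0f', sum(@exact));

# For comparison: what you get by adding the already-rounded entries.
my $sum_of_rounded = sum(@shown);

print "Entries: ", join(', ', map { $_ . '%' } @shown), "\n";
print "Total:   $total_shown%\n";
print "Sum of rounded entries: $sum_of_rounded%\n";
```

The displayed total comes out as 100% while the rounded entries on their own add to 101%, which is exactly the situation the "Totals may not equal 100% because of rounding" disclaimer covers.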
Re^4: accurately rounding numbers for percentages by derekn (Initiate) on Aug 03, 2009 at 01:57 UTC
Re^5: accurately rounding numbers for percentages by ww (Archbishop) on Aug 03, 2009 at 04:13 UTC
Re^5: accurately rounding numbers for percentages by Trimbach (Curate) on Aug 03, 2009 at 03:23 UTC
Some notes below your chosen depth have not been shown here
Re: accurately rounding numbers for percentages by GrandFather (Saint) on Aug 02, 2009 at 22:40 UTC
If you had the results: `20.2, 20.2, 20.2, 20.2, 19.2` which would you change when rounded to integer values so the sum was 100? True laziness is hard work
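A quick sketch of why that question has no obvious answer: every entry loses the same fraction when rounded, so nothing singles out one entry as the one that "deserves" the missing point. The variable names are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my @exact   = (20.2, 20.2, 20.2, 20.2, 19.2);
my @rounded = map { sprintf('%.0f', $_) } @exact;

# Amount each entry lost to rounding, shown to one decimal place.
my @lost = map { sprintf('%.1f', $exact[$_] - $rounded[$_]) } 0 .. $#exact;

print "Rounded:   @rounded\n";
print "Sum:       ", sum(@rounded), "\n";
print "Lost part: @lost\n";
```

The rounded values sum to 99, and each of the five entries lost 0.2, so any rule for choosing which one to bump to reach 100 is a tie-breaker rather than a matter of accuracy.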
Re^2: accurately rounding numbers for percentages by spazm (Monk) on Aug 02, 2009 at 23:21 UTC
Lies, Damned Lies, and Statistics -- Benjamin Disraeli
Re^2: accurately rounding numbers for percentages by LanX (Saint) on Aug 03, 2009 at 23:46 UTC
The last one of course, minimizing the break of symmetry! ;) I think 33 1/3, 33 1/3, 33 1/3 might make your point clearer... 8) Cheers Rolf PS: this reminds me of the extra rules for the group phase in football tournaments to decide who continues ... same points? oh! same number of goals? oh! direct comparison undecided? oh! ... and so on, and if nothing can be chosen for a decision they finally flip a coin! 8) eg UEFA_Euro_2008#Tie-breaking_criteria
Re: accurately rounding numbers for percentages by ELISHEVA (Prior) on Aug 03, 2009 at 15:06 UTC
ysth's node above has a link to a nice essay on fudging numbers so that they add up to 100. Apparently in the author's company, they fudge the numbers to add up to 100 so that the help desk isn't inundated with complaints about "mistakes" in the reports they publish. So there may be some situations where, reality aside, one may really need to make those numbers add up to 100!

The question then becomes how to do this so that one minimizes mistaken impressions. One's choice will depend a great deal on how one expects people to view the numbers. If one thinks that readers are making judgements based on absolute percentages, then one will want to add the fudge factor to the largest numbers: adding 1 to 1% doubles it, whereas adding 1 to 98% is rather insignificant.

However, percentages are relative measures by nature. Thus one might also assume that readers are making judgements based on relative percentages more than absolute percentages. In that case, one might argue that fudge factors should be added randomly to the percentages to avoid bias.

I don't know which is best. I found several articles on subjective perceptions of statistics via Google, but most of them were from paid collections and would have required a trip to the university library. Unfortunately, I didn't have the time to look them up.

The article ysth linked to also had a nice sample of test data, so I decided to work up the case of random assignment of fudge factors along with a test suite based on Test::More. The test suite is wrapped in a subroutine, `runTests`, to make it easier to test alternative algorithms. If you would like to try your own alternate algorithm against the test suite, pass a code reference.

Alternate fudging routines should accept two parameters: `($precision, $aHistogram)`. `$precision` is the number of decimal digits in your total. For example, if `$precision == 2` then your percentages must add up to `100.00`. `$aHistogram` is a histogram whose numbers can add up to anything. The fudging subroutine is responsible for converting them to percentages. Best, beth
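The code from this post is not reproduced in the thread as shown here, so what follows is only a minimal sketch of a routine with the `($precision, $aHistogram)` interface described above. It is not the poster's implementation. The name `fudge_randomly` is hypothetical, `$aHistogram` is assumed to be an array reference of non-negative counts, and rounding halves upward via `int($x + 0.5)` is a choice made for this sketch.

```perl
use strict;
use warnings;
use List::Util qw(sum shuffle);

# Hypothetical routine: convert a histogram to percentages with $precision
# decimal digits, then apply +1/-1 unit fudge factors to randomly chosen
# entries until the percentages sum to exactly 100.
sub fudge_randomly {
    my ($precision, $aHistogram) = @_;
    my $scale  = 10 ** $precision;     # units per percentage point
    my $target = 100 * $scale;         # what the units must add up to
    my $total  = sum(@$aHistogram) or die "histogram is empty or sums to zero\n";

    # Round each share to the nearest unit (assumes non-negative counts).
    my @units = map { int($_ * $target / $total + 0.5) } @$aHistogram;

    # Positive: units still to hand out. Negative: units to take back.
    my $diff = $target - sum(@units);
    my $step = $diff < 0 ? -1 : 1;

    # Entries eligible for adjustment; never push an entry below zero.
    my @eligible = grep { $step > 0 || $units[$_] > 0 } 0 .. $#units;
    my @chosen   = (shuffle @eligible)[0 .. abs($diff) - 1];
    $units[$_] += $step for @chosen;

    return map { sprintf("%.${precision}f", $_ / $scale) } @units;
}

my @pct = fudge_randomly(0, [1, 1, 1]);
print "@pct\n";    # two 33s and one 34; which entry gets the 34 varies per run
```

Working in integer units of 10**-$precision percent keeps the bookkeeping exact, and random selection is what avoids systematically favouring large or small entries. A routine that always fudged the largest entries would differ only in how `@chosen` is picked.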
Re: accurately rounding numbers for percentages by jbt (Chaplain) on Aug 02, 2009 at 23:28 UTC
Could you store the numbers as numerator/denominator integers and then do integer arithmetic?
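One way to read that suggestion, sketched under assumptions: keep each percentage as the pair (count, total), take the integer quotient and remainder of `count * 100 / total`, and hand the leftover points to the entries with the largest remainders. That last step is the largest-remainder method, which is not proposed in the post itself. The function name `integer_percentages` is hypothetical, and the counts are assumed to be non-negative integers.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: takes raw counts and returns whole-number
# percentages that sum to 100, using only integer arithmetic.
sub integer_percentages {
    my @counts = @_;
    my $total  = sum(@counts) or die "counts sum to zero\n";

    use integer;    # make / and % operate on integers in this scope

    # Quotient and remainder of count*100/total for each entry.
    my @whole = map { $_ * 100 / $total } @counts;
    my @rem   = map { $_ * 100 % $total } @counts;

    # Points still missing after truncation (fewer than the entry count).
    my $missing = 100 - sum(@whole);

    # Give the missing points to the entries with the largest remainders;
    # ties go to the earlier entry.
    my @order = sort { $rem[$b] <=> $rem[$a] || $a <=> $b } 0 .. $#counts;
    $whole[$_]++ for @order[0 .. $missing - 1];

    return @whole;
}

print join(', ', integer_percentages(1, 1, 1)), "\n";    # 34, 33, 33
```

Because no floating-point value is ever formed, there is no representation error to worry about. The tie-breaking rule (earlier entry wins) is arbitrary, which is the same difficulty GrandFather's example points out.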
Re: accurately rounding numbers for percentages by ig (Vicar) on Aug 03, 2009 at 14:06 UTC
You can make the quantized percentages add to 100, but doing so will increase the quantization error compared with plain rounding. Doing so minimizes the aggregate error rather than the individual errors. While others have advocated minimizing the individual errors, there may be cases where minimizing the aggregate error is preferable. The following example demonstrates one way the aggregate error can be minimized. The implementation is crude, not well tested and replete with print statements, which may help you follow what it is doing.

```perl
use warnings;
use strict;
use Data::Dumper;

my @percentages = generate();
print "@percentages\n";

my @quantized = quantize(1000, @percentages);

print "Original percentages: @percentages\n";
print "Quantized percentages: @quantized\n";

my $sum = 0;
$sum += $_ foreach @quantized;
print "Sum of quantized percentages: $sum\n";

=head2 my @quantized = quantize($factor, @percentages);

The quantize() function takes a quantization factor and an array of
percentages which should add to 100%. It returns an array of quantized
percentages which does add to 100%. The percentages are quantized to
multiples of (100/$factor).

The function minimizes the worst case error. Two error functions are
provided: one is the absolute error (the difference between the original
value and the quantized value) and the other is the absolute relative
error (the absolute error divided by the value being quantized). There
are many other possibilities, depending on your needs.

=cut

sub quantize {
    my $quantum = 100 / shift;
    my $error   = 0;
    my $sum     = 0;

    # Each element of @x is [ original value, quantized value, error ].
    my @x = map {
        my $q = sprintf("%0.0f", $_ / $quantum) * $quantum;
        my $d = $q - $_;
        $error += $d;
        $sum   += $q;
        [ $_, $q, $d ]
    } @_;

    print Dumper(\@x);
    print "initial total error: $error\n";
    print "initial sum: $sum\n";

    # Move one quantum at a time until the sum is (within half a quantum of) 100.
    while (abs($sum - 100) > $quantum / 2) {
        my $direction = ($sum > 100) ? 1 : -1;
        my $min_error = 10000;
        my $min_index = 0;

        print "errors of adjusted values: ";
        foreach my $i (0 .. $#x) {
            my $e = abs($x[$i]->[2] - $quantum * $direction) / $x[$i]->[0];   # relative error
            #my $e = abs($x[$i]->[2] - $quantum * $direction);                # absolute error
            print " $e";
            if ($e < $min_error) {
                $min_error = $e;
                $min_index = $i;
                print "(i = $i)";
            }
        }
        print "\n";

        print "adjust $min_index: $x[$min_index]->[0], $x[$min_index]->[1] $x[$min_index]->[2]\n";
        $x[$min_index]->[1] -= $quantum * $direction;
        $x[$min_index]->[2] -= $quantum * $direction;
        print "\t$x[$min_index]->[1], $x[$min_index]->[2]\n";
        $sum -= $quantum * $direction;
    }

    return map { $_->[1] } @x;
}

=head2 generate()

The generate() function generates a somewhat random array of percentages
that adds to 100%.

=cut

sub generate {
    my $sum = 0;
    my @percentages;

    foreach (1 .. 20) {
        my $x = rand(50);
        if ($sum + $x < 100) {
            push(@percentages, $x);
            $sum += $x;
        }
    }
    push(@percentages, 100 - $sum);

    return @percentages;
}
```
Re: accurately rounding numbers for percentages by scorpio17 (Canon) on Aug 03, 2009 at 13:48 UTC
Let's say you've got 5 percentages. Sort them, from high to low, then make the smallest one 100-(sum of the 4 bigger ones). This forces them to add up the way you want, but it pushes all the round-off error into the smallest percentage. Another way is to make the largest value 100-(sum of the 4 smaller ones). This pushes the error into the largest value. Neither way is "correct" in a strict mathematical sense, but I'm assuming that's not much of a priority for you anyway.
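Both variants can be sketched in a few lines. The helper name `absorb_roundoff` and its `'smallest'`/`'largest'` argument are hypothetical, and the input percentages are made-up data chosen to round to 37, 23, 9, 16, 16.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper. $which is 'smallest' or 'largest': the entry that
# absorbs all of the round-off error. Returns values in the original order.
sub absorb_roundoff {
    my ($which, @exact) = @_;
    my @rounded = map { sprintf('%.0f', $_) } @exact;

    # Indices sorted from high to low by unrounded value.
    my @by_size = sort { $exact[$b] <=> $exact[$a] } 0 .. $#exact;
    my $target  = $which eq 'largest' ? $by_size[0] : $by_size[-1];

    # Replace the chosen entry with 100 minus the sum of all the others.
    $rounded[$target] = 100 - (sum(@rounded) - $rounded[$target]);
    return @rounded;
}

my @exact = (36.7, 22.6, 8.7, 15.7, 16.3);    # made-up data
print join(', ', absorb_roundoff('smallest', @exact)), "\n";    # 37, 23, 8, 16, 16
print join(', ', absorb_roundoff('largest',  @exact)), "\n";    # 36, 23, 9, 16, 16
```

With this data the "smallest" variant shows 8.7 as 8, a large relative distortion, while the "largest" variant shows 36.7 as 36, which is less noticeable in relative terms.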
Re: accurately rounding numbers for percentages by scorpio17 (Canon) on Aug 03, 2009 at 17:52 UTC
I just had another idea: dynamically generate a pie chart using something like GD. Then you don't even have to show the actual numbers (a picture is worth a thousand words, etc.)