PerlMonks  

accurately rounding numbers for percentages

by derekn (Initiate)
on Aug 02, 2009 at 22:10 UTC [id://785290]

derekn has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to calculate percentages. For example, a user has 5 choices, and each choice is displayed as a percentage of the total votes. The problem is that the percentages don't come out as nice whole numbers (e.g., 92.84513%). When I round them to whole numbers (93%), they sometimes don't add up to 100 as they should, which makes the displayed percentages look inaccurate. Sometimes the total is 99, sometimes 101, and so on. I have used $percent=sprintf("%.0f", $value) to calculate this, but no luck. Any ideas how to make them add up to 100%? Derek
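A minimal Perl sketch of the problem described above, assuming plain integer vote counts; naive_percentages is a hypothetical helper, not code from the post. Rounding each share independently with sprintf("%.0f", ...) gives a total of 99 for three equal choices and 102 for six:

```perl
use strict;
use warnings;

# Hypothetical helper: round each choice's share of the total to a whole
# percent independently, the way sprintf("%.0f", ...) does.
sub naive_percentages {
    my @votes = @_;
    my $total = 0;
    $total += $_ for @votes;
    return map { sprintf("%.0f", 100 * $_ / $total) } @votes;
}

for my $votes ([1, 1, 1], [1, 1, 1, 1, 1, 1]) {
    my @shown = naive_percentages(@$votes);
    my $sum   = 0;
    $sum += $_ for @shown;
    print "@shown (sum $sum)\n";
}
# prints "33 33 33 (sum 99)" and "17 17 17 17 17 17 (sum 102)"
```

Each rounded value is correct on its own; only the sum drifts.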

Re: accurately rounding numbers for percentages
by ysth (Canon) on Aug 03, 2009 at 00:20 UTC
Re: accurately rounding numbers for percentages
by Trimbach (Curate) on Aug 02, 2009 at 22:24 UTC
    What you're asking is not possible. Any time you round a number you introduce error; how much depends on how much you're rounding. Add enough errors together and your total will always be off from the "expected" total (in this case 100%).

    The only way around this is to go ahead and round the individual entries to whole numbers for display, but when calculating the total, don't add the rounded entries; add the unrounded entries, and then round the result for display if you want.
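    A short Perl sketch of that approach, using made-up vote counts (5, 5, 5): each share is rounded only for display, while the total is computed from the unrounded values and rounded once at the end.

```perl
use strict;
use warnings;

my @votes = (5, 5, 5);    # hypothetical vote counts
my $total_votes = 0;
$total_votes += $_ for @votes;

# Unrounded shares, kept at full precision.
my @exact = map { 100 * $_ / $total_votes } @votes;

# Round each entry only for display.
my @display = map { sprintf("%.0f", $_) } @exact;

# Add the unrounded entries, then round the total once for display.
my $exact_total = 0;
$exact_total += $_ for @exact;
my $display_total = sprintf("%.0f", $exact_total);

print join(", ", map { "$_%" } @display), " (total $display_total%)\n";
# prints "33%, 33%, 33% (total 100%)"
```

The displayed entries still add up to 99; the displayed total is correct because it never went through the per-entry rounding.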

    Gary Blackburn
    Trained Killer

      So I'm gonna have to live with "37%, 23%, 9%, 16%, 16%" (rounded values) equalling 101, even though it SHOULD equal 100%?
        That's not what Trimbach said, by a long shot.

        If you add the UNrounded percentages, they should total 100% (except that some value/count pairs, such as 100/6, can't be represented exactly at whatever decimal length you use, so even the "unrounded" values get rounded at their last digit).

        But for cases like (I infer) yours, a standard and commonly accepted practice is to include the disclaimer "Totals may not equal 100% because of rounding."

        Update: For clarity (in light of OP's next reply), changed "numbers" to "percentages" above.

Re: accurately rounding numbers for percentages
by GrandFather (Saint) on Aug 02, 2009 at 22:40 UTC

    If you had the results:

    20.2, 20.2, 20.2, 20.2, 19.2

    which would you change when rounded to integer values so the sum was 100?


    True laziness is hard work
      Lies, Damned Lies, and Statistics
      -- Benjamin Disraeli
      The last one, of course, minimizing the break in symmetry! ;)

      I think 33 1/3, 33 1/3, 33 1/3 might make your point clearer... 8)

      Cheers Rolf

      PS: this reminds me of the extra tie-breaking rules for the group phase of football tournaments that decide who advances ...

      same points? oh!

      same number of goals? oh!

      direct comparison undecided? oh!

      ... and so on, and if nothing else decides it, they finally flip a coin! 8)

      e.g. UEFA_Euro_2008#Tie-breaking_criteria

Re: accurately rounding numbers for percentages
by ELISHEVA (Prior) on Aug 03, 2009 at 15:06 UTC

    ysth's node above has a link to a nice essay on fudging numbers so that they add up to 100. Apparently the author's company fudges the numbers to add up to 100 so that the help desk isn't inundated with complaints about "mistakes" in the reports they publish. So there may be some situations where, reality aside, one really does need to make those numbers add up to 100!

    The question then becomes how to do this so as to minimize mistaken impressions. One's choice will depend a great deal on how one expects people to read the numbers. If one thinks that readers are judging absolute percentages, then one will want to add the fudge factor to the largest numbers: adding 1 to 1% doubles it, whereas adding 1 to 98% is rather insignificant.

    However, percentages are relative measures by nature, so one might also assume that readers judge relative percentages more than absolute ones. In that case, one might argue that fudge factors should be assigned to the percentages at random to avoid bias. I don't know which is best. I found several articles on subjective perceptions of statistics via Google, but most of them were in paid collections and would have required a trip to the university library. Unfortunately, I didn't have the time to look them up.

    The article ysth linked to also had a nice sample of test data, so I decided to work up the case of random assignment of fudge factors along with a test suite based on Test::More.

    The test suite is wrapped in a subroutine, runTests, to make it easier to test alternative algorithms. If you would like to try your own algorithm against the test suite, pass runTests a code reference. Alternate fudging routines should accept two parameters: ($precision, $aHistogram). $precision is the number of decimal digits in your total; for example, if $precision == 2, your percentages must add up to 100.00. $aHistogram is a histogram whose numbers can add up to anything; the fudging subroutine is responsible for converting them to percentages.
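    A minimal Perl sketch of a fudging routine with the ($precision, $aHistogram) interface described above. This is not beth's code: the name random_fudge is hypothetical, and it assumes the histogram holds non-negative integer counts. It truncates every percentage to $precision decimals using exact integer arithmetic, then hands the missing units (one unit = 10**-$precision percent) to randomly chosen entries among those that were truncated, so every result stays less than one unit away from its true value.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Hypothetical fudging routine using the ($precision, $aHistogram) interface.
# Assumes $aHistogram is an array ref of non-negative integer counts.
sub random_fudge {
    my ($precision, $aHistogram) = @_;
    my $total = sum(@$aHistogram) or die "empty or all-zero histogram\n";
    my $units = 100 * 10**$precision;    # e.g. 10000 units of 0.01% when $precision == 2

    # Truncate each share to whole units with exact integer arithmetic and
    # remember which entries lost a nonzero remainder.
    my (@floor, @truncated);
    for my $i (0 .. $#$aHistogram) {
        my $num = $aHistogram->[$i] * $units;
        my $rem = $num % $total;
        $floor[$i] = ($num - $rem) / $total;
        push @truncated, $i if $rem;
    }

    # The dropped remainders add up to exactly $deficit units and each is
    # below one unit, so there are always more truncated entries than units
    # to hand out. Pick the recipients at random to avoid systematic bias.
    my $deficit = $units - sum(@floor);
    $floor[$_]++ for (shuffle @truncated)[0 .. $deficit - 1];

    return map { sprintf("%.${precision}f", $_ / 10**$precision) } @floor;
}

my @shares = random_fudge(2, [1, 1, 1]);
print "@shares\n";
```

With random_fudge(2, [1, 1, 1]), two entries come back as 33.33 and one as 33.34; which entry gets the extra 0.01 varies from run to run.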

    Best, beth

Re: accurately rounding numbers for percentages
by jbt (Chaplain) on Aug 02, 2009 at 23:28 UTC
    Could you store the numbers as numerator/denominator integers and then do integer arithmetic?
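    One way to follow that idea through, sketched in Perl: the largest remainder (Hamilton) method, a standard apportionment technique swapped in here rather than anything proposed in this thread. Each share 100*v/total is split into an integer floor and an integer remainder, so no floating-point rounding is involved, and the points still missing from 100 go to the choices with the largest remainders. The function name is hypothetical, ties go to the earlier choice, and non-negative integer vote counts are assumed.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: largest remainder apportionment of 100 whole
# percentage points, using only integer arithmetic on the vote counts.
sub largest_remainder_percent {
    my @votes = @_;
    my $total = sum(@votes) or die "no votes\n";

    # 100*v/total == floor + rem/total, computed exactly.
    my (@floor, @rem);
    for my $v (@votes) {
        my $num = 100 * $v;
        push @rem,   $num % $total;
        push @floor, ($num - $rem[-1]) / $total;
    }
    my $missing = 100 - sum(@floor);

    # Largest remainders first; equal remainders keep their original order.
    my @by_rem = sort { $rem[$b] <=> $rem[$a] || $a <=> $b } 0 .. $#votes;
    $floor[$_]++ for @by_rem[0 .. $missing - 1];
    return @floor;
}

print join(", ", largest_remainder_percent(1, 1, 1)), "\n";    # prints "34, 33, 33"
```

With GrandFather's figures scaled to counts (202, 202, 202, 202, 192), every remainder is equal, so the tie-break rule alone decides who gets the extra point; this sketch returns 21, 20, 20, 20, 19.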
Re: accurately rounding numbers for percentages
by ig (Vicar) on Aug 03, 2009 at 14:06 UTC

    You can make the quantized percentages add to 100, but doing so will increase the quantization error compared with plain rounding. Forcing the sum minimizes the aggregate error rather than the individual errors. While others have advocated minimizing the individual errors, there may be cases where minimizing the aggregate error is preferable.

    The following example demonstrates one way the aggregate error can be minimized. The implementation is crude, not well tested and replete with print statements which may help you follow what it is doing.

```perl
use warnings;
use strict;
use Data::Dumper;

my @percentages = generate();
print "@percentages\n";
my @quantized = quantize(1000, @percentages);
print "Original percentages: @percentages\n";
print "Quantized percentages: @quantized\n";
my $sum = 0;
$sum += $_ foreach (@quantized);
print "Sum of quantized percentages: $sum\n";

=head2 my @quantized = quantize($factor, @percentages);

The quantize() function takes a quantization factor and an array of
percentages which should add to 100%. It returns an array of quantized
percentages which does add to 100%. The percentages are quantized to
multiples of (100/$factor).

The function minimizes the worst case error. Two error functions are
provided: one is the absolute error (the difference between the original
value and the quantized value) and the other is the absolute relative error
(the absolute error divided by the value being quantized). There are many
other possibilities, depending on your needs.

=cut

sub quantize {
    my $quantum = 100 / shift;    # size of one quantization step
    my $error   = 0;
    my $sum     = 0;

    # Round each value to the nearest quantum and record
    # [original, quantized, quantized - original].
    my @x = map {
        my $q = sprintf("%0.0f", $_ / $quantum) * $quantum;
        my $d = $q - $_;
        $error += $d;
        $sum   += $q;
        [ $_, $q, $d ]
    } @_;
    print Dumper(\@x);
    print "initial total error: $error\n";
    print "initial sum: $sum\n";

    # Move one quantum at a time until the sum is within half a quantum of 100.
    while (abs($sum - 100) > $quantum / 2) {
        my $direction = ($sum > 100) ? 1 : -1;    # 1: take a quantum away, -1: add one
        my $min_error = 10000;
        my $min_index = 0;
        print "errors of adjusted values: ";

        # Pick the entry whose error after the adjustment would be smallest.
        foreach my $i (0 .. $#x) {
            my $e = abs($x[$i]->[2] - $quantum * $direction) / $x[$i]->[0];    # relative error
            #my $e = abs($x[$i]->[2] - $quantum * $direction);                 # absolute error
            print " $e";
            if ($e < $min_error) {
                $min_error = $e;
                $min_index = $i;
                print "(i = $i)";
            }
        }
        print "\n";
        print "adjust $min_index: $x[$min_index]->[0], $x[$min_index]->[1] $x[$min_index]->[2]\n";
        $x[$min_index]->[1] -= $quantum * $direction;
        $x[$min_index]->[2] -= $quantum * $direction;
        print "\t$x[$min_index]->[1], $x[$min_index]->[2]\n";
        $sum -= $quantum * $direction;
    }
    return map { $_->[1] } @x;
}

=head2 generate()

The generate() function generates a somewhat random array of percentages
that adds to 100%.

=cut

sub generate {
    my $sum = 0;
    my @percentages;
    foreach (1 .. 20) {
        my $x = rand(50);
        if ($sum + $x < 100) {
            push(@percentages, $x);
            $sum += $x;
        }
    }
    push(@percentages, 100 - $sum);
    return @percentages;
}
```
Re: accurately rounding numbers for percentages
by scorpio17 (Canon) on Aug 03, 2009 at 13:48 UTC

    Let's say you've got 5 percentages. Sort them from high to low, round them, then make the smallest one 100 - (sum of the 4 bigger ones). This forces them to add up the way you want, but it pushes all the round-off error into the smallest percentage. Another way is to make the largest value 100 - (sum of the 4 smallest). This pushes the error into the largest value. Neither way is "correct" in a strict mathematical sense, but I'm assuming that's not much of a priority for you anyway.
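    A minimal Perl sketch of both variants, assuming the inputs are unrounded percentages that already sum to 100; force_to_100 and its 'largest'/'smallest' argument are hypothetical names. Ties in the sort are broken by position so the result is deterministic.

```perl
use strict;
use warnings;

# Hypothetical helper: round each percentage to a whole number, then replace
# the largest ($into eq 'largest') or smallest ($into eq 'smallest') entry
# with 100 minus the sum of all the other rounded entries.
sub force_to_100 {
    my ($into, @percent) = @_;

    # Indices sorted from high to low; equal values keep their original order.
    my @order  = sort { $percent[$b] <=> $percent[$a] || $a <=> $b } 0 .. $#percent;
    my $target = $into eq 'largest' ? $order[0] : $order[-1];

    my @rounded = map { sprintf("%.0f", $_) } @percent;
    my $others  = 0;
    $others += $rounded[$_] for grep { $_ != $target } 0 .. $#rounded;
    $rounded[$target] = 100 - $others;
    return @rounded;
}

my @p = (20.2, 20.2, 20.2, 20.2, 19.2);
print join(" ", force_to_100('largest',  @p)), "\n";
print join(" ", force_to_100('smallest', @p)), "\n";
```

Using GrandFather's figures from above, pushing the error into the largest value gives 21 20 20 20 19, and pushing it into the smallest gives 20 20 20 20 20.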

Re: accurately rounding numbers for percentages
by scorpio17 (Canon) on Aug 03, 2009 at 17:52 UTC

    I just had another idea: dynamically generate a pie chart using something like GD. Then you don't even have to show the actual numbers (a picture is worth a thousand words, etc.)

Node Type: perlquestion [id://785290]
Approved by Perlbotics