Re^2: Variables are automatically rounded off in perl (humans)

by tye (Sage)
on Jul 21, 2016 at 21:13 UTC ( #1168277=note )


in reply to Re: Variables are automatically rounded off in perl
in thread Variables are automatically rounded off in perl

It is not a bug that Perl rounds the display of numeric values to fewer digits than the internal floating point format can hold. It is an intentional feature: you get even more complaints when a computation should yield 1/100 but the result gets displayed as "0.01000000000000002" than you get about such a result not being "== 0.01".

Over time there have been adjustments as to how much difference there should be between the precision of the local floating point format vs. precision of Perl's display of numbers. But the (non-zero) difference should remain.

Wanting to see about 17 '9's or '0's near the end of a displayed number indicates a rather unrealistic expectation. 15 digits of accuracy is more than enough to notice that somebody misplaced a single grain of sand from your enormous beach. It is not important to inform a human that a single grain of sand was lost from the beach. The vast majority of situations are such that the loss of a single grain of sand from the beach is utterly insignificant. If you are in one of those rarefied situations where the exact number of grains of sand on your huge beach must not be off by even 1, then you shouldn't just be using mundane floating point.

- tye        

Replies are listed 'Best First'.
Re^3: Variables are automatically rounded off in perl (humans)
by syphilis (Bishop) on Jul 22, 2016 at 01:43 UTC
    It is not a bug that Perl rounds the display of numeric values to a number of digits that is fewer than the number of digits of precision that the internal floating point format achieves

    It depends upon how you measure that "number of digits".
    Sure - if you allow more than 15 decimal digits then you lose the guarantee that different strings will assign different values. (It's possible that 2 different 16-digit strings will assign to the same value.)
    But there are a vast number of values that cannot be assigned if you restrict yourself to 15 decimal digits.
    To be able to assign each and every value you need 17 decimal digits, minimum.

    To me the sane thing is to have print() deliver a value that, when assigned back to a scalar, will result in the same value.

    I think the following is rather pathetic:
    C:\>perl -le "print sqrt(3);"
    1.73205080756888

    C:\>perl -le "print 'WTF' if 1.73205080756888 != sqrt(3);"
    WTF
    Why do we have to put up with that sort of anomaly when it could be eradicated by simply providing just *2* more digits of precision ?
    Is 17 decimal digits really that much more unreadable than 15 digits ?
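
    A minimal sketch of the round-trip complaint, assuming perl's NV is an IEEE 754 double and that string-to-number conversion is correctly rounded. The helper name round_trips is made up for illustration; it is not a Perl builtin.

```perl
use strict;
use warnings;

# Hypothetical helper: does the string numify back to exactly the same value?
sub round_trips {
    my ($string, $value) = @_;
    return $string == $value ? 1 : 0;
}

my $x = sqrt(3);

my $default = "$x";                  # what print() shows (about 15 significant digits)
my $full    = sprintf '%.17g', $x;   # 17 significant digits

printf "%-20s round-trips: %s\n", $default, round_trips($default, $x) ? 'yes' : 'no';
printf "%-20s round-trips: %s\n", $full,    round_trips($full,    $x) ? 'yes' : 'no';
```

    On such a build the first line should report "no" and the second "yes", which is the anomaly being complained about.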

    you get even more complaints when a computation should yield 1/100 but the result gets displayed as "0.01000000000000002"

    I don't think that's the right example.
    For me, I have to request 18 decimal digits of precision to get the trailing "2", and being surprised by that sort of result when more than 17 digits have been requested is a "user bug".
    The "correct" precision is 17 decimal digits - not 18.
    C:\>perl -le "printf '%.16e', 1/100"
    1.0000000000000000e-002

    C:\>perl -le "printf '%.17e', 1/100"
    1.00000000000000002e-002

    A better example is:

    C:\>perl -le "printf '%.16e', 0.1"
    1.0000000000000001e-001
    This happens because 0.10000000000000001 and 0.1 have the same internal floating point representation, and that representation is closer to 0.10000000000000001 than 0.1.
    I actually don't have any problem with that - but if we want to avoid such surprising results, there's a more refined way of dealing with this type of issue than simply getting the chainsaw out and hacking the tail off everything.

    I am told that python, for example, utilises this refinement.
    That is, python prints out 17 decimal digits but still prints 0.1 out as '0.1'.
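
    A rough sketch of that refinement in Perl, for illustration only: try increasing precisions and keep the first string that numifies back to the same value. It assumes IEEE 754 doubles; shortest_repr is a hypothetical name, and this brute-force loop is not how python actually implements its shortest-repr output.

```perl
use strict;
use warnings;

# Hypothetical helper: the shortest %.Ng string (N = 1..17) that
# converts back to exactly the same double.
sub shortest_repr {
    my ($x) = @_;
    for my $digits (1 .. 17) {
        my $s = sprintf '%.*g', $digits, $x;
        return $s if $s == $x;
    }
    return sprintf '%.17g', $x;   # fallback; 17 digits suffice for a double
}

print shortest_repr(0.1),     "\n";   # 0.1
print shortest_repr(sqrt(3)), "\n";   # needs more digits than print() gives
```

    With this approach 0.1 still prints as '0.1', while values such as sqrt(3) get as many digits as they need to round-trip.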

    Update: Current bug report regarding this issue is here

    Cheers,
    Rob

      There is not only a single situation to consider. You present a situation where you want to preserve just slightly more precision than is preserved in that situation if you do nothing but the trivial to achieve your aims.

      C:\>perl -le "print 'WTF' if 1.73205080756888 != sqrt(3);"
      WTF

      Wow. If you are using '==' or '!=' on computed floating point values, then you've already lost my sympathy.
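
      A minimal sketch of the usual alternative to '==' on computed values: compare within a relative tolerance. The helper name nearly_equal and the 1e-12 default are assumptions chosen for illustration, not anything Perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: relative comparison instead of '=='.
sub nearly_equal {
    my ($x, $y, $rel_tol) = @_;
    $rel_tol //= 1e-12;    # assumed tolerance; choose one that suits your data
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    return abs($x - $y) <= $rel_tol * $scale ? 1 : 0;
}

print "close enough\n" if nearly_equal(1.73205080756888, sqrt(3));
print "not equal\n"    if 1.73205080756888 != sqrt(3);
```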

      I'm sorry, but the default, trivial way that a floating point value is displayed should not be optimized for "preserve every single bit of accuracy if you paste it back in and interpret it as a numeric value". It should be and is optimized for presenting the numeric value to humans.

      But there are a vast number of values that cannot be assigned if you restrict yourself to 15 decimal digits.

      To be clear, you can certainly assign 17 digits. Perl does not ignore beyond the 15th digit when you make an assignment (even when using a numeric literal).

      There is no shortage of easy, simple ways to preserve more accuracy if that is what you aim to do. printf and pack are the first two off the top of my head. Determining a sufficient number of digits to request from printf isn't even a difficult proposition.
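
      A sketch of both routes, assuming NVs are IEEE 754 doubles (on a long double build, pack 'd' would lose bits). The digit count uses the standard formula ceil(1 + mantissa_bits * log10(2)); the variable names are illustrative.

```perl
use strict;
use warnings;
use POSIX qw(ceil DBL_MANT_DIG DBL_DIG);

# Significant decimal digits needed so that every double survives a
# decimal round trip: ceil(1 + mantissa_bits * log10(2)).
my $needed = ceil(1 + DBL_MANT_DIG() * log(2) / log(10));

my $x = sqrt(3);

# Binary route: pack 'd' keeps the native double bit for bit.
my $bytes  = pack 'd', $x;
my ($back) = unpack 'd', $bytes;

printf "digits always safe to read back: %d\n", DBL_DIG();
printf "digits needed to write out:      %d\n", $needed;
printf "text:   %.*g\n", $needed, $x;
printf "binary: %s (%s)\n", unpack('H*', $bytes), $back == $x ? 'same value' : 'differs';
```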

      To me the sane thing is to have print() deliver a value that, when assigned back to a scalar, will result in the same value.

      To me, if you are lazily using the default string representation and just expecting 100% fidelity, then you aren't a very good programmer. To support that stance we'd have to make tabs be printed as \t and most string values be printed with quotes around them. Producing a representation that can preserve with perfect fidelity is simply not the purpose of print.

      During a recent period when the difference between stored accuracy and default displayed accuracy was less, we got tons of complaints because the result of code like:

      my $x = 0;
      for( 1..100 ) {
          $x += 0.01;
          # ...
      }
      print $x, $/;

      was not a nice "0.5" but "0.5000000000000002". Your plan would make it produce "0.50000000000000022".

      People just wanting to get a reasonable representation of a rather mundane value are who the behavior of a plain print should be catered to. People who can't stand miscounting one grain of sand on their huge beach are a much better choice for who needs to do a tiny bit of extra work.

      - tye        

        The notion that one grain of sand on a beach is not important, but one hundred are, is dubious to say the least.

        perl -le '$x += 0.1 for 1..100; print $x;'

        There are a couple of points to make. First: it's not about print really. It's about stringification. Perl scalars can get pPOK from "action at a distance"; and stringification is currently lossy.

        Someone scientifically minded will of course understand the caveats of floating point, but at the same time expect roundtrip accuracy. The numbers must convert back and forth from text to internal representation. This caters to human audiences; to suggest hexdumps or similar would fly in the face of your own argument. Why would you insist people need to go out of their way to have their numbers stringify correctly?

        Frankly, this sounds like you are discouraging perl use for scientific work ("the wrong tool"). Also reminds me of the definition π = 22/7 ("correct for all practical purposes").

        Update. Uh, and a link to one of the curiosities: shortest round-tripping.

        Update2. Here's a better demonstration of the WTF equality problem.

        #! /usr/bin/perl -wl
        my @input = qw( 3335.9999999999995 3336.0000000000000 );
        my @nums = map 0+$_, @input;
        sub uniq_count { int keys %{{map {$_ => 1} @_}} }
        print "The numbers are:";
        printf " %.16e\n", $_ for @nums;
        printf "A set of %d different value(s)\n", uniq_count(@nums);

        And yes, this also affects Memoize, FWIW.
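
        A hedged sketch of one workaround for the collapsed-keys problem: build the hash key with sprintf '%.17g' instead of relying on default stringification. It assumes NVs are IEEE 754 doubles; the hash names are illustrative only.

```perl
use strict;
use warnings;

my @nums = (3335.9999999999995, 3336.0);

# Default stringification: both values turn into the same key.
my %by_default = map { ( $_ => 1 ) } @nums;

# Explicit 17-digit keys keep the two values apart.
my %by_full = map { ( sprintf('%.17g', $_) => 1 ) } @nums;

printf "default keys: %d\n", scalar keys %by_default;
printf "%%.17g keys:   %d\n", scalar keys %by_full;
```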

        If you are using '==' or '!=' on computed floating point values, then you've already lost my sympathy

        I, of course, am completely shattered at having lost your sympathy ;-)

        ... we got tons of complaints because the result of code like ... <snip> ... was not a nice "0.5" but "0.5000000000000002".

        The output of that code is *still* not nice, even today. Try it and you'll see.
        We need to get the output precision further reduced.

        Seriously ... you can reduce the decimal precision of print even further, and that sort of procedure can still produce results that are "not nice":
        perl -le "for(1..100000000) {$x += 0.01}print $x;"
        1000000.00077928
        We need to reduce displayed output to no more than 10 decimal digits ?? (Even less)

        The "unniceness" of those results comes from cumulatively adding a value that is not precisely 1/100.
        I don't see that it has much to do with the decimal precision of print()'s output.
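
        A small sketch of that point, assuming IEEE 754 doubles: the drift comes from rounding on every addition, which shows up when the accumulated sum is compared with a single multiplication.

```perl
use strict;
use warnings;

my $sum = 0;
$sum += 0.01 for 1 .. 100;     # 100 additions, each one rounded

my $product = 0.01 * 100;      # a single rounding

printf "accumulated: %.17g\n", $sum;
printf "multiplied:  %.17g\n", $product;
printf "difference:  %.3g\n",  $sum - $product;
```

        Printing fewer digits only hides the difference for small loop counts; run the loop long enough and the error reaches whatever digit print() still shows.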

        People who can't stand miscounting one grain of sand on their huge beach are a much better choice for who needs to do a tiny bit of extra work

        Yes, that's the current state of play. I hope you pointed that out to the people who complained about the output of the snippet you provided.
        And I'm sure they were quite happy to do that extra work.

        Cheers,
        Rob

        People just wanting to get a reasonable representation of a rather mundane value are who the behavior of a plain print should be catered to.

        What "people" are those I wonder? And what do they consider a "reasonable representation"?

        1. The accountant who wants pounds & pence or dollars & cents; say 2dp?
        2. The statistician for whom any more than 1dp would imply greater significance than his data or methods allow?
        3. The games programmer for whom 6dp of life force is probably more than she needs?
        4. How about the rocket scientist for whom that grain of sand on the beach represents the difference between orbital insertion and having the new crater named after them?
        5. How about the dozens that have come here every year for the last decade only to be told that if they want to understand perl's confusing default output they've got to go away and read the most unhelpful piece of pretentious elitism; effectively telling them that if they want to understand Perl's default lies output they will have to become "computer scientists".

        Just who are these unnamed "people" that are happy to take this unspecified "reasonable representation", even if that means they are being lied to in a way that makes their results bewilderingly confusing.

        Just who is being served here? Since I doubt that you would need Perl to lie to you in order to avoid whatever confusion you think seeing the truth will induce, you must feel that there are other "people" who can't handle the truth! Who are they? And how does having Perl lie to them prevent them from having to become aware of the reality of FP math?

        It should be and is optimized for presenting the numeric value to humans.

        Where your argument falls apart is that in order for the users of print to know whether they are being lied to; they need to use printf and compare the output. Any effort saved is thus negated entirely.

        Better to display the full information by default and allow (everyone) to choose how much of the truth they actually need; than to set some totally arbitrary limit on the truth that means everyone needs to revert to printf to find out whether what is being output by print is actually what they need or not.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
        In the absence of evidence, opinion is indistinguishable from prejudice.
      To me the sane thing is to have print() deliver a value that, when assigned back to a scalar, will result in the same value.

      I think the following is rather pathetic:

      C:\>perl -le "print sqrt(3);"
      1.73205080756888

      C:\>perl -le "print 'WTF' if 1.73205080756888 != sqrt(3);"
      WTF

      That should be expected for integer values, but for literal floating point values you would in general have to write them in a number base of a power of two if you want to express (and match) them exactly. In that sense print() cannot be of use to do the sane thing because information is already lost due to conversion to base ten (see commensurability).

        but for literal floating point values you would in general have to write them in a number base of a power of two if you want to express (and match) them exactly

        You've missed the point.

        It is not about representing root3 exactly; it is about displaying and printing the calculated internal value of root3 such that it retains all the available accuracy, so that that same level of accuracy may be restored from the printed value.

        This is always possible; but perl does not achieve it.

        The point is about reproducibility.



        BrowserUK did a good job of explaining the real issue, which you now get, but I wanted to further clarify a tangential point:

        ... for literal floating point values you would in general have to write them in a number base of a power of two if you want to express ... them exactly.

        It's one of the mathematical niceties of binary that any binary floating point can be exactly represented in decimal. It comes about because 2 (binary) is a factor of 10 (decimal: 2*5)

        x = c * (2**p)      # the value you want to express
        c                   # the coefficient
        p = -n              # the smallest power of two that goes into x;
                            # negative, since it's a binary fraction

        x = c*(2**p) = c*(2**-n) = ...

                [   1  ]     [   1  ] [ 5**n ]     [  5**n ]
        ... = c*[------] = c*[------]*[------] = c*[-------] = ...
                [ 2**n ]     [ 2**n ] [ 5**n ]     [ 10**n ]

        ... = c * (5**n) * (10**-n) = c * (5**n) * (10**p)

        k = c * (5**n)      # k integer, so exactly representable in decimal
        x = k * (10**-n)    # since n is finite and k is an integer,
                            # x is exactly representable in decimal

        As might be obvious from the final statement, an n-digit binary fixed point can be expressed exactly in an n-digit decimal fixed point.

        It turns out, there's something else interesting you can tell about the number of digits for an exact power of two: specifically, which digit d (after the decimal point) that the decimal expansion will start on

        x        c*(2**-n)    binary        decimal     d   log10(x)
        1/2      1*(2**-1)    0b0.1         0.5         1   -0.301029996
        1/4      1*(2**-2)    0b0.01        0.25        1   -0.602059991
        3/4      3*(2**-2)    0b0.11        0.75        1   -0.124938737
        1/8      1*(2**-3)    0b0.001       0.125       1   -0.903089987
        7/8      7*(2**-3)    0b0.111       0.875       1   -0.057991947
        1/16     1*(2**-4)    0b0.0001      0.0625      2   -1.204119983
        15/16    15*(2**-4)   0b0.1111      0.9375      1   -0.028028724
        1/32     1*(2**-5)    0b0.00001     0.03125     2   -1.505149978
        1/64     1*(2**-6)    0b0.000001    0.015625    2   -1.806179974
        1/128    1*(2**-7)    0b0.0000001   0.0078125   3   -2.107209970
        3/128    3*(2**-7)    0b0.0000011   0.0234375   2   -1.630088715
        5/128    5*(2**-7)    0b0.0000101   0.0390625   2   -1.408239965
        9/128    9*(2**-7)    0b0.0001001   0.0703125   2   -1.152967460
        13/128   13*(2**-7)   0b0.0001101   0.1015625   1   -0.993266617

        You might see that d = -floor(log10(x)) = ceil(-log10(x)). In general, for a given x = c*(2**-n) < 1, the decimal expression will start on the dth digit after the decimal point, and end on the nth digit after the decimal point; thus, the length of the decimal expansion (ignoring leading and trailing zeroes) is L = n - d + 1.

        For example, the smallest representable 52bit fraction x = 2**-52 = 2.2204...e-16: it will start on the 16th digit (log10(x)=-15.654 -> d=16) and run to the 52nd digit. Double precision floating-point numbers (the common native floating-point in perl5) use a 52bit fractional component, with the encoding usually meaning one plus the fractional part times some power of two (x = (1+f)*(2**p)). So, that (1+f) can be exactly expressed in decimal as

        1.0000 0000 0000 000f ffff ffff ffff ffff ffff ffff ffff ffff ffff
          ^1              16^                                          52^
        ^ points to the nth digit after the decimal point

        And thus, you need the 16 digits after the decimal point to indicate the last bit of accuracy in the underlying binary fraction, as BrowserUK said. (But you would need all 52 decimal digits after the decimal point to exactly represent the full value.)
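
        The derivation can be checked with exact integer arithmetic. This is a sketch using the core Math::BigInt module; exact_fraction is a hypothetical helper, and it assumes c is odd and 0 < x < 1 so that the expansion really ends on the nth digit.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: exact decimal expansion of x = c * 2**-n,
# assuming c is odd and 0 < x < 1.
# Returns (decimal string, d, L) as defined in the text.
sub exact_fraction {
    my ($c, $n) = @_;
    my $k       = Math::BigInt->new(5)->bpow($n)->bmul($c);   # k = c * 5**n
    my $str     = $k->bstr;
    my $digits  = ('0' x ($n - length $str)) . $str;          # x = k * 10**-n
    my ($zeros) = $digits =~ /^(0*)/;
    my $d = length($zeros) + 1;    # position of the first non-zero digit
    my $L = $n - $d + 1;           # digits from position d through n
    return ("0.$digits", $d, $L);
}

for my $case ([1, 1], [13, 7], [1, 52]) {
    my ($str, $d, $L) = exact_fraction(@$case);
    printf "%2d * 2**-%-2d = %s  (d=%d, L=%d)\n", @$case, $str, $d, $L;
}
```

        The last case is 2**-52: it starts on the 16th digit and runs to the 52nd, which agrees with the diagram above.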

        Sorry, I like stuff like this. All this to say: you can exactly represent any n-digit binary fraction within n decimal digits after the decimal point.
