PerlMonks  

Re^3: Variables are automatically rounded off in perl (humans)

by syphilis (Archbishop)
on Jul 22, 2016 at 01:43 UTC


in reply to Re^2: Variables are automatically rounded off in perl (humans)
in thread Variables are automatically rounded off in perl

It is not a bug that Perl rounds the display of numeric values to a number of digits that is fewer than the number of digits of precision that the internal floating point format achieves

It depends upon how you measure that "number of digits".
Sure - if you allow more than 15 decimal digits then you lose the guarantee that different strings will assign different values. (It's possible that 2 different 16-digit strings will assign to the same value.)
But there are a vast number of values that cannot be assigned if you restrict yourself to 15 decimal digits.
To be able to assign each and every value you need 17 decimal digits, minimum.

To me the sane thing is to have print() deliver a value that, when assigned back to a scalar, will result in the same value.

I think the following is rather pathetic:
C:\>perl -le "print sqrt(3);"
1.73205080756888

C:\>perl -le "print 'WTF' if 1.73205080756888 != sqrt(3);"
WTF
Why do we have to put up with that sort of anomaly when it could be eradicated by simply providing just *2* more digits of precision ?
Is 17 decimal digits really that much more unreadable than 15 digits ?
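The 15-versus-17 digit point can be checked directly. The following is a minimal sketch, assuming NVs are IEEE 754 doubles and that the perl in use converts strings to numbers with correct rounding; `survives_roundtrip` is a hypothetical helper name, not anything built into Perl.

```perl
use strict;
use warnings;

# Format a number with the given count of significant digits, convert the
# string back to a number, and report whether the value survived intact.
sub survives_roundtrip {
    my ($value, $digits) = @_;
    my $text = sprintf "%.${digits}g", $value;
    return (0 + $text) == $value ? 1 : 0;
}

my $root3 = sqrt(3);
for my $digits (15, 16, 17) {
    printf "%2d digits: %-20s %s\n",
        $digits,
        sprintf("%.${digits}g", $root3),
        survives_roundtrip($root3, $digits) ? "round-trips" : "differs";
}
```

Plain print() behaves like the 15-digit row, which is why the comparison above prints WTF.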

you get even more complaints when a computation should yield 1/100 but the result gets displayed as "0.01000000000000002"

I don't think that's the right example.
For me, I have to request 18 decimal digits of precision to get the trailing "2", and being surprised by that sort of result when more than 17 digits have been requested is a "user bug".
The "correct" precision is 17 decimal digits - not 18.
C:\>perl -le "printf '%.16e', 1/100"
1.0000000000000000e-002

C:\>perl -le "printf '%.17e', 1/100"
1.00000000000000002e-002
A better example is:
C:\>perl -le "printf '%.16e', 0.1"
1.0000000000000001e-001
This happens because 0.10000000000000001 and 0.1 have the same internal floating point representation, and that representation is closer to 0.10000000000000001 than 0.1.
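A quick way to see that two decimal strings land on the same double is to convert both and compare. This is only a sketch, assuming IEEE 754 doubles; `same_double` is a hypothetical helper name.

```perl
use strict;
use warnings;

# True when two decimal strings convert to the same floating point value.
sub same_double {
    my ($left, $right) = @_;
    return (0 + $left) == (0 + $right) ? 1 : 0;
}

for my $pair ([ '0.1', '0.10000000000000001' ],
              [ '0.1', '0.10000000000000002' ]) {
    my ($left, $right) = @$pair;
    printf "%s vs %s: %s\n", $left, $right,
        same_double($left, $right) ? 'same double' : 'different doubles';
}

# Asking for 17 significant digits exposes the stored value's tail.
printf "%%.17g of 0.1 is %.17g\n", 0.1;
```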
I actually don't have any problem with that - but if we want to avoid such surprising results, there's a more refined way of dealing with this type of issue than simply getting the chainsaw out and hacking the tail off everything.

I am told that python, for example, utilises this refinement.
That is, python prints out 17 decimal digits but still prints 0.1 out as '0.1'.
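The effect described for python can be imitated in Perl by brute force: try increasing precision and stop at the first string that converts back to the same value. This is a sketch under the assumption of IEEE 754 doubles; it is not the algorithm python itself uses, and `shortest_roundtrip` is a hypothetical name.

```perl
use strict;
use warnings;

# Return the shortest "%.Ng" rendering (N = 1..17) that converts back to
# exactly the same number. 17 digits always suffice for an IEEE double.
sub shortest_roundtrip {
    my ($value) = @_;
    for my $digits (1 .. 17) {
        my $text = sprintf "%.${digits}g", $value;
        return $text if (0 + $text) == $value;
    }
    return sprintf "%.17g", $value;    # fallback, e.g. for NaN
}

print shortest_roundtrip($_), "\n" for 0.1, 0.1 + 0.2, 1 / 100, sqrt(3);
```

With this approach 0.1 still displays as 0.1, while a value such as 0.1 + 0.2 keeps the extra digits it needs to be distinguishable from 0.3.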

Update: Current bug report regarding this issue is here

Cheers,
Rob

Replies are listed 'Best First'.
Re^4: Variables are automatically rounded off in perl (audiences)
by tye (Sage) on Jul 23, 2016 at 01:25 UTC

    There is not only a single situation to consider. You present a situation where you want to preserve just slightly more precision than is preserved in that situation if you do nothing but the trivial to achieve your aims.

    C:\>perl -le "print 'WTF' if 1.00000000000000 != sqrt(3);"
    WTF

    Wow. If you are using '==' or '!=' on computed floating point values, then you've already lost my sympathy.
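The usual alternative to '==' on computed values is a comparison with a tolerance. A minimal sketch follows; the helper name `nearly_equal` and the default relative tolerance of 1e-12 are assumptions chosen for illustration, and the right tolerance depends on the calculation.

```perl
use strict;
use warnings;

# Compare two numbers using a relative tolerance instead of exact equality.
sub nearly_equal {
    my ($x, $y, $rel_tol) = @_;
    $rel_tol = 1e-12 unless defined $rel_tol;
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    return abs($x - $y) <= $rel_tol * $scale ? 1 : 0;
}

print "close enough\n" if nearly_equal(1.73205080756888, sqrt(3));
print "exactly equal\n" if 1.73205080756888 == sqrt(3);
```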

    I'm sorry, but the default, trivial way that a floating point value is displayed should not be optimized for "preserve every single bit of accuracy if you paste it back in and interpret it as a numeric value". It should be and is optimized for presenting the numeric value to humans.

    But there are a vast number of values that cannot be assigned if you restrict yourself to 15 decimal digits.

    To be clear, you can certainly assign 17 digits. Perl does not ignore beyond the 15th digit when you make an assignment (even when using a numeric literal).

    There is no shortage of easy, simple ways to preserve more accuracy if that is what you aim to do. printf and pack are the first two off the top of my head. Determining a sufficient number of digits to request from printf isn't even a difficult proposition.
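As one illustration of the pack route, a number can be saved as the hex digits of its 8-byte double and restored bit for bit. This is a sketch assuming a perl whose NV is a C double and which supports the '>' byte-order modifier (5.10 or later); `double_to_hex` and `hex_to_double` are hypothetical helper names.

```perl
use strict;
use warnings;

# Serialize a number as the 16 hex digits of its big-endian IEEE 754 double.
sub double_to_hex { return unpack 'H*', pack 'd>', $_[0] }

# Rebuild the number from those 16 hex digits.
sub hex_to_double { return unpack 'd>', pack 'H*', $_[0] }

my $root3  = sqrt(3);
my $stored = double_to_hex($root3);
print "stored as $stored\n";
print "restored exactly\n" if hex_to_double($stored) == $root3;
```

The printf route is simply sprintf '%.17g', $x, which stays human-readable at the cost of two extra digits.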

    To me the sane thing is to have print() deliver a value that, when assigned back to a scalar, will result in the same value.

    To me, if you are lazily using the default string representation and just expecting 100% fidelity, then you aren't a very good programmer. To support that stance we'd have to make tabs be printed as \t and most string values be printed with quotes around them. Producing a representation that can preserve with perfect fidelity is simply not the purpose of print.

    During a recent period when the difference between stored accuracy and default displayed accuracy was less, we got tons of complaints because the result of code like:

    my $x = 0;
    for( 1..100 ) {
        $x += 0.01;
        # ...
    }
    print $x, $/;

    was not a nice "0.5" but "0.5000000000000002". Your plan would make it produce "0.50000000000000022".

    People just wanting to get a reasonable representation of a rather mundane value are who the behavior of a plain print should be catered to. People who can't stand miscounting one grain of sand on their huge beach are a much better choice for who needs to do a tiny bit of extra work.

    - tye        

      The notion that one grain of sand on a beach is not important, but one hundred are, is dubious to say the least.

      perl -le '$x += 0.1 for 1..100; print $x;'

      There are a couple of points to make. First: it's not about print really. It's about stringification. Perl scalars can get pPOK from "action at a distance"; and stringification is currently lossy.

      Someone scientifically minded will of course understand the caveats of floating point, but at the same time expect roundtrip accuracy. The numbers must convert back and forth from text to internal representation. This caters to human audiences; to suggest hexdumps or similar would fly in the face of your own argument. Why would you insist people need to go out of their way to have their numbers stringify correctly?

      Frankly, this sounds like you are discouraging perl use for scientific work ("the wrong tool"). Also reminds me of the definition π = 22/7 ("correct for all practical purposes").

      Update. Uh, and a link to one of the curiosities: shortest round-tripping.

      Update2. Here's a better demonstration of the WTF equality problem.

      #! /usr/bin/perl -wl
      my @input = qw( 3335.9999999999995 3336.0000000000000 );
      my @nums = map 0+$_, @input;
      sub uniq_count { int keys %{{map {$_ => 1} @_}} }
      print "The numbers are:";
      printf " %.16e\n", $_ for @nums;
      printf "A set of %d different value(s)\n", uniq_count(@nums);

      And yes, this also affects Memoize, FWIW.

        The notion that one grain of sand on a beach is not important, but one hundred are, is dubious to say the least.

        My argument was actually that 1 grain of sand on a huge beach is almost never important. I made no statement about how many grains of sand result in what other level of importance. And I don't hear you refuting the claim that one grain of sand on the beach is almost never important (and so people dealing with such situations should expect to have to do a tiny bit of extra work). And I doubt anybody will argue with an assertion that the first significant bit is almost always important. There is, of course, no clear "line" where the bits on one side of the line are clearly important while those on the other side are clearly not important. Yet, in a 'double', we have at least 1 bit that almost always matters and at least 1 bit that almost never matters.

        First: it's not about print really. It's about stringification.

        Sure, I was talking about the default stringification which I sometimes referred to as 'print' as a shortcut.

        Perl scalars can get pPOK from "action at a distance"; and stringification is currently lossy.

        Certainly stringification can happen for subtle reasons. You seem to be implying that you can get a loss of precision due to "action at a distance". No, stringification is not lossy in that way. What is lossy is taking the default stringification and then converting that back to a number. A Perl scalar getting stringified does not cause any loss of precision in that scalar (the original numeric value remains along side the stringification).
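The distinction drawn here can be observed directly: stringifying a scalar leaves its numeric value alone, and only a number rebuilt from the string differs. A small sketch, assuming NVs are C doubles; `double_bits` and `unchanged_by_stringification` are hypothetical helper names.

```perl
use strict;
use warnings;

# Hex dump of a scalar's numeric value, viewed as a big-endian double.
sub double_bits { return unpack 'H*', pack 'd>', $_[0] }

# Stringify a scalar and report whether its numeric bits changed.
sub unchanged_by_stringification {
    my ($value) = @_;
    my $before = double_bits($value);
    my $text   = "$value";              # triggers default stringification
    my $after  = double_bits($value);
    return $before eq $after ? 1 : 0;
}

my $root3 = sqrt(3);
my $text  = "$root3";
print "default string form : $text\n";
print "scalar still exact  : ", ($root3 == sqrt(3) ? "yes" : "no"), "\n";
print "string converts back: ", (0 + $text == sqrt(3) ? "yes" : "no"), "\n";
```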

        Frankly, this sounds like you are discouraging perl use for scientific work

        Perhaps you jumped to that conclusion because you thought that Perl's default stringification could cause numeric values to lose precision due to "action at a distance"? I certainly was not arguing that Perl is inappropriate for scientific work. I was making the point that just pasting the default stringification output from one set of calculations as input to another set of calculations can be inappropriate in scientific work. But then, my experience is that scientists are aware of this. Though, most scientists are calculating how many significant bits they can claim from their calculations and those are almost always quite a bit fewer than 15 anyway (even 15 digits of accuracy in the measurements going in to the calculations is almost unheard of in science in my experience).

        Someone scientifically minded will of course understand the caveats of floating point, but at the same time expect roundtrip accuracy.

        What?! You think scientists are prone to take digit strings output from one calculation and paste them in for further calculations and not realize that some precision is lost? My experience is the opposite of that. Though, my experience is also that 15 digits of precision is so far above the significant digits in most scientific calculations that "15 vs 17 significant digits" is something that will often be ignored by a scientist.

        - tye        

      If you are using '==' or '!=' on computed floating point values, then you've already lost my sympathy

      I, of course, am completely shattered at having lost your sympathy ;-)

      ... we got tons of complaints because the result of code like ... <snip> ... was not a nice "0.5" but "0.5000000000000002".

      The output of that code is *still* not nice, even today. Try it and you'll see.
      We need to get the output precision further reduced.

      Seriously ... you can reduce the decimal precision of print even further, and that sort of procedure can still produce results that are "not nice":
      perl -le "for(1..100000000) {$x += 0.01}print $x;"
      1000000.00077928
      We need to reduce displayed output to no more than 10 decimal digits ?? (Even less)

      The "unniceness" of those results comes from cumulatively adding a value that is not precisely 1/100.
      I don't see that it has much to do with the decimal precision of print()'s output.

      People who can't stand miscounting one grain of sand on their huge beach are a much better choice for who needs to do a tiny bit of extra work

      Yes, that's the current state of play. I hope you pointed that out to the people who complained about the output of the snippet you provided.
      And I'm sure they were quite happy to do that extra work.

      Cheers,
      Rob
        The output of that code is *still* not nice, even today. Try it and you'll see.

        I ran that code in multiple versions of Perl that I had handy. The output was the intuitive ("nice") version, even when I ran the code with twice as many iterations.

        Note that your stance would produce 16 digits of noise for even trivial cases like print 0.1+0.2.

        Yes, there is no one obvious best value for how many digits to show by default. C says 6, for example. So, you give a "slippery slope" argument, dodge one argument by concentrating on tangential aspects of the phrasing, and make a false assumption about whether I ran the code. Not much to refute here.

        - tye        

      People just wanting to get a reasonable representation of a rather mundane value are who the behavior of a plain print should be catered to.

      What "people" are those I wonder? And what do they consider a "reasonable representation"?

      1. The accountant who wants pounds & pence or dollars & cents; say 2dp?
      2. The statistician for whom any more than 1dp would imply greater significance than his data or methods allow?
      3. The games programmer for whom 6dp of life force is probably more than she needs?
      4. How about the rocket scientist for whom that grain of sand on the beach represents the difference between orbital insertion and having the new crater named after them?
      5. How about the dozens that have come here every year for the last decade only to be told that if they want to understand perl's confusing default output they've got to go away and read the most unhelpful piece of pretentious elitism; effectively telling them that if they want to understand Perl's default lies output they will have to become "computer scientists".

      Just who are these unnamed "people" that are happy to take this unspecified "reasonable representation", even if that means they are being lied to in a way that makes their results bewilderingly confusing.

      Just who is being served here? Since I doubt that you would need Perl to lie to you in order to avoid whatever confusion you think seeing the truth will induce; you must feel that there are other "people" who can't handle the truth!. Who are they? And how does having Perl lie to them prevent them from having to become aware of the reality of FP math?

      It should be and is optimized for presenting the numeric value to humans.

      Where your argument falls apart is that in order for the users of print to know whether they are being lied to; they need to use printf and compare the output. Any effort saved is thus negated entirely.

      Better to display the full information by default and allow (everyone) to choose how much of the truth they actually need; than to set some totally arbitrary limit on the truth that means everyone needs to revert to printf to find out whether what is being output by print is actually what they need or not.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Yes, you list 3 roles where somebody should specify how many digits that they want output.

        How about the rocket scientist for whom that grain of sand on the beach represents the difference between orbital insertion and having the new crater named after them?

        So, if a rocket scientist is taking the default stringification from one set of complex calculations that require extreme precision and then just pasting those into some other set of complex calculations that require extreme precision and is oblivious to the loss of precision that resulted, then they aren't very good at their job. I've actually worked for rocket scientists (doing numerical calculations for them -- though, I was dealing with trying to predict cracks in rocket fuel, not orbital mechanics).

        So, when sending a craft to some distant part of space, the things you measure and control are the weight of the craft and payload, the force provided by the various engines, the duration of a burn, the orientation of the craft when the burn happens. None of those are things where you have anything close to even 15 digits of accuracy.

        Calculations in orbital mechanics involve a lot of numerical methods for finding solutions. Those repeat a calculation over and over, adjusting values, searching for the solution, stopping when the accuracy gets close enough. Such techniques are extremely unlikely to work if you are wanting 15 digits of accuracy and are using the 53-bit mantissa of a standard 'double' (especially when the calculations are as complicated as those of orbital mechanics). So I won't take your word that rounding some intermediate value in the middle of some rocket science calculation would be a fatal flaw.

        How about the dozens that have come here every year for the last decade only to be told that if they want to understand perl's confusing default output they've got to go away and read

        I've only seen one person complain about Perl's default output being truncated. I've seen several people complain about Perl's default output not being as truncated as they expected. I've seen several people complain about how the use of '==' on floating point values is so exacting as to be nearly useless. I've seen several people note that using 'eq' instead of '==' "fixes" the reported problem, well, until you get your way, of course.

        So, your stance solves 1 person's problem. It exacerbates the problem for the several people in the second group. It removes an easy work-around for the third problem.
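The 'eq' work-around mentioned above relies on the default stringification rounding both operands to the same 15-digit text. A sketch of the effect; `compare_both_ways` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Compare two numbers numerically and via their default string forms.
sub compare_both_ways {
    my ($x, $y) = @_;
    return {
        numeric => ($x == $y ? 1 : 0),
        string  => ($x eq $y ? 1 : 0),
    };
}

my $result = compare_both_ways(0.1 + 0.2, 0.3);
print "== says ", ($result->{numeric} ? "equal" : "different"), "\n";
print "eq says ", ($result->{string}  ? "equal" : "different"), "\n";
```

If the default stringification carried 17 digits, the two strings would differ as well, which is the loss of the work-around being described.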

        Well, "print 0.1" producing "0.1" is also producing a lie. The value is not actually 0.1. So, having "print 0.1" actually produce "0.10000000000000001" (which is also a lie, just less of one) would force people to understand the limits of floating point even sooner. Despite that benefit, I don't believe it would be the best trade-off for Perl.

        Your stance would have something as trivial as "print 0.1+0.2" produce 17 digits of noise.

        Just who are these unnamed "people"

        They are similar to the unnamed people you claim come to PerlMonks complaining about the default output precision. Like the one actual case of such that I've seen (the starter of this thread), they usually don't say much of anything about what they were trying to do. We have nearly no clue (at this time) how that author got a value of 3335.9999999999995 nor why they think they need that value reported to 17 digits of accuracy (but don't want to specify that requirement). I'd actually like to know some of that back story. It might even convince me that I'm wrong.

        But I can share an example of a recent case where I was happy to be ever so slightly (more) lied to by default.

        I measure various types of durations using wrappers around high-resolution timers where the underlying details of the timer vary between platforms. But it turns out that all of the platforms I've done this on have high-resolution timers that measure in some negative power of 10 of seconds. They don't all agree on the power. So some systems give me durations in milliseconds. Some in microseconds. Some in 10s or 100s of microseconds. Some that I haven't used recently actually only gave me hundredths of seconds. So, when I add up some small number of durations (or subtract a few high-resolution times), I am thankful to get told only a few digits after the decimal point when I report the total number of seconds.

        Actually figuring out how many digits to ask for in order to get this behavior even if I were to add up thousands of such values is not trivial.

        So, even though these systems are all binary systems, it turns out that, since they were designed by humans, they have a strong bias toward powers of 10.

        It is just extremely common for people to deal with a reasonably small number of values each of which has a reasonably small number of digits after each decimal point. It is nice to let such common (often informal) data sets be manipulated using mundane addition and subtraction and get results that look as expected. In such situations, the person is not happy to suddenly have a dozen or so '0's or '9's show up on the end of some of their output values.

        - tye        

Re^4: Variables are automatically rounded off in perl (humans)
by hexcoder (Curate) on Jul 22, 2016 at 07:53 UTC
    To me the sane thing is to have print() deliver a value that, when assigned back to a scalar, will result in the same value.

    I think the following is rather pathetic:

    C:\>perl -le "print sqrt(3);"
    1.73205080756888

    C:\>perl -le "print 'WTF' if 1.73205080756888 != sqrt(3);"
    WTF

    That should be expected for integer values, but for literal floating point values you would in general have to write them in a number base of a power of two if you want to express (and match) them exactly. In that sense print() cannot be of use to do the sane thing because information is already lost due to conversion to base ten (see commensurability).

      but for literal floating point values you would in general have to write them in a number base of a power of two if you want to express (and match) them exactly

      You've missed the point.

      It is not about representing root3 exactly; it is about displaying and printing the calculated internal value of root3 such that it retains all the available accuracy, so that that same level of accuracy may be restored from the printed value.

      This is always possible; but perl does not achieve it.

      The point is about reproducibility.


        This is always possible...

        I agree (now). Sorry, that was my fault. Decimal values like 0.2 cannot be represented exactly with finite precision in number base two, but it certainly works the other way round, since fractions whose denominators are powers of two never have repeating decimals.

      BrowserUK did a good job of explaining the real issue, which you now get, but I wanted to further clarify a tangential point:

      ... for literal floating point values you would in general have to write them in a number base of a power of two if you want to express ... them exactly.

      It's one of the mathematical niceties of binary that any binary floating point value can be exactly represented in decimal. It comes about because 2 (binary) is a factor of 10 (decimal: 2*5).

      x = c * (2**p)      # the value you want to express
      c                   # the coefficient
      p = -n              # the smallest power of two that goes into x;
                          # negative, since it's a binary fraction

      x = c*(2**p) = c*(2**-n) = ...

              [   1  ]     [   1  ] [ 5**n ]     [ 5**n  ]
      ... = c*[------] = c*[------]*[------] = c*[-------] = ...
              [ 2**n ]     [ 2**n ] [ 5**n ]     [ 10**n ]

      ... = c * (5**n) * (10**-n) = c * (5**n) * (10**p)

      k = c * (5**n)      # k integer, so exactly representable in decimal
      x = k * (10**-n)    # since n is finite and k is an integer,
                          # x is exactly representable in decimal
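The derivation translates almost line for line into code. The sketch below uses the core Math::BigInt module so that k = c * 5**n never overflows; `exact_decimal` is a hypothetical helper name, and it assumes a non-negative integer coefficient c and a non-negative integer n.

```perl
use strict;
use warnings;
use Math::BigInt;

# Exact decimal expansion of c * 2**-n, built as k * 10**-n with k = c * 5**n.
sub exact_decimal {
    my ($c, $n) = @_;
    my $k      = Math::BigInt->new($c) * Math::BigInt->new(5)->bpow($n);
    my $digits = $k->bstr;

    # Left-pad with zeros so at least one digit precedes the decimal point.
    $digits = ('0' x ($n + 1 - length $digits)) . $digits
        if length($digits) <= $n;

    # Place the decimal point n digits from the right.
    substr($digits, length($digits) - $n, 0, '.') if $n > 0;
    return $digits;
}

print exact_decimal(1, 4),  "\n";    # 1/16
print exact_decimal(13, 7), "\n";    # 13/128
print exact_decimal(1, 52), "\n";    # 2**-52
```

The last line prints all 52 digits after the decimal point, none of them approximated.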

      As might be obvious from the final statement, an n-digit binary fixed point can be expressed exactly in an n-digit decimal fixed point.

      It turns out, there's something else interesting you can tell about the number of digits for an exact power of two: specifically, which digit d (after the decimal point) the decimal expansion will start on:

      x        c*(2**-n)    binary        decimal      d   log10(x)
      1/2      1*(2**-1)    0b0.1         0.5          1   -0.301029996
      1/4      1*(2**-2)    0b0.01        0.25         1   -0.602059991
      3/4      3*(2**-2)    0b0.11        0.75         1   -0.124938737
      1/8      1*(2**-3)    0b0.001       0.125        1   -0.903089987
      7/8      7*(2**-3)    0b0.111       0.875        1   -0.057991947
      1/16     1*(2**-4)    0b0.0001      0.0625       2   -1.204119983
      15/16    15*(2**-4)   0b0.1111      0.9375       1   -0.028028724
      1/32     1*(2**-5)    0b0.00001     0.03125      2   -1.505149978
      1/64     1*(2**-6)    0b0.000001    0.015625     2   -1.806179974
      1/128    1*(2**-7)    0b0.0000001   0.0078125    3   -2.107209970
      3/128    3*(2**-7)    0b0.0000011   0.0234375    2   -1.630088715
      5/128    5*(2**-7)    0b0.0000101   0.0390625    2   -1.408239965
      9/128    9*(2**-7)    0b0.0001001   0.0703125    2   -1.152967460
      13/128   13*(2**-7)   0b0.0001101   0.1015625    1   -0.993266617

      You might see that d = -floor(log10(x)) = ceil(-log10(x)). In general, for a given x = c*(2**-n) < 1, the decimal expression will start on the dth digit after the decimal point, and end on the nth digit after the decimal point; thus, the length of the decimal expansion (ignoring leading and trailing zeroes) is L = n - d + 1.
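The two formulas are easy to try out. This sketch assumes c is an odd positive integer (so that n really is the last non-zero binary digit) and that x = c * 2**-n is less than 1; `start_digit` and `expansion_length` are hypothetical helper names.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Position d of the first non-zero decimal digit of x = c * 2**-n (x < 1).
sub start_digit {
    my ($c, $n) = @_;
    my $x = $c * 2 ** -$n;
    return ceil(-log($x) / log(10));
}

# Length L = n - d + 1 of the expansion, from digit d through digit n.
sub expansion_length {
    my ($c, $n) = @_;
    return $n - start_digit($c, $n) + 1;
}

for my $case ([1, 4], [13, 7], [1, 52]) {
    my ($c, $n) = @$case;
    printf "c=%d n=%d: starts at digit %d, runs for %d digits\n",
        $c, $n, start_digit($c, $n), expansion_length($c, $n);
}
```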

      For example, the smallest representable 52bit fraction x = 2**-52 = 2.2204...e-16: it will start on the 16th digit (log10(x)=-15.654 -> d=16) and run to the 52nd digit. Double precision floating-point numbers (the common native floating-point in perl5) use a 52bit fractional component, with the encoding usually meaning one plus the fractional part times some power of two (x = (1+f)*(2**p)). So, that (1+f) can be exactly expressed in decimal as

      1.0000 0000 0000 000f ffff ffff ffff ffff ffff ffff ffff ffff ffff
        ^1              16^                                          52^
      ^ points to the nth digit after the decimal point

      And thus, you need the 16 digits after the decimal point to indicate the last bit of accuracy in the underlying binary fraction, as BrowserUK said. (But you would need all 52 decimal digits after the decimal point to exactly represent the full value.)

      Sorry, I like stuff like this: all this to say: you can exactly represent any n-digit binary fraction within n decimal digits after the decimal point.
