Re^4: Variables are automatically rounded off in perl (audiences)

by tye (Sage)
on Jul 23, 2016 at 01:25 UTC


in reply to Re^3: Variables are automatically rounded off in perl (humans)
in thread Variables are automatically rounded off in perl

There is more than one situation to consider. You present a situation where you want to preserve slightly more precision than you get if you do nothing but the trivial thing to achieve your aims.

C:\>perl -le "print 'WTF' if 1.00000000000000 != sqrt(3);"
WTF

Wow. If you are using '==' or '!=' on computed floating point values, then you've already lost my sympathy.
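
As an illustration of the usual alternative to '==', here is a minimal sketch of a tolerance-based comparison. The helper name nearly_equal and its default tolerance of 1e-12 are assumptions made for this example; neither comes from the thread or from Perl itself, and a suitable tolerance depends on the calculation.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compare two floats using a
# relative tolerance instead of '=='.
sub nearly_equal {
    my ($x, $y, $rel_tol) = @_;
    $rel_tol = 1e-12 unless defined $rel_tol;    # assumed default tolerance
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    return abs($x - $y) <= $rel_tol * $scale;
}

my $computed = 0.1 + 0.2;
print "== says:           ", ($computed == 0.3 ? "equal" : "different"), "\n";
print "nearly_equal says: ", (nearly_equal($computed, 0.3) ? "equal" : "different"), "\n";
```

A relative tolerance scales with the magnitude of the operands; comparisons against exactly zero need an absolute tolerance instead.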

I'm sorry, but the default, trivial way that a floating point value is displayed should not be optimized for "preserve every single bit of accuracy if you paste it back in and interpret it as a numeric value". It should be and is optimized for presenting the numeric value to humans.

But there are a vast number of values that cannot be assigned if you restrict yourself to 15 decimal digits.

To be clear, you can certainly assign 17 digits. Perl does not ignore beyond the 15th digit when you make an assignment (even when using a numeric literal).

There is no shortage of easy, simple ways to preserve more accuracy if that is what you aim to do. printf and pack are the first two off the top of my head. Determining a sufficient number of digits to request from printf isn't even a difficult proposition.
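
A minimal sketch of those two approaches, assuming a perl whose nvtype is an 8-byte IEEE double (for which 17 significant digits are enough to identify a value). The variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumes nvtype is an 8-byte IEEE double.
my $value = 0.1 + 0.2;

my $default = "$value";                  # default stringification (15 digits)
my $precise = sprintf '%.17g', $value;   # 17 significant digits
my $packed  = pack 'd', $value;          # raw native bytes, no decimal step

print "default: $default\n";
print "precise: $precise\n";
print "default converts back to the same value: ",
    (0 + $default == $value ? "yes" : "no"), "\n";
print "precise converts back to the same value: ",
    (0 + $precise == $value ? "yes" : "no"), "\n";
print "packed converts back to the same value:  ",
    (unpack('d', $packed) == $value ? "yes" : "no"), "\n";
```

Note that pack 'd' produces binary data in the machine's native format, so it suits storage on the same platform rather than display or exchange between architectures.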

To me the sane thing is to have print() deliver a value that, when assigned back to a scalar, will result in the same value.

To me, if you are lazily using the default string representation and just expecting 100% fidelity, then you aren't a very good programmer. To support that stance we'd have to make tabs be printed as \t and most string values be printed with quotes around them. Producing a representation that can preserve with perfect fidelity is simply not the purpose of print.

During a recent period when the difference between stored accuracy and default displayed accuracy was less, we got tons of complaints because the result of code like:

my $x = 0;
for( 1..100 ) {
    $x += 0.01;
    # ...
}
print $x, $/;

was not a nice "0.5" but "0.5000000000000002". Your plan would make it produce "0.50000000000000022".

People just wanting to get a reasonable representation of a rather mundane value are who the behavior of a plain print should be catered to. People who can't stand miscounting one grain of sand on their huge beach are a much better choice for who needs to do a tiny bit of extra work.

- tye        

Re^5: Variables are automatically rounded off in perl (audiences)
by oiskuu (Hermit) on Jul 23, 2016 at 14:38 UTC

    The notion that one grain of sand on a beach is not important, but one hundred are, is dubious to say the least.

    perl -le '$x += 0.1 for 1..100; print $x;'

    There are a couple of points to make. First: it's not about print really. It's about stringification. Perl scalars can get pPOK from "action at a distance"; and stringification is currently lossy.

    Someone scientifically minded will of course understand the caveats of floating point, but at the same time expect roundtrip accuracy. The numbers must convert back and forth from text to internal representation. This caters to human audiences; to suggest hexdumps or similar would fly in the face of your own argument. Why would you insist people need to go out of their way to have their numbers stringify correctly?

    Frankly, this sounds like you are discouraging perl use for scientific work ("the wrong tool"). Also reminds me of the definition π = 22/7 ("correct for all practical purposes").

    Update. Uh, and a link to one of the curiosities: shortest round-tripping.

    Update2. Here's a better demonstration of the WTF equality problem.

    #! /usr/bin/perl -wl
    my @input = qw( 3335.9999999999995 3336.0000000000000 );
    my @nums = map 0+$_, @input;
    sub uniq_count { int keys %{{map {$_ => 1} @_}} }
    print "The numbers are:";
    printf " %.16e\n", $_ for @nums;
    printf "A set of %d different value(s)\n", uniq_count(@nums);

    And yes, this also affects Memoize, FWIW.

      The notion that one grain of sand on a beach is not important, but one hundred are, is dubious to say the least.

      My argument was actually that 1 grain of sand on a huge beach is almost never important. I made no statement about how many grains of sand result in what other level of importance. And I don't hear you refuting the claim that one grain of sand on the beach is almost never important (and so people dealing with such situations should expect to have to do a tiny bit of extra work). And I doubt anybody will argue with an assertion that the first significant bit is almost always important. There is, of course, no clear "line" where the bits on one side of the line are clearly important while those on the other side are clearly not important. Yet, in a 'double', we have at least 1 bit that almost always matters and at least 1 bit that almost never matters.

      First: it's not about print really. It's about stringification.

      Sure, I was talking about the default stringification which I sometimes referred to as 'print' as a shortcut.

      Perl scalars can get pPOK from "action at a distance"; and stringification is currently lossy.

      Certainly stringification can happen for subtle reasons. You seem to be implying that you can get a loss of precision due to "action at a distance". No, stringification is not lossy in that way. What is lossy is taking the default stringification and then converting that back to a number. A Perl scalar getting stringified does not cause any loss of precision in that scalar (the original numeric value remains alongside the stringification).
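
      The distinction can be checked with a short sketch, assuming nvtype is an 8-byte double. The variable names are made up for the example.

      ```perl
      use strict;
      use warnings;

      my $x = 0.1 + 0.2;
      my $before = sprintf '%.17g', $x;

      my $as_text = "$x";    # forces stringification of $x

      # The numeric value held in $x is the same after being stringified.
      my $after = sprintf '%.17g', $x;
      print "number unchanged by stringification: ",
          ($before eq $after ? "yes" : "no"), "\n";

      # The loss appears only when the default string is used as a number.
      my $reparsed = 0 + $as_text;
      print "default string converts back to the same value: ",
          ($reparsed == $x ? "yes" : "no"), "\n";
      ```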

      Frankly, this sounds like you are discouraging perl use for scientific work

      Perhaps you jumped to that conclusion because you thought that Perl's default stringification could cause numeric values to lose precision due to "action at a distance"? I certainly was not arguing that Perl is inappropriate for scientific work. I was making the point that just pasting the default stringification output from one set of calculations as input to another set of calculations can be inappropriate in scientific work. But then, my experience is that scientists are aware of this. Though, most scientists are calculating how many significant bits they can claim from their calculations and those are almost always quite a bit fewer than 15 anyway (even 15 digits of accuracy in the measurements going in to the calculations is almost unheard of in science in my experience).

      Someone scientifically minded will of course understand the caveats of floating point, but at the same time expect roundtrip accuracy.

      What?! You think scientists are prone to take digit strings output from one calculation and paste them in for further calculations and not realize that some precision is lost? My experience is the opposite of that. Though, my experience is also that 15 digits of precision is so far above the significant digits in most scientific calculations that "15 vs 17 significant digits" is something that will often be ignored by a scientist.

      - tye        

        Re: pPOK. I was thinking about things like this, though that's probably a bug in the module. In any case, bad design and bugs compound to make one hellish landscape. It's the difference between "things just work" and "things just don't work".

        Re: one grain of sand. I think I made it clear in the update that one-to-one mapping i.e. identity is important, even if the magnitude isn't. I don't hear you refuting that.

        Basically, there are two desirable properties to have:

        1. exact calculations, free of accumulating errors ("grains of sand"). Perl NV aka double cannot guarantee that, period. Feed complainers => -Mbigrat.
        2. round-tripping conversions. That can be guaranteed! And should. "0.1" -> 0.1 -> "0.1" is about shortest roundtrip, no truncation is necessary.
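
        Property 2 can be sketched by brute force: try increasing precisions until the text converts back to the same number. This is only an illustration, assuming nvtype is an 8-byte double and that perl's string-to-number conversion is accurate on the platform; the name shortest_roundtrip is hypothetical, and real implementations use dedicated algorithms rather than a loop over sprintf.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: return the first "%.<n>g" rendering (n = 1..17)
        # that converts back to exactly the same number.
        sub shortest_roundtrip {
            my ($num) = @_;
            for my $digits (1 .. 17) {
                my $text = sprintf "%.${digits}g", $num;
                return $text if 0 + $text == $num;
            }
            return sprintf '%.17g', $num;    # fallback: 17 digits
        }

        print shortest_roundtrip($_), "\n" for 0.1, 0.1 + 0.2, 3335.9999999999995;
        ```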

        Finally, Re: cut and paste computing. Absolutely! Take a printed paper and run those numbers. Repeatability is the cornerstone. If you say you run statistics on IEEE doubles, but your data does not compute, someone will be upset.

Re^5: Variables are automatically rounded off in perl (audiences)
by syphilis (Bishop) on Jul 23, 2016 at 04:34 UTC
    If you are using '==' or '!=' on computed floating point values, then you've already lost my sympathy

    I, of course, am completely shattered at having lost your sympathy ;-)

    ... we got tons of complaints because the result of code like ... <snip> ... was not a nice "0.5" but "0.5000000000000002".

    The output of that code is *still* not nice, even today. Try it and you'll see.
    We need to get the output precision further reduced.

    Seriously ... you can reduce the decimal precision of print even further, and that sort of procedure can still produce results that are "not nice":
    perl -le "for(1..100000000) {$x += 0.01}print $x;"
    1000000.00077928
    We need to reduce displayed output to no more than 10 decimal digits ?? (Even less)

    The "unniceness" of those results comes from cumulatively adding a value that is not precisely 1/100.
    I don't see that it has much to do with the decimal precision of print()'s output.
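
    A small sketch of that point, assuming nvtype is an 8-byte double: the error is already present in the stored sum, whatever precision is used to display it. Counting in integer hundredths is shown only for contrast and is an addition for this example, not something proposed in the thread.

    ```perl
    use strict;
    use warnings;

    # Repeatedly add a value that is only an approximation of 1/100.
    my $sum = 0;
    $sum += 0.01 for 1 .. 100;

    # Integer hundredths accumulate exactly; divide once at the end.
    my $hundredths = 0;
    $hundredths += 1 for 1 .. 100;

    printf "0.01 as stored:     %.20g\n", 0.01;
    printf "float accumulation: %.17g\n", $sum;
    printf "integer hundredths: %.17g\n", $hundredths / 100;
    print  "default print:      $sum\n";
    ```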

    People who can't stand miscounting one grain of sand on their huge beach are a much better choice for who needs to do a tiny bit of extra work

    Yes, that's the current state of play. I hope you pointed that out to the people who complained about the output of the snippet you provided.
    And I'm sure they were quite happy to do that extra work.

    Cheers,
    Rob
      The output of that code is *still* not nice, even today. Try it and you'll see.

      I ran that code in multiple versions of Perl that I had handy. The output was the intuitive ("nice") version, even when I ran the code with twice as many iterations.

      Note that your stance would produce 16 digits of noise for even trivial cases like print 0.1+0.2.

      Yes, there is no one obvious best value for how many digits to show by default. C says 6, for example. So, you give a "slippery slope" argument, dodge one argument by concentrating on tangential aspects of the phrasing, and make a false assumption about whether I ran the code. Not much to refute here.

      - tye        

        The output was the intuitive ("nice") version

        Did you look at the intermediate values ? (That's what your original complainants were doing.)
        C:\_32\pscrpt>perl -le "print $];"
        5.024000

        C:\_32\pscrpt>perl -V:archname
        archname='MSWin32-x86-multi-thread-64int';

        C:\_32\pscrpt>perl -V:nvtype
        nvtype='double';

        C:\_32\pscrpt>type try.pl
        use warnings;
        use strict;
        my $x = 0;
        for( 1..100 ) {
          $x += 0.01;
          print "$x\n" if length $x > 4;
        }

        C:\_32\pscrpt>perl try.pl
        0.810000000000001
        0.820000000000001
        0.830000000000001
        0.840000000000001
        0.850000000000001
        0.860000000000001
        0.870000000000001
        0.880000000000001
        0.890000000000001
        0.900000000000001
        0.910000000000001
        0.920000000000001
        0.930000000000001
        0.940000000000001
        0.950000000000001
        0.960000000000001
        0.970000000000001
        0.980000000000001
        0.990000000000001

        C:\_32\pscrpt>
        Those are actually correct values (rounded to 15 decimal digits of precision) for a perl whose nvtype is an 8-byte double.

        And it's also to be expected (given perl's current practice) that the next (and last) value to be calculated is printed as "1" - because 1.0000000000000007 rounded to 15 decimal digits of precision is exactly that.

        Cheers,
        Rob
Re^5: Variables are automatically rounded off in perl (audiences)
by BrowserUk (Pope) on Jul 23, 2016 at 19:53 UTC
    People just wanting to get a reasonable representation of a rather mundane value are who the behavior of a plain print should be catered to.

    What "people" are those I wonder? And what do they consider a "reasonable representation"?

    1. The accountant who wants pounds & pence or dollars & cents; say 2dp?
    2. The statistician for whom any more than 1dp would imply greater significance than his data or methods allow?
    3. The games programmer for whom 6dp of life force is probably more than she needs?
    4. How about the rocket scientist for whom that grain of sand on the beach represents the difference between orbital insertion and having the new crater named after them?
    5. How about the dozens that have come here every year for the last decade only to be told that if they want to understand perl's confusing default output they've got to go away and read the most unhelpful piece of pretentious elitism; effectively telling them that if they want to understand Perl's default lies output they will have to become "computer scientists".

    Just who are these unnamed "people" that are happy to take this unspecified "reasonable representation", even if that means they are being lied to in a way that makes their results bewilderingly confusing.

    Just who is being served here? Since I doubt that you would need Perl to lie to you in order to avoid whatever confusion you think seeing the truth will induce; you must feel that there are other "people" who can't handle the truth!. Who are they? And how does having Perl lie to them prevent them from having to become aware of the reality of FP math?

    It should be and is optimized for presenting the numeric value to humans.

    Where your argument falls apart is that in order for the users of print to know whether they are being lied to; they need to use printf and compare the output. Any effort saved is thus negated entirely.

    Better to display the full information by default and allow (everyone) to choose how much of the truth they actually need; than to set some totally arbitrary limit on the truth that means everyone needs to revert to printf to find out whether what is being output by print is actually what they need or not.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
    .

      Yes, you list 3 roles where somebody should specify how many digits that they want output.

      How about the rocket scientist for whom that grain of sand on the beach represents the difference between orbital insertion and having the new crater named after them?

      So, if a rocket scientist is taking the default stringification from one set of complex calculations that require extreme precision and then just pasting those into some other set of complex calculations that require extreme precision and is oblivious to the loss of precision that resulted, then they aren't very good at their job. I've actually worked for rocket scientists (doing numerical calculations for them -- though, I was dealing with trying to predict cracks in rocket fuel, not orbital mechanics).

      So, when sending a craft to some distant part of space, the things you measure and control are the weight of the craft and payload, the force provided by the various engines, the duration of a burn, the orientation of the craft when the burn happens. None of those are things where you have anything close to even 15 digits of accuracy.

      Calculations in orbital mechanics involve a lot of numerical methods for finding solutions. Those repeat a calculation over and over, adjusting values, searching for the solution, stopping when the accuracy gets close enough. Such techniques are extremely unlikely to work if you are wanting 15 digits of accuracy and are using the 53-bit mantissa of a standard 'double' (especially when the calculations are as complicated as those of orbital mechanics). So I won't take your word that rounding some intermediate value in the middle of some rocket science calculation would be a fatal flaw.

      How about the dozens that have come here every year for the last decade only to be told that if they want to understand perl's confusing default output they've got to go away and read

      I've only seen one person complain about Perl's default output being truncated. I've seen several people complain about Perl's default output not being as truncated as they expected. I've seen several people complain about how the use of '==' on floating point values is so exacting as to be nearly useless. I've seen several people note that using 'eq' instead of '==' "fixes" the reported problem, well, until you get your way, of course.

      So, your stance solves 1 person's problem. It exacerbates the problem for the several people in the second group. It removes an easy work-around for the third problem.
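
      The 'eq' work-around mentioned for the third group can be sketched as follows, assuming nvtype is an 8-byte double and the current 15-digit default stringification. It works only because both operands stringify to the same text.

      ```perl
      use strict;
      use warnings;

      my $sum = 0.1 + 0.2;

      # Numeric comparison looks at every bit of the stored values.
      print "numeric ==: ", ($sum == 0.3 ? "equal" : "different"), "\n";

      # String comparison looks at the default (rounded) stringifications.
      print "string  eq: ", ($sum eq 0.3 ? "equal" : "different"), "\n";
      ```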

      Well, "print 0.1" producing "0.1" is also producing a lie. The value is not actually 0.1. So, having "print 0.1" actually produce "0.10000000000000001" (which is also a lie, just less of one) would force people to understand the limits of floating point even sooner. Despite that benefit, I don't believe it would be the best trade-off for Perl.

      Your stance would have something as trivial as "print 0.1+0.2" produce 17 digits of noise.

      Just who are these unnamed "people"

      They are similar to the unnamed people you claim come to PerlMonks complaining about the default output precision. Like the one actual case of such that I've seen (the starter of this thread), they usually don't say much of anything about what they were trying to do. We have nearly no clue (at this time) how that author got a value of 3335.9999999999995 nor why they think they need that value reported to 17 digits of accuracy (but don't want to specify that requirement). I'd actually like to know some of that back story. It might even convince me that I'm wrong.

      But I can share an example of a recent case where I was happy to be ever so slightly (more) lied to by default.

      I measure various types of durations using wrappers around high-resolution timers where the underlying details of the timer vary between platforms. But it turns out that all of the platforms I've done this on have high-resolution timers that measure in some negative power of 10 of seconds. They don't all agree on the power. So some systems give me durations in milliseconds. Some in microseconds. Some in 10s or 100s of microseconds. Some that I haven't used recently actually only gave me hundredths of seconds. So, when I add up some small number of durations (or subtract a few high-resolution times), I am thankful to get told only a few digits after the decimal point when I report the total number of seconds.

      Actually figuring out how many digits to ask for in order to get this behavior even if I were to add up thousands of such values is not trivial.
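
      For the simpler case where the timer's resolution is known in advance, explicit formatting might look like the sketch below. The helper name format_seconds, its $places parameter, and the sample durations are all assumptions made for this example; it does not address the harder problem of choosing the digit count automatically across platforms.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: $places is the number of decimal places the timer
      # is known to report (3 for milliseconds, 6 for microseconds).
      sub format_seconds {
          my ($seconds, $places) = @_;
          my $text = sprintf "%.${places}f", $seconds;
          $text =~ s/\.?0+\z// if $text =~ /\./;    # drop trailing zeros and a bare '.'
          return $text;
      }

      # Made-up sample data: one thousand durations of one millisecond each.
      my $total = 0;
      $total += 0.001 for 1 .. 1000;

      printf "total at 17 digits: %.17g\n", $total;
      print  "total formatted:    ", format_seconds($total, 3), "\n";
      ```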

      So, even though these systems are all binary systems, it turns out that, since they were designed by humans, they have a strong bias toward powers of 10.

      It is just extremely common for people to deal with a reasonably small number of values, each of which has a reasonably small number of digits after the decimal point. It is nice to let such common (often informal) data sets be manipulated using mundane addition and subtraction and get results that look as expected. In such situations, the person is not happy to suddenly have a dozen or so '0's or '9's show up on the end of some of their output values.

      - tye        
