http://qs321.pair.com?node_id=11118705


in reply to Re^5: Reliably parsing an integer
in thread Reliably parsing an integer

Your solution is interesting, thanks.

The trouble is, I do not really trust the "int $num eq $num;" expression. I have seen that 'int' can do weird things, and I am not convinced that there is no floating-point corner case somewhere on some hardware that spits out something that looks like an integer, because some inaccuracy cancels out when generating a string from it.

Besides, the "eq" is actually converting the integer back to a string, which will cost performance. I think I can write faster code with the steps I outlined above. Stay tuned.

Re^7: Reliably parsing an integer
by haukex (Archbishop) on Jun 30, 2020 at 09:11 UTC
    I have seen that 'int' can do weird things, and I am not convinced that there is no floating-point corner case somewhere on some hardware that spits out something that looks like an integer, because some inaccuracy cancels out when generating a string from it.

    Of course, if you could show an example of this, that'd be much better than a vague worry. int is documented to return an integer, and hippo's valid_int includes a check that the input is not floating-point.

    If you're worried about a cutoff happening at 9,007,199,254,740,991 instead of 9,007,199,254,740,992, both of which are over nine quadrillion, then I suggest you're worrying about the wrong thing: since you're saying you want this to be portable to different machines and different Perls, this cutoff is arbitrary anyway!

    If you want a precise cutoff, you should choose a specific one that you are pretty certain will work on all expected architectures, like say 2**31-1, and if you wanted to code super defensively, you can even compare this cutoff to the integer limits I linked to and report an error otherwise.
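A minimal sketch of the fixed-cutoff idea, not code from this thread: the names MAX_INT and parse_int are made up for illustration, and the startup check against ~0 is just one assumed way to "compare this cutoff to the integer limits".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical fixed cutoff: 2**31-1 fits in a signed 32-bit integer.
use constant MAX_INT => 2147483647;

# Defensive check: refuse to run if this perl's largest unsigned
# integer (~0) cannot hold the chosen cutoff.
die "integer range of this perl is too small for the cutoff\n"
    if ~0 < MAX_INT;

# parse_int is an illustrative name, not an existing function.
# Returns the integer, or nothing if the string is not acceptable.
sub parse_int {
    my ($str) = @_;
    # ASCII digits only, and at most 10 of them, so the value is at
    # most 9_999_999_999, which is exact in an IV or in a double.
    return unless defined $str && $str =~ /\A[0-9]{1,10}\z/;
    my $n = 0 + $str;
    return if $n > MAX_INT;
    return $n;
}

for my $str ('2147483647', '2147483648', '12x') {
    print "$str: ", (defined parse_int($str) ? "accepted" : "rejected"), "\n";
}
```

The length limit in the regex keeps the numeric comparison away from the range where floating-point rounding could matter, so the cutoff itself is the only boundary to reason about.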

    Besides, the "eq" is actually converting the integer back to a string, which will cost performance.

    hippo beat me to it: Perl's internal conversions are very fast, I suspect it'll be negligible. But let's not guess - Benchmark!

Re^7: Reliably parsing an integer
by hippo (Bishop) on Jun 30, 2020 at 08:55 UTC
    Besides, the "eq" is actually converting the integer back to a string, which will cost performance.
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Benchmark 'timethis';

    timethis (10_000_000, "valid_int ('18446744073709551614')");

    sub valid_int {
        my $num = shift;
        return unless $num =~ /^\d+$/a;
        return int $num eq $num;
    }
    __END__
    timethis 10000000: 8 wallclock secs ( 7.93 usr + 0.00 sys = 7.93 CPU) @ 1261034.05/s (n=10000000)

    So, less than a microsecond on my aging system. Doubtless this can be improved upon (although about half the time taken appears to be the regex so there's a limit there too).
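To see how much of the time goes to the regex versus the int/eq round trip, something along these lines might help. It is only a sketch: regex_only is a hypothetical helper added for comparison, and the numbers will differ by machine and Perl build.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my $input = '18446744073709551614';

# Regex check only (hypothetical helper, added for comparison).
sub regex_only {
    my $num = shift;
    return $num =~ /^\d+$/a;
}

# Same logic as valid_int in the post above.
sub valid_int {
    my $num = shift;
    return unless $num =~ /^\d+$/a;
    return int $num eq $num;
}

# A negative count means: run each candidate for at least 1 CPU second.
cmpthese(-1, {
    regex_only => sub { regex_only($input) },
    valid_int  => sub { valid_int($input) },
});
```

cmpthese prints a rate table with relative percentages, which makes the share taken by the regex easy to read off.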

        Can you see any bugs?

        In Perl, strings like "2e5" and "1E4" are valid integer strings.
        Are they being handled as you would like? (I haven't investigated, and it depends upon how you want them to be treated.)

        If you want to accept them then you also have to keep in mind that if they represent values greater than ~0 or less than -(~0 >> 1) then they'll be assigned to NVs (floating point values) rather than IVs (integer value).

        I guess it's probably simplest if you reject them.
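For what it's worth, here is a small sketch of how exponent strings fare against hippo's valid_int (copied from the post above). Using looks_like_number from Scalar::Util as the comparison is my own choice, not something from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

# Same logic as valid_int in hippo's post.
sub valid_int {
    my $num = shift;
    return unless $num =~ /^\d+$/a;   # digits only, so 'e'/'E' fails here
    return int $num eq $num;
}

for my $str ('200000', '2e5', '1E4') {
    printf "%-8s looks_like_number=%d valid_int=%d\n",
        $str,
        looks_like_number($str) ? 1 : 0,
        valid_int($str)         ? 1 : 0;
}
```

Perl is happy to treat all three strings as numbers, but the digits-only regex rejects the two with an exponent, which matches the "simplest to reject them" option.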

        Update, correcting an earlier incorrect update: After I wrote this post I discovered that if you assign a value like 2e5 as a bareword, you get an NV, not an IV - and I therefore thought I had made a mistake in asserting that such strings were "integer strings".
        However, although the barewords often assign as an NV, the above assertions seem to be correct in relation to numifying strings.
        It seems a bit random.
        We get an NV with:
        C:\>perl -MDevel::Peek -le "$x = 1e6; Dump $x;"
        SV = NV(0x488470) at 0x488488
          REFCNT = 1
          FLAGS = (NOK,pNOK)
          NV = 1000000
        but an IV if we perform some arithmetic function:
        C:\>perl -MDevel::Peek -le "$x = 1e6 * 1.0; Dump $x;"
        SV = IV(0x4ecb10) at 0x4ecb20
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 1000000
        and, as regards what I've written above, if we numify the string we still get an IV:
        C:\>perl -MDevel::Peek -le "$x = '1e6' * 1.0; Dump $x;"
        SV = IV(0x33d010) at 0x33d020
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 1000000
        unless the value represented by the string is outside of the IV range:
        C:\>perl -MDevel::Peek -le "$x = '1e70' * 1.0; Dump $x;"
        SV = NV(0x576440) at 0x576458
          REFCNT = 1
          FLAGS = (NOK,pNOK)
          NV = 1e+70
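If you would rather test the IV/NV distinction in code than read Dump output, the core B module exposes the same flags. This is a rough sketch: num_kind is a made-up helper name, and which flag you see for the smaller value may depend on your Perl version and build.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use B ();

# num_kind is a hypothetical helper: it reports which numeric flag is
# set on the scalar passed in, checking IOK first.
sub num_kind {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return 'IV' if $flags & B::SVf_IOK();
    return 'NV' if $flags & B::SVf_NOK();
    return 'other';
}

my $small = '1e6'  * 1.0;   # fits in the IV range
my $big   = '1e70' * 1.0;   # far outside the IV range

print "'1e6'  * 1.0 is stored as: ", num_kind($small), "\n";
print "'1e70' * 1.0 is stored as: ", num_kind($big),   "\n";
```

Because IOK is checked first, a scalar that carries both flags is reported as IV.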

        Cheers,
        Rob