Re^3: Reliably parsing an integer

by AnomalousMonk (Bishop)
on Jun 29, 2020 at 18:35 UTC (#11118670)


in reply to Re^2: Reliably parsing an integer
in thread Reliably parsing an integer

To validate number formats, use Regexp::Common::number. [emphasis added]
How are these regular expressions going to help? ... How is a regular expression going to help me accept 4294967295 ... but not 4294967296 ...?
Regexes can help validate number formats, but not, in general, ranges. (It's quite often possible to construct a regex to discriminate a number range, but this is usually more of an academic exercise than a practical solution. Common exceptions are for decimal octet and year/month/day ranges.)
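As an illustration of the decimal octet exception, here is a minimal sketch of a range-discriminating regex for 0 to 255. The names $octet_re and is_octet are made up for this example, and rejecting leading zeros is an assumption about what one wants.

```perl
use strict;
use warnings;

# Matches the decimal strings 0..255 with no leading zeros.
# Alternatives, in order: 250-255, 200-249, 100-199, 0-99.
my $octet_re = qr/\A(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\z/;

# Hypothetical helper: returns 1 for a valid octet string, 0 otherwise.
sub is_octet {
    my ($str) = @_;
    return defined $str && $str =~ $octet_re ? 1 : 0;
}

print "$_: ", (is_octet($_) ? "ok" : "rejected"), "\n"
    for qw(0 9 99 199 255 256 01 1000);
```

Even for a three-digit range the pattern needs four alternatives, which hints at why this approach scales poorly to limits like 4294967295.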

... command-line option --resume-from-line ...

This quote from the OP suggests the user is to enter a simple line number of a file. Are you really dealing with source/data/whatever files of more than 4,000,000,000 (or 18,446,744,073,709,551,615 or, God help us all, 99,999,999,999,999,999,999) lines? If not, what do you care if your Perl is UINT32_MAX or UINT64_MAX? Why not just use a validation test something like
    $n !~ /\D/ && $n < 4_000_000_000
(or some more reasonable upper limit) and be done with it?
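A slightly stricter sketch of the same idea follows. The one-line test above also accepts the empty string (it contains no non-digit), and \D follows Unicode rules, so this version anchors an explicit ASCII digit class instead. The names $MAX_LINE and valid_line_number, and the ten-digit cap, are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical limit; pick whatever upper bound suits the files you expect.
my $MAX_LINE = 4_000_000_000;

# Hypothetical helper: returns 1 if $n is a digit string below $MAX_LINE.
sub valid_line_number {
    my ($n) = @_;
    return 0 unless defined $n;
    # ASCII digits only, at least one and at most ten, so the numeric
    # comparison below never sees a value anywhere near overflow.
    return 0 unless $n =~ /\A[0-9]{1,10}\z/;
    return $n < $MAX_LINE ? 1 : 0;
}

print "$_: ", (valid_line_number($_) ? "ok" : "rejected"), "\n"
    for ('42', '3999999999', '4000000000', '12a', '');
```

One side effect of the length cap is that inputs padded with leading zeros beyond ten characters are rejected; whether that matters depends on where the values come from.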

Or is your question intended to address a more general case?


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^4: Reliably parsing an integer (updated)
by haukex (Bishop) on Jun 29, 2020 at 21:35 UTC
    It's quite often possible to construct a regex to discriminate a number range, but this is usually more of an academic exercise than a practical solution.

    Turns out to not be too difficult if you cheat a little with some embedded code ;-) (Of course, hippo's suggestion is a lot more elegant. Update: if that's the limit one wants to implement.)

    use Config;
    use Math::BigInt;
    my $regex = do {
        my $max = eval $Config{nv_overflows_integers_at} or die;
        my $len1 = length($max) - 1;
        my $range = substr $max, 0, 1;
        $range = $range eq "1" ? "1" : "1-$range";
        qr{ \A (?:
                (?!0) [0-9]{1,$len1}
              | ( [$range] [0-9]{$len1} )
                (?(?{ Math::BigInt->new($^N)->bgt($max) })(*F))
            ) \z }x
    };

    One difference from my first version is that this regex doesn't allow zero (0).

    Update: Updated benchmark and faster version of the code here.

Re^4: Reliably parsing an integer
by rdiez (Acolyte) on Jun 29, 2020 at 19:58 UTC
    Or is your question intended to address a more general case?

    Of course I would like a solution for the general case. The reason I asked here is that I thought I had missed something obvious, because it seemed such a basic thing to do. At least it is easy in other programming languages I use. I am aware that you can use Perl modules like Math::BigInt, but I hope I can get this done without having to 'include' a huge, slow monster like that. For example, the script I mentioned reads many lines from a file, which may have been generated by another tool. The file may be corrupted or wrong. I just want a general, fast, simple integer validation solution that I can reuse in other scripts I have around.

    I had hoped for a ready-made solution, but I'll have to code it myself. So let's recap what I have now:

    • Create the largest integer number that the current Perl interpreter supports.
    • Convert it to a string.
    • Find out how long the string is.
    • Check if the "value to parse" has any non-digits with a regular expression like $str =~ m/[^0-9]/ , or use a ready-made one like Regexp::Common::number.
    • Remove all leading zeros.
    • If the length is < max length, then it is a fine integer.
    • If the length is > max length, then it is too big for us.
    • If the length is the same, then a simple lexicographical comparison ( $str1 cmp $str2 ) should work.
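The steps above could be sketched roughly as follows. This is only one possible reading of the list: it takes ~0 as "the largest integer the current interpreter supports" (an assumption; the regex elsewhere in this thread uses $Config{nv_overflows_integers_at} instead), keeps a single digit when stripping leading zeros so that "0" and "000" stay valid, and uses the string operator le for the same-length case. The names $MAX_STR, $MAX_LEN and fits_native_uint are hypothetical.

```perl
use strict;
use warnings;

# Assumption: ~0 (all bits set) is the largest unsigned integer this perl
# holds exactly, e.g. 18446744073709551615 with 64-bit integers.
my $MAX_STR = "" . ~0;
my $MAX_LEN = length $MAX_STR;

# Hypothetical helper: returns 1 if $str is a non-negative decimal
# integer no larger than ~0, and 0 otherwise. No numeric conversion
# of $str ever happens, so no floating point is involved.
sub fits_native_uint {
    my ($str) = @_;
    return 0 unless defined $str && $str =~ /\A[0-9]+\z/;

    # Strip leading zeros but always keep at least one digit.
    (my $digits = $str) =~ s/\A0+(?=[0-9])//;

    my $len = length $digits;
    return 1 if $len < $MAX_LEN;    # shorter than the maximum: always fits
    return 0 if $len > $MAX_LEN;    # longer than the maximum: never fits

    # Same length and only ASCII digits: string order equals numeric order.
    return $digits le $MAX_STR ? 1 : 0;
}

print "$_: ", (fits_native_uint($_) ? "fits" : "rejected"), "\n"
    for ('0', '000123', $MAX_STR, $MAX_STR . '0', '12x', '');
```

The only step the list does not spell out is the all-zeros input, which would become an empty string if every leading zero were removed.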

    Have I missed something?

    Before I begin, could we use pack/unpack instead? I am not very familiar with them, but there are several integer types there.

      Create the largest integer number that the current Perl interpreter supports.

      If you aren't picking a specific maximum number but just want to allow whatever the running perl supports, then you just need a function like valid_int here:

      use strict;
      use warnings;
      use Test::More tests => 5;

      ok valid_int ('18446744073709551614'), '1 under max int';
      ok valid_int ('18446744073709551615'), 'exactly max int';
      ok ! valid_int ('18446744073709551616'), '1 over max int';
      ok ! valid_int ('foo'), 'NaN';
      ok ! valid_int ('1.3'), 'float';

      sub valid_int {
          my $num = shift;
          return unless $num =~ /^\d+$/a;
          return int $num eq $num;
      }

      The hard-coded values are just for testing, because I am on a 64-bit perl; adjust them to your own testing environment as appropriate.

        Your solution is interesting, thanks.

        The trouble is, I do not really trust the "int $num eq $num;" expression. I have seen that 'int' can do weird things, and I am not convinced that there is no floating-point corner case somewhere on some hardware that spits out something that looks like an integer, because some inaccuracy cancels out when generating a string from it.

        Besides, the "eq" is actually converting the integer back to a string, which will cost performance. I think I can write faster code with the steps I outlined above. Stay tuned.
