http://qs321.pair.com?node_id=11118666


in reply to Re: Reliably parsing an integer (updated)
in thread Reliably parsing an integer

First of all, thanks for your quick answer.

Perl's maximum values for integers can be determined

That is brilliant, thanks.

If you want to work with integers larger than that, use Math::BigInt.

I do not really want to work with larger integers. I want to allow up to the maximum unsigned integer value in the current Perl. I would rather avoid the overhead of using Math::BigInt if I can.

To validate number formats, use Regexp::Common::number.

How are these regular expressions going to help? Say the current Perl uses 32-bit integers. How is a regular expression going to help me accept 4294967295 (UINT32_MAX) but not 4294967296 (UINT32_MAX+1)?

Say Perl automatically "upgrades" UINT64_MAX + 1 to a floating-point value, and the platform uses 64-bit integers but also 64-bit floating point values. Is that not going to lose some precision? How can I then accept exactly up to UINT64_MAX but not UINT64_MAX + 1 ?
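To make the concern concrete, here is a minimal sketch. It assumes a perl with 64-bit integers whose floating-point type (NV) is an IEEE 754 double; a perl built with long doubles or 32-bit integers may behave differently. The variable names are illustrative only.

```perl
use strict;
use warnings;

# ~0 sets every bit, giving the largest unsigned integer of the running perl.
my $uv_max = ~0;

# Adding 1 no longer fits in an integer, so perl switches to a floating-point NV.
my $over = $uv_max + 1;

print "max:   $uv_max\n";
print "max+1: $over\n";

# With 64-bit integers and a 53-bit mantissa, converting $uv_max to a double
# rounds it up, so a numeric comparison may be unable to tell the two apart.
print "numeric comparison treats max and max+1 as equal\n"
    if $over == $uv_max;
```

On such a build I would expect the last line to print, which is why a plain numeric comparison cannot enforce the limit exactly.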

Re^3: Reliably parsing an integer (updated)
by haukex (Archbishop) on Jun 29, 2020 at 20:19 UTC
    How is a regular expression going to help me accept 4294967295 (UINT32_MAX) but not 4294967296 (UINT32_MAX+1)?

    AnomalousMonk answered this one; I suggested the module because it's good for verifying various number formats.

    I would rather avoid the overhead of using Math::BigInt if I can.

    Well, if you want to go right up to that limit*, then I think the safest way is to use Math::BigInt, because it'll correctly handle strings that are clearly over the limit. It's a core module and it's not really that much overhead: once you've used the module to confirm that the number will work as a normal integer without loss of precision, you no longer need the object and can just work with a plain Perl scalar afterwards. Just an example:

    use warnings;
    use strict;
    use feature 'state';
    use Carp;
    use Math::BigInt;
    use Config;
    use Regexp::Common qw/number/;

    sub validate_int {
        my $str = shift;
        state $max = Math::BigInt->new( eval $Config{nv_overflows_integers_at} );
        croak "not an integer"
            unless defined $str && $str=~/\A$RE{num}{int}\z/;
        my $num = Math::BigInt->new($str);
        croak "integer too small" if $num < 0;
        croak "integer too big" if $num > $max;
        return $num->numify;
    }

    use Test::More;
    sub exception (&) { eval { shift->(); 1 } ? undef : ($@ || die) }

    is validate_int(0), 0;
    is validate_int(1), 1;
    is validate_int(3), 3;
    ok exception { validate_int(undef) };
    ok exception { validate_int("") };
    ok exception { validate_int("x") };
    ok exception { validate_int("123y") };
    ok exception { validate_int(-1) };
    ok exception { validate_int("-9999999999999999999999999999999") };

    my $x = Math::BigInt->new(eval $Config{nv_overflows_integers_at})-1;
    is validate_int("$x"), 0+$x->numify, "'$x' works (max-1)";
    $x++;
    is validate_int("$x"), 0+$x->numify, "'$x' works (max)";
    $x++;
    ok exception { validate_int("$x") }, "'$x' fails (max+1)";
    ok exception { validate_int("999999999999999999999999999999999999") };

    done_testing;

    * Update: I named several integer limits in the post I linked to. Depending on which of those limits you want to use, hippo's suggestion from here is of course much easier.

Re^3: Reliably parsing an integer
by AnomalousMonk (Archbishop) on Jun 29, 2020 at 18:35 UTC

    To validate number formats, use Regexp::Common::number. [emphasis added]
    How are these regular expressions going to help? ... How is a regular expression going to help me accept 4294967295 ... but not 4294967296 ...?
    Regexes can help validate number formats, but not, in general, ranges. (It's quite often possible to construct a regex to discriminate a number range, but this is usually more of an academic exercise than a practical solution. Common exceptions are for decimal octet and year/month/day ranges.)
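As an illustration of the "decimal octet" exception mentioned above, here is a minimal sketch of a regex that accepts 0 through 255 and nothing else. The name is_octet is hypothetical, and rejecting leading zeros is an assumption of this sketch rather than something stated in the reply.

```perl
use strict;
use warnings;

# Alternatives, from the top of the range down:
#   25[0-5]       250..255
#   2[0-4][0-9]   200..249
#   1[0-9][0-9]   100..199
#   [1-9]?[0-9]   0..99 (no leading zero)
my $octet = qr/\A(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\z/;

# Returns 1 if the string is a decimal octet, 0 otherwise.
sub is_octet {
    my ($s) = @_;
    return defined $s && $s =~ $octet ? 1 : 0;
}

print "$_: ", is_octet($_), "\n" for qw(0 99 255 256 01);
```

The pattern stays readable for three digits, but the same approach for a 20-digit limit quickly becomes unwieldy, which is the point being made.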

    ... command-line option --resume-from-line ...

    This quote from the OP suggests the user is to enter a simple line number of a file. Are you really dealing with source/data/whatever files of more than 4,000,000,000 (or 18,446,744,073,709,551,615 or, God help us all, 99,999,999,999,999,999,999) lines? If not, what do you care if your Perl is UINT32_MAX or UINT64_MAX? Why not just use a validation test something like
        $n !~ /\D/ && $n < 4_000_000_000
    (or some more reasonable upper limit) and be done with it?
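A minimal sketch of that test wrapped in a reusable sub. The name valid_line_number and the constant are hypothetical. Unlike the one-liner, this version also rejects the empty string (which contains no non-digit and would otherwise pass) and restricts the check to ASCII digits.

```perl
use strict;
use warnings;

# Illustrative upper limit only; pick whatever bound suits the application.
my $MAX_LINE = 4_000_000_000;

# Returns 1 for a non-empty string of ASCII digits below $MAX_LINE, else 0.
sub valid_line_number {
    my ($n) = @_;
    return 0 unless defined $n && length $n;   # reject undef and ""
    return 0 if $n =~ /[^0-9]/;                # reject anything but ASCII digits
    return $n < $MAX_LINE ? 1 : 0;             # numeric range check
}

print "$_: ", valid_line_number($_), "\n" for qw(0 42 3999999999 4000000000 12a);
```

Because the limit is far below the point where integers stop being exact, the numeric comparison is safe here even for absurdly long digit strings, which simply compare as larger than the limit.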

    Or is your question intended to address a more general case?


    Give a man a fish:  <%-{-{-{-<

      It's quite often possible to construct a regex to discriminate a number range, but this is usually more of an academic exercise than a practical solution.

      Turns out to not be too difficult if you cheat a little with some embedded code ;-) (of course hippo's suggestion is a lot more elegant Update: if that's the limit one wants to implement)

      use Config;
      use Math::BigInt;
      my $regex = do {
          my $max = eval $Config{nv_overflows_integers_at} or die;
          my $len1 = length($max) - 1;
          my $range = substr $max, 0, 1;
          $range = $range eq "1" ? "1" : "1-$range";
          qr{ \A
              (?: (?!0) [0-9]{1,$len1}
              | ( [$range] [0-9]{$len1} )
                (?(?{ Math::BigInt->new($^N)->bgt($max) })(*F))
              ) \z }x
      };

      One difference from my first version is that this regex doesn't allow zero (0).

      Update: Updated benchmark and faster version of the code here.

      Or is your question intended to address a more general case?

      Of course I would like a solution for the general case. The reason I asked here is that I thought I had missed something obvious, because it seemed such a basic thing to do. At least it is easy in other programming languages I use. I am aware that you can use Perl modules like Math::BigInt, but I hope I can get this done without having to 'include' a huge, slow monster like that. For example, the script I mentioned reads many lines from a file, which may have been generated by another tool. The file may be corrupted or wrong. I just want a general, fast, simple integer validation solution that I can reuse in other scripts I have around.

      I had hoped for a ready-made solution, but I'll have to code it myself. So let's recap what I have now:

      • Create the largest integer number that the current Perl interpreter supports.
      • Convert it to a string.
      • Find out how long the string is.
      • Check if the "value to parse" has any non-digits with a regular expression like  $str =~ m/[^0-9]/; . Or use a ready-made one like Regexp::Common::number .
      • Remove all leading zeros.
      • If the length is < max length, then it is a fine integer.
      • If the length is > max length, then it is too big for us.
      • If the length is the same, then a simple lexicographical comparison ( $str1 cmp $str2 ) should work.
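The steps in the recap above can be sketched as follows. This is a minimal illustration, not a tested library: fits_in_uv is a hypothetical name, and using ~0 as "the largest integer" assumes the limit wanted is the maximum unsigned integer of the running perl rather than the smaller point at which floating-point values lose integer precision.

```perl
use strict;
use warnings;

# Largest unsigned integer of the running perl, as a string (~0 sets all bits).
my $MAX_STR = "" . ~0;

# Returns 1 if $str is a non-negative decimal integer no larger than ~0, else 0.
sub fits_in_uv {
    my ($str) = @_;

    # Digits only, at least one of them.
    return 0 unless defined $str && $str =~ /\A[0-9]+\z/;

    # Remove leading zeros, but keep the last digit so "000" becomes "0".
    $str =~ s/\A0+(?=[0-9])//;

    # Shorter than the maximum: always fits. Longer: never fits.
    return 1 if length($str) < length($MAX_STR);
    return 0 if length($str) > length($MAX_STR);

    # Same length: string order equals numeric order for digit strings.
    return $str le $MAX_STR ? 1 : 0;
}

print "$_: ", fits_in_uv($_), "\n" for ("0", "000123", $MAX_STR, $MAX_STR . "0", "12x");
```

The value is never converted to a number during validation, so no floating-point rounding can occur; only after the check passes is it safe to use the string as an integer.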

      Have I missed something?

      Before I begin, could we use pack/unpack instead? I am not very familiar with them, but there are several integer types there.

        Create the largest integer number that the current Perl interpreter supports.

        If you aren't picking a specific maximum number but just want to allow whatever the running perl does, then all you need is a function like valid_int here:

        use strict;
        use warnings;
        use Test::More tests => 5;

        ok valid_int ('18446744073709551614'), '1 under max int';
        ok valid_int ('18446744073709551615'), 'exactly max int';
        ok ! valid_int ('18446744073709551616'), '1 over max int';
        ok ! valid_int ('foo'), 'NaN';
        ok ! valid_int ('1.3'), 'float';

        sub valid_int {
            my $num = shift;
            return unless $num =~ /^\d+$/a;
            return int $num eq $num;
        }

        The hard-coded values are just to test because I am on a 64-bit perl, adjust them to your own testing environment as appropriate.

Re^3: Reliably parsing an integer
by karlgoethebier (Abbot) on Jun 29, 2020 at 18:26 UTC
    «I do not really want to work with larger integers.»

    That’s OK. As an alternative you may try sliced bread. Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

      Oh, many thanks for your invaluable help!

        You're welcome. You might consider using Math::BigInt with the GMP backend: use Math::BigInt lib => 'GMP';. See also.
