Re^3: Reliably parsing an integer (updated)
by haukex (Archbishop) on Jun 29, 2020 at 20:19 UTC
How is a regular expression going to help me accept 4294967295 (UINT32_MAX) but not 4294967296 (UINT32_MAX+1)?
AnomalousMonk answered this one; I suggested the module because it's good for verifying various number formats.
I would rather avoid the overhead of using Math::BigInt if I can.
Well, if you want to go right up to that limit*, then I think the safest way is to use Math::BigInt, because it'll correctly handle strings that are clearly over the limit. It's a core module and it's not really that much overhead: once you've used the module to confirm that the number will work as a normal integer without loss of precision, you no longer need the object and can just work with a plain Perl scalar afterwards. Just an example:
use warnings;
use strict;
use feature 'state';
use Carp;
use Math::BigInt;
use Config;
use Regexp::Common qw/number/;
sub validate_int {
my $str = shift;
state $max = Math::BigInt->new(
eval $Config{nv_overflows_integers_at} );
croak "not an integer"
unless defined $str && $str=~/\A$RE{num}{int}\z/;
my $num = Math::BigInt->new($str);
croak "integer too small" if $num < 0;
croak "integer too big" if $num > $max;
return $num->numify;
}
use Test::More;
sub exception (&) { eval { shift->(); 1 } ? undef : ($@ || die) }
is validate_int(0), 0;
is validate_int(1), 1;
is validate_int(3), 3;
ok exception { validate_int(undef) };
ok exception { validate_int("") };
ok exception { validate_int("x") };
ok exception { validate_int("123y") };
ok exception { validate_int(-1) };
ok exception { validate_int("-9999999999999999999999999999999") };
my $x = Math::BigInt->new(eval $Config{nv_overflows_integers_at})-1;
is validate_int("$x"), 0+$x->numify, "'$x' works (max-1)";
$x++;
is validate_int("$x"), 0+$x->numify, "'$x' works (max)";
$x++;
ok exception { validate_int("$x") }, "'$x' fails (max+1)";
ok exception { validate_int("999999999999999999999999999999999999") };
done_testing;
* Update: I named several integer limits in the post I linked to. Depending on which of those limits you want to use, hippo's suggestion from here is of course much easier.
Re^3: Reliably parsing an integer
by AnomalousMonk (Archbishop) on Jun 29, 2020 at 18:35 UTC
To validate number formats, use Regexp::Common::number. [emphasis added]
How are these regular expressions going to help? ... How is a regular expression going to help me accept 4294967295 ... but not 4294967296 ...?
Regexes can help validate number formats, but not, in general, ranges. (It's quite often possible to construct a regex to discriminate a number range, but this is usually more of an academic exercise than a practical solution. Common exceptions are for decimal octet and year/month/day ranges.)
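As an illustration of the "decimal octet" exception mentioned above, here is a minimal sketch of a regex that accepts exactly the integers 0 through 255. The variable name $octet is only for this example, and the choice to reject leading zeros is an assumption rather than something the post specifies:

```perl
use strict;
use warnings;

# Matches the decimal integers 0..255, without leading zeros.
# Each alternative covers one sub-range: 250-255, 200-249, 100-199, 0-99.
my $octet = qr/\A(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\z/;

for my $s (qw(0 9 99 255 256 01 1000)) {
    printf "%-5s %s\n", $s, $s =~ $octet ? "ok" : "rejected";
}
```

The pattern already needs four alternatives for a three-digit range, which shows why this approach gets unwieldy for limits like UINT32_MAX.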
... command-line option --resume-from-line ...
This quote from the OP suggests the user is to enter a simple line number of a file. Are you really dealing with source/data/whatever files of more than 4,000,000,000 (or 18,446,744,073,709,551,615 or, God help us all, 99,999,999,999,999,999,999) lines? If not, what do you care if your Perl is UINT32_MAX or UINT64_MAX? Why not just use a validation test something like
$n !~ /\D/ && $n < 4_000_000_000
(or some more reasonable upper limit) and be done with it?
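A slightly stricter sketch of that test, wrapped in a sub: note that $n !~ /\D/ on its own is also true for the empty string, so this version requires at least one ASCII digit. The name valid_line_number is hypothetical, and the limit is simply the one from the line above:

```perl
use strict;
use warnings;

# Hypothetical helper: accepts a non-empty string of ASCII digits
# whose numeric value is below the (arbitrary) limit from the post.
sub valid_line_number {
    my ($n) = @_;
    return 0 unless defined $n && $n =~ /\A[0-9]+\z/;
    return $n < 4_000_000_000 ? 1 : 0;
}

print valid_line_number($_) ? "ok" : "rejected", ": '$_'\n"
    for "1", "3999999999", "4294967296", "", "12x";
```

Digit strings far beyond the native integer range become floating-point values when compared, but those are still larger than the limit, so they are rejected as well.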
Or is your question intended to address a more general case?
Give a man a fish: <%-{-{-{-<
use Config;
use Math::BigInt;
my $regex = do {
my $max = eval $Config{nv_overflows_integers_at} or die;
my $len1 = length($max) - 1;
my $range = substr $max, 0, 1;
$range = $range eq "1" ? "1" : "1-$range";
qr{ \A (?: (?!0) [0-9]{1,$len1}
| ( [$range] [0-9]{$len1} )
(?(?{ Math::BigInt->new($^N)->bgt($max) })(*F))
) \z }x };
One difference to my first version is that this regex doesn't allow zero (0).
Update: Updated benchmark and faster version of the code here.
Or is your question intended to address a more general case?
Of course I would like a solution for the general case. The reason I asked here is that I thought I had missed something obvious, because it seemed such a basic thing to do. At least it is easy in other programming languages I use. I am aware that you can use Perl modules like Math::BigInt, but I hope I can get this done without having to 'include' a huge, slow monster like that. For example, the script I mentioned reads many lines from a file, which may have been generated by another tool. The file may be corrupted or wrong. I just want a general, fast, simple integer validation solution that I can reuse in other scripts I have around.
I had hoped for a ready-made solution, but I'll have to code it myself. So let's recap what I have now:
- Create the largest integer number that the current Perl interpreter supports.
- Convert it to a string.
- Find out how long the string is.
- Check if the "value to parse" has any non-digits with a regular expression like $str =~ m/[^0-9]/. Or use a ready-made one like Regexp::Common::number.
- Remove all leading zeros.
- If the length is < max length, then it is a fine integer.
- If the length is > max length, then it is too big for us.
- If the length is the same, then a simple lexicographical comparison ( $str1 cmp $str2 ) should work.
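The steps above can be sketched roughly as follows. This is only an illustration, not code from the thread: the name valid_uint_str is hypothetical, the limit is passed in as a decimal string (for example "" . ~0 for the largest unsigned integer of the running perl), and rejecting the empty string is an added assumption, since the list does not cover that case:

```perl
use strict;
use warnings;

# Hypothetical helper: is $str a non-negative decimal integer <= $max?
# Both arguments are handled purely as strings, so nothing overflows.
sub valid_uint_str {
    my ($str, $max) = @_;
    # Non-empty, ASCII digits only.
    return 0 unless defined $str && $str =~ /\A[0-9]+\z/;
    # Remove leading zeros, but keep the last digit so "000" becomes "0".
    (my $digits = $str) =~ s/\A0+(?=[0-9])//;
    return 1 if length($digits) < length($max);
    return 0 if length($digits) > length($max);
    # Same length: string order equals numeric order for digit strings.
    return ($digits cmp $max) <= 0 ? 1 : 0;
}

my $max = "4294967295";    # UINT32_MAX, as in the original question
print valid_uint_str($_, $max) ? "ok" : "rejected", ": '$_'\n"
    for "4294967295", "4294967296", "0007", "", "12x";
```

Once a string has passed, converting it with 0 + $str should be safe as long as the chosen limit itself fits in a native integer.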
Have I missed something?
Before I begin, could we use pack/unpack instead? I am not very familiar with them, but there are several integer types there.
use strict;
use warnings;
use Test::More tests => 5;
ok valid_int ('18446744073709551614'), '1 under max int';
ok valid_int ('18446744073709551615'), 'exactly max int';
ok ! valid_int ('18446744073709551616'), '1 over max int';
ok ! valid_int ('foo'), 'NaN';
ok ! valid_int ('1.3'), 'float';
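sub valid_int {
my $num = shift;
# Only ASCII digits are allowed (/a restricts \d to 0-9).
return unless $num =~ /^\d+$/a;
# A value too big for a native integer turns into a float and no longer
# stringifies back to its input. Leading zeros fail this round trip too.
return int($num) eq $num;
}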
The hard-coded values are just for testing because I am on a 64-bit perl; adjust them to your own testing environment as appropriate.
Re^3: Reliably parsing an integer
by karlgoethebier (Abbot) on Jun 29, 2020 at 18:26 UTC
«I do not really want to work with larger integers.»
That’s OK. As an alternative you may try sliced bread. Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help