in reply to Re^3: Reliably parsing an integer in thread Reliably parsing an integer
Or is your question intended to address a more general case?
Of course I would like a solution for the general case. The reason I asked here is that I thought I had missed something obvious, because it seemed such a basic thing to do. At least it is easy in other programming languages I use. I am aware that you can use Perl modules like Math::BigInt, but I hope I can get this done without having to 'include' a huge, slow monster like that. For example, the script I mentioned reads many lines from a file, which may have been generated by another tool. The file may be corrupted or wrong. I just want a general, fast, simple integer validation solution that I can reuse in other scripts I have around.
I had hoped for a ready-made solution, but I'll have to code it myself. So let's recap what I have now:
- Create the largest integer number that the current Perl interpreter supports.
- Convert it to a string.
- Find out how long the string is.
- Check whether the "value to parse" contains any non-digits with a regular expression like $str =~ m/[^0-9]/, or use a ready-made one like Regexp::Common::number.
- Remove all leading zeros.
- If the length is < max length, then it is a fine integer.
- If the length is > max length, then it is too big for us.
- If the length is the same, then a simple lexicographical comparison ( $str1 cmp $str2 ) should work.
Have I missed something?
Before I begin, could we use pack/unpack instead? I am not very familiar with them, but there are several integer types there.
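The string-based steps in the recap above could be sketched roughly as follows. This is only an illustration, not code from the thread: the name fits_native_uint is made up, it handles non-negative values only, and it assumes that ~0 yields the largest unsigned integer of the running perl (which holds as long as "use integer" is not in effect).

```perl
use strict;
use warnings;

# Largest unsigned integer this perl supports, as a string.
# Assumption: ~0 is the maximum native unsigned value (no "use integer").
my $MAX_STR = "" . ~0;
my $MAX_LEN = length $MAX_STR;

# Hypothetical helper: returns 1 if $str is a non-negative decimal
# integer that fits into a native unsigned integer, 0 otherwise.
sub fits_native_uint {
    my ($str) = @_;

    # Digits only; \z (unlike $) also rejects a trailing newline.
    return 0 unless defined $str && $str =~ /\A[0-9]+\z/;

    # Remove leading zeros, but keep a lone "0".
    (my $digits = $str) =~ s/\A0+(?=[0-9])//;

    my $len = length $digits;
    return 1 if $len < $MAX_LEN;    # shorter than the maximum: always fits
    return 0 if $len > $MAX_LEN;    # longer than the maximum: never fits

    # Same length: string order equals numeric order for digit strings.
    return $digits le $MAX_STR ? 1 : 0;
}

print fits_native_uint('000123') ? "ok\n" : "not ok\n";
```

The value is never converted to a number here, so no floating-point conversion is involved at any point.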
Re^5: Reliably parsing an integer
by hippo (Bishop) on Jun 29, 2020 at 21:30 UTC
Create the largest integer number that the current Perl interpreter supports.
If you aren't picking a specific maximum but just want to allow whatever the running perl supports, then all you need is a function like valid_int here:
use strict;
use warnings;
use Test::More tests => 5;
ok valid_int ('18446744073709551614'), '1 under max int';
ok valid_int ('18446744073709551615'), 'exactly max int';
ok ! valid_int ('18446744073709551616'), '1 over max int';
ok ! valid_int ('foo'), 'NaN';
ok ! valid_int ('1.3'), 'float';
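sub valid_int {
    my $num = shift;
    # Digits only; the /a flag restricts \d to ASCII 0-9.
    return unless $num =~ /^\d+$/a;
    # Parsed as (int $num) eq $num: true only if the value round-trips unchanged.
    return int $num eq $num;
}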
The hard-coded values are only there for testing, because I am on a 64-bit perl. Adjust them to your own testing environment as appropriate.
Your solution is interesting, thanks.
The trouble is, I do not really trust the "int $num eq $num;" expression. I have seen that 'int' can do weird things, and I am not convinced that there is no floating-point corner case somewhere on some hardware that spits out something that looks like an integer, because some inaccuracy cancels out when generating a string from it.
Besides, the "eq" is actually converting the integer back to a string, which will cost performance. I think I can write faster code with the steps I outlined above. Stay tuned.
I have seen that 'int' can do weird things, and I am not convinced that there is no floating-point corner case somewhere on some hardware that spits out something that looks like an integer, because some inaccuracy cancels out when generating a string from it.
Of course, if you could show an example of this, that'd be much better than a vague worry. int is documented to return an integer, and hippo's valid_int includes a check that the input is not floating-point.
If you're worried about a cutoff happening at 9,007,199,254,740,991 instead of 9,007,199,254,740,992, both of which are over nine quadrillion, then I suggest you're worrying about the wrong thing: since you're saying you want this to be portable to different machines and different Perls, this cutoff is arbitrary anyway!
If you want a precise cutoff, you should choose a specific one that you are pretty certain will work on all expected architectures, like say 2**31-1, and if you wanted to code super defensively, you can even compare this cutoff to the integer limits I linked to and report an error otherwise.
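A check against a fixed cutoff might look something like the sketch below. The names MAX_ALLOWED and valid_int_upto are invented for illustration, the cutoff 2**31-1 is simply the example value mentioned above, and the comparison against ~0 assumes that ~0 is the largest native unsigned integer of the running perl.

```perl
use strict;
use warnings;

# Assumed cutoff for illustration: the largest signed 32-bit value.
use constant MAX_ALLOWED => 2**31 - 1;

# Defensive check: refuse to run if the cutoff is larger than the
# biggest native unsigned integer of this perl (assumed to be ~0).
die "cutoff exceeds this perl's integer range\n" if MAX_ALLOWED > ~0;

# Hypothetical helper: returns 1 if $num is a non-negative decimal
# integer no larger than MAX_ALLOWED, 0 otherwise.
sub valid_int_upto {
    my ($num) = @_;

    # At most 10 digits, so the value stays far below 2**53 and the
    # numeric comparison is exact even if perl stores it as a double.
    return 0 unless defined $num && $num =~ /\A[0-9]{1,10}\z/;

    return $num <= MAX_ALLOWED ? 1 : 0;
}

print valid_int_upto('2147483647') ? "ok\n" : "not ok\n";
```

Because the cutoff is fixed, the same inputs are accepted or rejected regardless of whether the script runs on a 32-bit or a 64-bit perl.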
Besides, the "eq" is actually converting the integer back to a string, which will cost performance.
hippo beat me to it: Perl's internal conversions are very fast, I suspect it'll be negligible. But let's not guess - Benchmark!
Besides, the "eq" is actually converting the integer back to a string, which will cost performance.
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark 'timethis';
timethis (10_000_000, "valid_int ('18446744073709551614')");
sub valid_int {
    my $num = shift;
    return unless $num =~ /^\d+$/a;
    return int $num eq $num;
}
__END__
timethis 10000000: 8 wallclock secs ( 7.93 usr + 0.00 sys = 7.93 CPU) @ 1261034.05/s (n=10000000)
So, less than a microsecond on my aging system. Doubtless this can be improved upon (although about half the time taken appears to be the regex so there's a limit there too).
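To see how much of the time the regex accounts for, one could compare the regex on its own with the full check using cmpthese from the same Benchmark module. This is only a sketch: regex_only is a made-up helper, and the numbers will differ from system to system, so no output is shown.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my $input = '18446744073709551614';

# Hypothetical helper: only the digit check, without the round-trip.
sub regex_only {
    my $num = shift;
    return $num =~ /^\d+$/a;
}

# The full check as posted above.
sub valid_int {
    my $num = shift;
    return unless $num =~ /^\d+$/a;
    return int $num eq $num;
}

# A negative count means: run each candidate for at least 1 CPU second.
cmpthese(-1, {
    regex_only => sub { regex_only($input) },
    full_check => sub { valid_int($input) },
});
```

The gap between the two rates gives a rough idea of what the int and eq round-trip costs on top of the regex.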