Re: Reliably parsing an integer

in reply to Reliably parsing an integer

What you need is a BigInt compare function that tells you whether a potentially big integer stored in a string is larger than the largest integer Perl can hold natively. If it is, either ignore it or substitute the maximum value. You could also just check the length of the string, which would be the fastest solution. If the string is longer than, let's say, 8 characters, then the number may be too big, so instead of using it you just use 99,999,999. Anything above that value gets cut off. Or you could use this:

##################################################
#                                       v2019.9.27
# Compares two large positive integers.
# The integers can be binary, octal,
# decimal, or hexadecimal.
#
# NOTE: Both numbers must be in the same base.
# The numbers should not contain spaces, tabs, line breaks,
# minus sign, decimal points, or anything other than digits!
# Illegal characters can mess up the result.
#
# Returns: 0 if they are equal
#          1 if the first one is greater
#          2 if the second one is greater
#
# Special cases:
# * When comparing zero against an empty string or
#   undefined value, the zero will be greater.
# * When comparing an undefined value against
#   an empty string, they will be equal.
#
# Usage: INTEGER = CMP(STRING, STRING)
#
sub CMP
{
  my $A = defined $_[0] ? uc($_[0]) : '';
  my $B = defined $_[1] ? uc($_[1]) : '';
  my $AL = length($A);
  my $BL = length($B);
  return 2 if ($AL < $BL);
  return 1 if ($AL > $BL);
  return 0 if ($A eq $B);

# At this point, we know that both numbers have the
# same length, and one of them is greater than the other.

  my $DIFF = 0;
  for (my $i = 0; $DIFF == 0 && $i < $AL; $i++)
  {
    $DIFF = vec($A, $i, 8) - vec($B, $i, 8);
  }
  return ($DIFF > 0) ? 1 : 2;
}

DISCLAIMER: I am a beginner Perl programmer. I wrote this sub last year, and it may have some bugs! For example, the most obvious one is that if you compare the two strings "003" and "13", the result will be that the first one is greater. Why? Because it's longer. Lol :P
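
The length check suggested at the top of this post could look something like the sketch below. The sub name `clamp_by_length` and the default cap of 8 digits are assumptions made for illustration only. Leading zeros are stripped before the length is measured, so that "003" is not mistaken for a long number.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): clamp a string of
# ASCII decimal digits to an assumed cap of 8 digits.
# Returns undef if the input is not a plain digit string.
sub clamp_by_length {
    my ($str, $max_digits) = @_;
    $max_digits = 8 unless defined $max_digits;    # assumed cap
    return undef unless defined $str && $str =~ /\A[0-9]+\z/;

    # Drop leading zeros but keep at least one digit: "003" -> "3", "000" -> "0"
    (my $digits = $str) =~ s/\A0+(?=[0-9])//;

    return length($digits) > $max_digits
        ? '9' x $max_digits      # too long: substitute the cap (99999999)
        : $digits;
}

print clamp_by_length("003"), "\n";             # prints 3
print clamp_by_length("123456789012"), "\n";    # prints 99999999
```

A real cap would depend on how Perl was built (32-bit or 64-bit integers), so the 8 here is only a conservative placeholder.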

Re^2: Reliably parsing an integer (updated) by haukex (Archbishop) on Jun 29, 2020 at 19:05 UTC
`$DIFF = vec($A, $i, 8) - vec($B, $i, 8);`

As I warned you over a year ago, calling vec on strings that happen to contain Unicode code points above 0xFF is now a fatal error: as of the newly released 5.32, it dies with "`Use of strings with code points over 0xFF as arguments to vec is forbidden`". Simply documenting "Illegal characters can mess up the result" is not robust. Sorry, but I've commented on it often enough: while you're free to code as you like, I can no longer recommend that anyone use your "reinvented wheel" code.

Update: Added more links.

Update 2: DISCLAIMER: ... Please mark your updates as such.
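
For comparison, here is a minimal sketch of the non-reinvented route using the core Math::BigInt module. The wrapper name `big_cmp` is an assumption for illustration. Note that it follows the usual -1/0/1 convention of `bcmp` rather than the 0/1/2 convention of the CMP sub above, and it only handles decimal input as written.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical wrapper (the name is an assumption): compare two decimal
# integer strings of any length.
# Returns -1, 0 or 1, or undef if either argument does not parse as a number.
sub big_cmp {
    my ($x, $y) = @_;
    my $bx = Math::BigInt->new($x);
    my $by = Math::BigInt->new($y);
    return undef if $bx->is_nan || $by->is_nan;
    return $bx->bcmp($by);
}

print big_cmp("003", "13"), "\n";                                     # prints -1
print big_cmp("123456789012345678901234567890", "99999999"), "\n";    # prints 1
```

Input that is not a number becomes NaN instead of being silently compared byte by byte, so the caller can tell bad input apart from a real result.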
Re^3: Reliably parsing an integer (updated) by harangzsolt33 (Chaplain) on Jun 30, 2020 at 05:47 UTC
Okay. I couldn't sleep until I corrected my error. This CMP sub works now!! Run the test and see it for yourself! Btw using vec() is not a mistake. If someone is trying to run UNICODE letters through this sub, then there's a serious error in the code, and it should fail. The programmer needs to test each string to make sure it contains nothing else but plain digits before trying to compare the two. Maybe I should include a line which converts a UNICODE string to plain ASCII string, but I don't know how to do that magic... :D

#!/usr/bin/perl -w
use strict;
use warnings;

print CMP("0", $b);
print CMP("", $b);
print CMP("0", "");
print CMP("", "000");
print CMP("", "55");
print CMP("111", "55");
print CMP("8,000,021", "7,999,999");
print CMP("003", "1");
print CMP("001", "2");
print CMP("003", "11");
print CMP("54", "45");
print CMP("123", "32");
print CMP("5", "5");
print CMP("1222225", "001222225");
print CMP(" 15", "15");
print CMP("0010", "100");
print CMP("C97F", "C97E");
print CMP("2E", "AE");
print CMP("00101 ", "00101");
exit;

##################################################
#                                      v2020.06.30
# Compares two large positive integers.
# The integers can be binary (ones and zeros),
# octal, decimal, or hexadecimal.
#
# NOTE: Both numbers must be in the same base.
# You shouldn't try to compare a binary number such
# as "10001101" to a hex number like "C4"
# as this will give a bad result.
#
# Returns: 0 if the numbers are equal
#          1 if the first one is greater
#          2 if the second one is greater
#
# Special cases:
# * When comparing an undefined value against
#   an empty string or zero, they will be equal.
# * Minus signs are always ignored!
#
# Usage: INTEGER = CMP(STRING, STRING)
#
sub CMP
{
  my $A = defined $_[0] ? uc($_[0]) : '';
  my $B = defined $_[1] ? uc($_[1]) : '';
  my $A2 = length($A);
  my $B2 = length($B);
  my ($A1, $B1, $CA, $CB, $DIFF) = (0, 0, 48, 48, 0);

  # SHOW WHAT'S HAPPENING:
  print "\n\nString1=\|$A\|\nString2=\|$B\| RET=";

  # Find the first significant digit or starting pointer for each string.
  # We will call this A1 and B1. In case the string starts with zeros,
  # spaces, tabs, new line characters, - and + signs, or other special
  # characters, we skip through those. We ignore them.
  while ($A1 < $A2 && vec($A, $A1, 8) < 49) { $A1++; }
  while ($B1 < $B2 && vec($B, $B1, 8) < 49) { $B1++; }

  # Find last significant digit or ending pointer for each string.
  # We will call this A2 and B2.
  while ($A2 > $A1 && vec($A, --$A2, 8) < 48) {} $A2++;
  while ($B2 > $B1 && vec($B, --$B2, 8) < 48) {} $B2++;

  # Calculate the number of digits in each number.
  my $AL = $A2 - $A1;
  my $BL = $B2 - $B1;

  # Are both numbers the same length?
  if ($AL == $BL)
  {
    # Compare from left to right, incrementing
    # pointers A1 and B1 as we walk through all the digits.
    while ($A1 < $A2)
    {
      $CA = vec($A, $A1++, 8);  # Get digit from string A
      $CB = vec($B, $B1++, 8);  # Get digit from string B
      $DIFF = $CA - $CB;
      if ($DIFF) { return $DIFF < 0 ? 2 : 1; }
    }
    return 0;
  }

  return 1 if ($AL > $BL);
  return 2 if ($AL < $BL);
  return 0;
}
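
The "test each string before comparing" step mentioned above might be sketched like this. The sub name `is_plain_digits` is hypothetical and not part of the posted code. It accepts only ASCII hex digits, which also covers binary, octal and decimal input. The character class is spelled out because `\d` without the `/a` modifier also matches non-ASCII Unicode digits.

```perl
use strict;
use warnings;

# Hypothetical guard (not part of the posted sub): returns 1 only for a
# non-empty string made up entirely of ASCII hex digits, otherwise 0.
sub is_plain_digits {
    my ($str) = @_;
    return (defined $str && $str =~ /\A[0-9A-Fa-f]+\z/) ? 1 : 0;
}

for my $s ("C97F", "8,000,021", " 15", "12\x{663}") {
    # Report only the verdict, so no wide characters reach STDOUT.
    print is_plain_digits($s) ? "ok\n" : "rejected\n";
}
```

Calling a guard like this on both arguments first means strings with code points above 0xFF never reach vec(), so the caller gets a clear rejection instead of a fatal error inside the comparison.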
Re^3: Reliably parsing an integer (updated) by harangzsolt33 (Chaplain) on Jun 29, 2020 at 20:12 UTC
Oops.. Sorry!
