PerlMonks  

Equality checking for strings AND numbers

by Anonymous Monk
on Jul 13, 2007 at 00:16 UTC (id://626343)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks,

I am currently writing a custom "diff" utility that compares two text files field by field, for reasons I will omit here. One thing I need is generic equality checking: "==" for numeric comparisons and "eq" for strings, since the fields contain different data types and sometimes use different precision formats, e.g. 10.0 vs 10.00. To that end, I created the following subroutine. However, it raised two questions for me:

1. Does this make sense? Is there a better way?
2. Is there a CPAN module for this type of field-by-field checking (essentially column by column for a particular row)? I searched, but alas could find nothing suitable.
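    use Scalar::Util qw(looks_like_number);

    # Compare two fields numerically if both look like numbers, as strings otherwise.
    # ($x/$y are used instead of $a/$b, which Perl reserves for sort.)
    sub comp {
        my ($x, $y) = @_;
        if (looks_like_number($x) && looks_like_number($y)) {
            return ($x == $y);
        } else {
            return ($x eq $y);
        }
    }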
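For the field-by-field use case described above, comp() can be applied column by column to a pair of lines. This is a minimal sketch, assuming whitespace-separated fields; the helper name diff_row and its return value (a list of 0-based column indexes) are hypothetical, and comp() is repeated so the block runs on its own:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Same logic as comp() above, written as a single ternary.
sub comp {
    my ($x, $y) = @_;
    return looks_like_number($x) && looks_like_number($y)
        ? $x == $y
        : $x eq $y;
}

# Hypothetical helper: return the 0-based column indexes that differ
# between two lines, assuming whitespace-separated fields.
sub diff_row {
    my ($line1, $line2) = @_;
    my @f1 = split ' ', $line1;
    my @f2 = split ' ', $line2;
    my $n  = @f1 > @f2 ? scalar @f1 : scalar @f2;
    my @diffs;
    for my $i (0 .. $n - 1) {
        # A column missing from one line always counts as a difference.
        if (!defined $f1[$i] || !defined $f2[$i] || !comp($f1[$i], $f2[$i])) {
            push @diffs, $i;
        }
    }
    return @diffs;
}

my @cols = diff_row("abc 10.0 7 x", "abc 10.00 8 x");
print "differing columns: @cols\n";    # differing columns: 2
```

Here "10.0" and "10.00" compare equal numerically, "abc" and "x" compare as strings, and only the third column (7 vs 8) is reported.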

Re: Equality checking for strings AND numbers
by BrowserUk (Patriarch) on Jul 13, 2007 at 00:21 UTC
      Good observation. Most values are integers, but with different precisions. Real numbers SHOULD have the same precision in these files, and I actually want to detect if they don't, i.e. 10 and 10.000000000000001 should be treated as different. If exact comparison on reals becomes an issue, I guess I could use sprintf to compare only the leading decimal places, or do a ratio comparison. Thanks for the heads-up.

        When comparing numbers, I tend to avoid == and use something like:

            sub equality {
                my ($x, $y, $eps) = @_;
                return abs($x - $y) < $eps ? 1 : 0;
            }

        where $eps is the desired tolerance.

        Cheers,

        lin0
Re: Equality checking for strings AND numbers
by syphilis (Archbishop) on Jul 13, 2007 at 02:05 UTC
    Be aware that looks_like_number() returns true for strings like '9e0' and '9', but false for strings like '0x9'.

    I don't know if it has any impact on what you are doing but your comp() subroutine will return true when comparing the numbers 9 and 0x9, will return true when comparing the strings '9e0' and '9', but will return false when comparing the strings '0x9' and '9' (or when comparing the string '0x9' to the number 9).
    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    my $x1 = '0x9';
    my $x2 = 0x9;
    my $x3 = '9';
    my $x4 = '9e0';
    my $x5 = 9e0;

    print "1: ", comp($x1, 9), "\n";
    print "2: ", comp($x2, 9), "\n";
    print "3: ", comp($x3, 9), "\n";
    print "4: ", comp($x4, 9), "\n";
    print "5: ", comp($x5, 9), "\n";

    sub comp {
        my ($a, $b) = @_;
        if (looks_like_number($a) && looks_like_number($b)) {
            return ($a == $b);
        } else {
            return ($a eq $b);
        }
    }

    __END__
    Outputs:
    1:
    2: 1
    3: 1
    4: 1
    5: 1
    Cheers,
    Rob
      Thanks for the warning. All numerical values will be base 10, sometimes in scientific format, so the looks_like_number call should work in this case.

      So, looks_like_number only works for base 10 (and below) numbers, i.e. hexadecimal values with or without a leading 0x will return false?

      Although, of course not knowing the number base for numerical values will cause all kinds of other problems! ;)
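If 0x-prefixed hex fields ever did need to compare equal to their decimal values, one option (not something proposed in the thread) is to convert them with Perl's built-in oct(), which interprets a string starting with 0x as hexadecimal, before falling back to looks_like_number. This is a minimal sketch; the helper name to_number is hypothetical, and only strings matching an explicit 0x pattern go through oct(), because oct('010') would return 8 rather than 10:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: return the numeric value of a field, or undef if
# it is not numeric. Only 0x-prefixed hex strings are passed to oct();
# other leading zeros are left alone, since oct('010') would give 8.
sub to_number {
    my ($s) = @_;
    return oct($s) if $s =~ /^0[xX][0-9a-fA-F]+$/;
    return looks_like_number($s) ? $s + 0 : undef;
}

# comp() variant: numeric comparison when both fields convert, else string.
sub comp {
    my ($x, $y) = @_;
    my ($nx, $ny) = (to_number($x), to_number($y));
    return defined $nx && defined $ny ? $nx == $ny : $x eq $y;
}

print comp('0x9', 9)     ? "equal\n" : "different\n";    # equal
print comp('0x9', '9e0') ? "equal\n" : "different\n";    # equal
```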
        On another note, you could use Algorithm::Diff, which lets you provide your own matching (or "key generation", as they call it) function. This gets around the limitation of Text::Diff, which only compares text strings.
        Looking at the Text::Diff module, I noticed the following:
        my $diff = diff \&reader1,\&reader2;
        I assume that this means you can use a subroutine to return the column you need from the input files and then just use Text::Diff to compare.

        Do you have some sample input files? What sort of output are you expecting (a list of the differences, printed to screen, etc.), and what should the format of this output be?
        Updated: Questions added
Re: Equality checking for strings AND numbers
by toma (Vicar) on Jul 13, 2007 at 08:11 UTC
    This is a difficult problem in Perl for the most general case. Numbers like 1111111111111111111e1111111111111111111 pass the 'looks_like_number' test but don't fare well in arithmetic expressions. This doesn't do what you would hope:
    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    my $c = "11111111111111111e11111111111111111";
    my $d = "22222222222222222e22222222222222222";
    if (looks_like_number($c) and looks_like_number($d) and $c == $d) {
        print "$c = $d\n";
    }
    It should work perfectly the first time! - toma
      That's probably because those large numbers are essentially Infinity, at least as far as normal floating-point storage goes. The looks_like_number call does allow for Infinity and treats it as a number, and Infinity == Infinity is true!

      I don't need any of the "big" number support, which I believe doesn't play well with looks_like_number anyway.
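Since both of toma's strings overflow to infinity when used as numbers, == reports them equal. One way to guard against this (a sketch, not something posted in the thread) is to fall back to a string comparison whenever either value is infinite or NaN after numification. It assumes that 9**9**9, a common Perl idiom, overflows to infinity on the build in use, and that reporting such fields by their text is what the diff should do; the helper name not_finite is hypothetical:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my $INF = 9**9**9;    # overflows to infinity on typical builds

# Hypothetical helper: true if the numeric value is +/-Inf or NaN
# (NaN is the only value that is not equal to itself).
sub not_finite {
    my ($n) = @_;
    return $n != $n || abs($n) == $INF;
}

sub comp {
    my ($x, $y) = @_;
    if (looks_like_number($x) && looks_like_number($y)
        && !not_finite($x) && !not_finite($y)) {
        return $x == $y;
    }
    return $x eq $y;    # strings, or numbers too large to compare numerically
}

my $c = "11111111111111111e11111111111111111";
my $d = "22222222222222222e22222222222222222";
print comp($c, $d) ? "equal\n" : "different\n";    # different
```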
Re: Equality checking for strings AND numbers - the future
by tirwhan (Abbot) on Jul 13, 2007 at 15:27 UTC

    Since no one has mentioned it so far, I'd just like to point out that for Perl versions >= 5.9.3 you can use the smart match operator ~~ for this. So for example, the following would work:

    use feature ":5.10";
    my $x = 10;
    my $y = "10.00";
    say "matches" if ($x ~~ $y);
    (tested with 5.9.5). See perlsyn for details on how smart match works.


    All dogma is stupid.
Re: Equality checking for strings AND numbers
by eXile (Priest) on Jul 13, 2007 at 16:34 UTC
    I posted a similar problem before and got a great answer ( Re: check if 2 values are equal ), which involved putting all the things to be compared into a hash as keys and counting the number of keys.
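Applied to this thread's problem, the hash-key idea needs numeric-looking values normalized first, so that "10.0" and "10.00" produce the same key. This is a minimal sketch, assuming normalization with 0 + $value; the sub name all_equal is hypothetical:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: true if all arguments are equal, treating
# numeric-looking strings by value and everything else as text.
sub all_equal {
    my %seen;
    for my $v (@_) {
        # 0 + $v gives a canonical string form: "10.00" -> "10", "1e1" -> "10".
        my $key = looks_like_number($v) ? 0 + $v : $v;
        $seen{$key}++;
    }
    return keys(%seen) <= 1;
}

print all_equal("10.0", "10.00", "1e1") ? "same\n" : "differ\n";    # same
print all_equal("10", "abc")            ? "same\n" : "differ\n";    # differ
```

One caveat: Perl stringifies numbers to about 15 significant digits, so keys built this way can merge values that == would distinguish (for example 10 and 10.000000000000001), which matters if, as the original poster said, such values must be reported as different.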
Re: Equality checking for strings AND numbers
by mr_mischief (Monsignor) on Jul 16, 2007 at 01:47 UTC

    Is there a maximum precision which any of the numbers will ever be?

    If so, and you want any differences to be noted as you responded to BrowserUk, why not promote all things that look like numbers to some ridiculously high precision using sprintf() and then compare everything based on strings? (edit: fixed this sentence for grammar)

    printf "%1.20f\n", int(10.1);
    printf "%1.20f\n", 10;
    printf "%1.20f\n", 012;
    printf "%1.20f\n", "10";
    printf "%1.20f\n", 1e1;
    printf "%1.20f\n", 10.100;
    printf "%1.20f\n", 10.1;
    printf "%1.20f\n", '10.1';
    printf "%1.20f\n", 10.1000000000;
    printf "%1.20f\n", 10.1000000001;

    You'll end up with roundoff errors on reals from the precision boost, but for perfectly equivalent values in the first place you should get the same roundoff errors. It's not like you're accumulating the errors through arithmetic with the values, since you're just promoting them and then immediately doing the comparison. The old adage about not testing floats for equality doesn't really apply here, unless you do want to allow a range of difference in the original inputs.

    The main issue with this as I see it is that while you should be okay for a single environment, you'll potentially be dealing with different values for the floats if you try to take the promoted values as output from more than one software environment.
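mr_mischief's approach can be sketched as a comp() that promotes numeric-looking fields with sprintf and then compares strings. The 20 decimal places follow the printf example in that post; the helper name normalize and its optional $places argument are hypothetical, and the caveat about values produced in different environments still applies:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: promote numeric-looking fields to a fixed number
# of decimal places (default 20); leave everything else untouched.
sub normalize {
    my ($v, $places) = @_;
    $places = 20 unless defined $places;
    return looks_like_number($v) ? sprintf("%.*f", $places, $v) : $v;
}

# Every comparison is a string comparison after promotion.
sub comp {
    my ($x, $y) = @_;
    return normalize($x) eq normalize($y);
}

print comp("10.0", "10.00") ? "equal\n" : "different\n";    # equal
print comp("10.1", "10.10") ? "equal\n" : "different\n";    # equal
print comp("abc", "10")     ? "equal\n" : "different\n";    # different
```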

Re: Equality checking for strings AND numbers
by Anonymous Monk on Jul 15, 2007 at 06:37 UTC
    When comparing the numerical data for equality, are you ever comparing numbers of different precision? If not, why not just convert all data to strings and compare the string results? In deciding whether or not to convert the data, you could use something like:

    $string = to_string($string) unless is_string($string); (pulled from http://search.cpan.org/~dwheeler/Data-Types-0.06/lib/Data/Types.pm)

    I am new to this so this is just a thought.

      The problem here is that all of the numbers (ints and reals) do have different precisions: 10.0 and 10.00 are numerically equivalent, but are different when treated as strings.
        Check this page out. It has tests to find variable types and shows how to convert them. You can test whether a value is an int and, if it is, convert it to a real and then do the comparison.

        http://search.cpan.org/~dwheeler/Data-Types-0.06/lib/Data/Types.pm.

        With this you should be able to get both values to at least the same variable type. If both become float types and you compare 10.0 to 10.00, you should end up with equality. Another thought: you can set up a tolerance for precision on number comparisons.

        Instead of: if ($a == $b) { ... }

        do: if (abs($a - $b) < 0.0001) { ... }

        Just some thoughts. I personally haven't done a lot with Perl yet.

        I'd argue that if you're bothering to mention precision at all, then 10.0 != 10.00.

        10.0 is really "somewhere between 9.95 and 10.05", and 10.00 is really "somewhere between 9.995 and 10.005". So if your 10.0 is really 9.97, it can't possibly be equal to 10.00.
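The interval reasoning above (a written value stands for plus or minus half a unit in its last decimal place) can be made concrete with a small calculation. This is a sketch assuming plain decimal notation with no exponent; the sub name implied_interval is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return (low, high) for a plain decimal string,
# taking the precision to be half a unit in the last written digit.
sub implied_interval {
    my ($s) = @_;
    my $places = $s =~ /\.(\d+)$/ ? length $1 : 0;
    my $half   = 0.5 * 10 ** -$places;
    return ($s - $half, $s + $half);
}

for my $s ("10.0", "10.00") {
    my ($lo, $hi) = implied_interval($s);
    printf "%-6s is somewhere between %.3f and %.3f\n", $s, $lo, $hi;
}
```

With these intervals, a true value of 9.97 falls inside the range for "10.0" (9.95 to 10.05) but outside the range for "10.00" (9.995 to 10.005).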

Re: Equality checking for strings AND numbers
by shoness (Friar) on Jul 16, 2007 at 13:29 UTC
    Using the strtod and strtol functions from the POSIX module, you can convert the strings that Perl reads into numbers that you can operate on. This also suggests a nice "is_numeric" function:

    use POSIX qw(strtod strtol);

    # Begin quoting from <http://p3m.org/faq/C3/Q3.html>
    sub getnum {
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my ($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        } else {
            return $num;
        }
    }
    sub is_numeric { defined getnum($_[0]) }
    # end quoting...

    # Note: strtol() keeps only the integer part of the value scaled by 100,
    # so differences beyond the second decimal place are ignored, and values
    # large enough to stringify in exponent form (e.g. "1e+22") are not handled.
    sub comp {
        my ($a, $b) = @_;
        if (is_numeric($a) && is_numeric($b)) {
            return (strtol($a * 100) == strtol($b * 100));
        } else {
            return ($a eq $b);
        }
    }
Re: Equality checking for strings AND numbers
by Moron (Curate) on Jul 16, 2007 at 13:25 UTC
    As halley indicated, an absolute Epsilon test doesn't work well for all kinds of data. What about a fractional comparison? E.g.:
    sub fromp {
        my ($x, $y, $eps) = @_;
        ( abs( ($y - $x) / ( $x || $y || return (1) ) ) < $eps );
    }
    $eps should be the fractional closeness; e.g. 0.000000001 sets a fractional threshold of one part in a billion.

    The chain of ||s ensures that either the divisor is non-zero or division is prevented by returning 1 where both are 0 (therefore equal).
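Because fromp() divides by $x whenever $x is non-zero, fromp($x, $y, $eps) and fromp($y, $x, $eps) can disagree near the threshold. A common symmetric variant (a swapped-in alternative, not Moron's code) scales the difference by the larger magnitude. The sketch below also shows why an absolute epsilon misbehaves for very small values; the sub name rel_equal is hypothetical:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Relative comparison scaled by the larger magnitude, so argument
# order does not matter. Two zeros are treated as equal.
sub rel_equal {
    my ($x, $y, $eps) = @_;
    my $scale = max(abs($x), abs($y));
    return 1 if $scale == 0;
    return abs($x - $y) / $scale < $eps ? 1 : 0;
}

# An absolute tolerance of 1e-9 calls these equal although one is twice the other.
print abs(1e-12 - 2e-12) < 1e-9       ? "abs: equal\n" : "abs: different\n";    # abs: equal
print rel_equal(1e-12, 2e-12, 1e-9)   ? "rel: equal\n" : "rel: different\n";    # rel: different
print rel_equal(1e12, 1e12 + 1, 1e-9) ? "rel: equal\n" : "rel: different\n";    # rel: equal
```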

    __________________________________________________________________________________

    ^M Free your mind!

      Sorry halley, I missed your post on the absolute Epsilon test. I was just throwing around ideas and didn't realize that one was already on the table.

Node Type: perlquestion [id://626343]
Approved by BrowserUk
Front-paged by GrandFather