http://qs321.pair.com?node_id=781447

ELISHEVA has asked for the wisdom of the Perl Monks concerning the following question:

It seems I am always learning something new about Perl. Today while debugging some graph navigation code I discovered something that surprised me a bit. I was naively using eq to see if we had visited a graph node before, but for some reason my code was insisting that we had already visited a node that I knew we hadn't visited before. It turned out that the reason for this confusing behavior was that qr{someregex} eq '(?-xism:someregex)' was returning true. For example, the following code:

use strict;
use warnings;

my $s  = '(?-xism:a)';
my $re = qr{a};

# regex and string are clearly different ref types
print "-------Type------------\n";
print "ref $re (regex) is 'Regexp': "
  , (ref($re) eq 'Regexp' ?'true':'false'), "\n";
print "ref $s (string) is '': "
  , (ref($s) eq '' ?'true':'false'), "\n";
print "ref $s (string) is not 'Regexp': "
  , (ref($s) ne 'Regexp' ?'true':'false'), "\n";

# so why are they equal?
print "-------Equality--------\n";
print "comparing literals: qr{a} eq '(?-xism:a)': "
  , (qr{a} eq '(?-xism:a)'?'true':'false'), "\n";
print "comparing variables: regex eq $s (string): "
  , ($s eq $re?'true':'false'), "\n";

prints

-------Type------------
ref (?-xism:a) (regex) is 'Regexp': true
ref (?-xism:a) (string) is '': true
ref (?-xism:a) (string) is not 'Regexp': true
-------Equality--------
comparing literals: qr{a} eq '(?-xism:a)': true
comparing variables: regex eq (?-xism:a) (string): true

I'm used to eq comparing numbers to strings - Perl considers them both to be scalars, but here Perl was considering two things that were clearly different types as "equal" - one a reference to a regex and the other a string. So my questions are:

Best, beth

Re: What is the best way to compare variables so that different types are non-equal?
by ikegami (Patriarch) on Jul 19, 2009 at 16:21 UTC

    For what other pairings of data types does eq ignore type?

    eq pays no attention to the type whatsoever. eq stringifies its operands and compares those strings.

    >perl -le"print( undef eq '' )"
    1
    >perl -le"print( 123 eq '123' )"
    1
    >perl -le"$r=\$s; print( $r eq sprintf('SCALAR(0x%x)', 0+$r) )"
    1
    >perl -le"print( qr/a/ eq '(?-xism:a)' )"
    1
    etc
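
A minimal sketch of the same point that does not depend on the Perl version. The literal '(?-xism:a)' is what a flagless qr{a} stringifies to on the perls used in this thread; perl 5.14 and later produce '(?^:a)' instead, so the sketch derives the string from the regex rather than hard-coding it. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $re  = qr{a};
my $str = "$re";    # whatever this perl stringifies the regex to

# eq only ever sees the stringified operands, so the two compare equal
my $eq_says_equal = ( $re eq $str ) ? 1 : 0;

# ref() still tells them apart: 'Regexp' versus ''
my $same_ref_type = ( ref($re) eq ref($str) ) ? 1 : 0;

print "stringified regex: $str\n";
print "eq says equal:     $eq_says_equal\n";
print "same ref type:     $same_ref_type\n";
```

Comparing against "$re" rather than a hand-typed string keeps tests from breaking when the stringified form changes between Perl releases.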

    I know there is a way to overload operators but I was under the impression that one had to "use overload" to empower it

    You use use overload to add overloading to a class, not to decide whether or not overloading will occur.

    Regex pattern objects are magical. Overloading doesn't even come into play.

      Regex pattern objects are magical. Overloading doesn't even come into play.

      Magical is right. This took me by surprise.

      use overload;
      use Scalar::Util qw( blessed );
      use Test::More 'tests' => 4;

      my $rx = qr/a/;

      ok blessed $rx, 'Regexp is blessed';
      ok ! overload::Overloaded( $rx ), 'Regexp is overloaded';
      ok ! overload::Method( $rx, q{""} ), 'Regexp has q{""} overloaded';
      isnt "$rx", overload::StrVal( $rx ), 'StrVal of regexp isnt regexp stringified';
        ok ! overload::Overloaded( $rx ), 'Regexp is overloaded';
        ok ! overload::Method( $rx, q{""} ), 'Regexp has q{""} overloaded';

        I'm confused: shouldn't it say "is not" and "has not"?

        Cheers Rolf

      Regex pattern objects are magical. Overloading doesn't even come into play.

      so what about this???

      package Regexp;
      use overload q{""} => sub { return "overloaded" };

      package main;
      my $rx = qr/a/;
      print $rx;

      output: overloaded

      Cheers Rolf

      UPDATE: Finally got it! 8)

        so what about this???

        I'm not sure what you are asking.

        You transformed the class. Of course my statements about the original class won't necessarily apply to the transformed class.

Re: What is the best way to compare variables so that different types are non-equal?
by BrowserUk (Patriarch) on Jul 19, 2009 at 16:05 UTC
    qr/STRING/msixpo

    This operator quotes (and possibly compiles) its STRING as a regular expression. STRING is interpolated the same way as PATTERN in m/PATTERN/. If "'" is used as the delimiter, no interpolation is done. Returns a Perl value which may be used instead of the corresponding /STRING/msixpo expression. The returned value is a normalized version of the original pattern. It magically differs from a string containing the same characters: ref(qr/x/) returns "Regexp", even though dereferencing the result returns undef.

    I think the relevant part of the docs is the bit I've highlighted. qr// [struck out: compiles] normalises the regex, which basically means blessing it. Hence the magical behaviour.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I don't know why you crossed out "compiles".
      >perl -Mre=debug -e"qr/a/"
      Compiling REx "a"
      Final program:
         1: EXACT <a> (3)
         3: END (0)
      anchored "a" at 0 (checking anchored isall) minlen 1
      Freeing REx: "a"
        I don't know why you crossed out "compiles".

        Because the docs I quoted used the term 'normalises'. Can they be both compiled and normalised? Can they be normalised and not compiled or vice versa?


Re: What is the best way to compare variables so that different types are non-equal?
by LanX (Saint) on Jul 19, 2009 at 15:41 UTC
      Is there a way to override this so that things belonging to different data types are always not equal?

      I'm not sure I understood your application: you're comparing "nodes" which can be strings, regexes, or other data types, right?

      So wouldn't it be natural to construct a class "Node", with an overloaded "eq" operator? (¹)

      If your goal is to compare different node-objects, why don't you implement them as objects?

      If you also tie these object references, you can use them like normal scalars.

      So the answer is yes, you might be able to override this behaviour ;-)

      Cheers Rolf

      FOOTNOTE: (1) Hmm, I think I'd rather overload ==

        My question about overriding the type-blind behavior of eq (#2) was primarily a question about Perl idiom. Before I settled on a solution, I wanted a better understanding of the Perl syntax for handling various definitions of equality.

        I was also looking ahead to future coding scenarios. Part of good design is anticipating the environment around the design. Part of good testing is understanding exactly what one's test for equality is doing. Once I saw my mistake I was worried about what other magic and 'action at a distance' effects I need to consider when writing tests and developing algorithms that involve testing for equality.

        So wouldn't it be natural to construct a class "Node", with an overloaded "eq" operator?

        The code I'm testing is pretty well factored so the actual fix involves exactly two comparisons within a single subroutine. There isn't really a need for a global solution that will be "carried" with a "node object". Also the "node-i-ness" comes from the fact that the datum is part of larger structure, e.g. an array or a hash. It doesn't need an object wrapper to get that trait.

        If there is no ready-made Perl idiom I will probably have my subroutine call the subroutine below for its two comparisons. The subroutine mentioned above needs a definition of equality that duplicates unoverloaded eq, except for the added constraint that like must be compared to like:

        use Scalar::Util ();

        sub my_eq {
            # make sure we are comparing like to like
            my $xRef = ref($_[0]);
            return '' unless ($xRef eq ref($_[1]));

            # compare pure scalars and regex's using 'eq'
            # compare reference addresses for the rest
            return ($xRef and ($xRef ne 'Regexp'))
                ? (Scalar::Util::refaddr($_[0]) == Scalar::Util::refaddr($_[1]))
                : ($_[0] eq $_[1]);
        }
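
A short usage sketch for the subroutine above, repeated here so the snippet runs on its own. The test values are made up for illustration, and the regex's string form is taken from "$re" so the snippet does not depend on how a particular perl stringifies qr{a}.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

# Same logic as my_eq above: like must be compared to like.
sub my_eq {
    my $xRef = ref( $_[0] );
    return '' unless $xRef eq ref( $_[1] );
    return ( $xRef and $xRef ne 'Regexp' )
        ? ( refaddr( $_[0] ) == refaddr( $_[1] ) )
        : ( $_[0] eq $_[1] );
}

my $re   = qr{a};
my $str  = "$re";      # a string that looks exactly like the regex
my @list = ( 1, 2 );
my $aref = \@list;

my @results = (
    my_eq( $re,   $str )     ? 1 : 0,    # regex vs. look-alike string
    my_eq( $re,   qr{a} )    ? 1 : 0,    # two regexes, same pattern
    my_eq( 'abc', 'abc' )    ? 1 : 0,    # plain strings
    my_eq( $aref, \@list )   ? 1 : 0,    # two references to one array
    my_eq( $aref, [ 1, 2 ] ) ? 1 : 0,    # different arrays, same contents
);
print "@results\n";
```

Note that under this definition two regexes with the same pattern count as equal, while two distinct arrays with the same contents do not; whether that asymmetry is wanted depends on the application.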

        Best, beth

Re: What is the best way to compare variables so that different types are non-equal?
by kyle (Abbot) on Jul 19, 2009 at 16:15 UTC
    1. For what other pairings of data types does eq ignore type?

      Basically every type. It forces each argument to a string before making a comparison.

    2. Is there a way to override this so that things belonging to different data types are always not equal?

      I'd probably write a custom comparison sub to do it.

      use Scalar::Util qw( blessed reftype );

      sub comparifier {
          blessed $_[0] eq blessed $_[1]
              && reftype $_[0] eq reftype $_[1]
              && $_[0] eq $_[1]
      }

      Along the same lines, you might be interested in Reference assessment techniques and how they fail, whose subtext is "gosh, it's hard to figure out what a scalar really is."

    3. ...how can I detect overloaded operators?

      Overloading applies to objects, not operators. (In Perl 6 you can override operators and/or write your own.) If you want to know if an object is overloading, use overload::Overloaded( $obj ). If you want to know if it's overloading eq in particular, you can check overload::Method( $obj, 'eq' ), but you'll also have to look for stringification (overload::Method( $obj, q{""} )).

    4. ...how should I understand it?

      Just that eq forces its operands to be strings in order to do its job. A regular expression stringifies as you've found (it's a blessed reference to an undef with regex magic added).

    You might also be interested in overload::StrVal( $obj ), which gives you the string value of $obj without string overloading. For a regular expression, this is similar to "Regexp=SCALAR(0x1d10860)".
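
The overload helper functions mentioned above can be tried out on a small class. This is only a sketch: the Label class and its text field are hypothetical and exist purely for illustration, while overload::Overloaded, overload::Method and overload::StrVal are the functions named in the reply.

```perl
use strict;
use warnings;
use overload ();    # load overload.pm only for its helper functions

{
    # Hypothetical class, used only for illustration.
    package Label;
    use overload q{""} => sub { $_[0]{text} }, fallback => 1;
    sub new { my ( $class, $text ) = @_; return bless { text => $text }, $class }
}

my $label = Label->new('start');

my $is_overloaded = overload::Overloaded($label)      ? 1 : 0;
my $has_stringify = overload::Method( $label, q{""} ) ? 1 : 0;
my $has_own_eq    = overload::Method( $label, 'eq' )  ? 1 : 0;
my $eq_matches    = ( $label eq 'start' )             ? 1 : 0;
my $raw           = overload::StrVal($label);    # bypasses the "" overload

print "overloaded: $is_overloaded, stringify: $has_stringify, own eq: $has_own_eq\n";
print "eq via stringification: $eq_matches\n";
print "raw value: $raw\n";
```

The class defines no eq of its own, yet eq still goes through the stringification overload, which is why checking for 'eq' alone is not enough.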

    I hope this helps.

    Updated to fix a thinko in comparifier, thanks to AnomalousMonk.

    Updated again thanks to jdporter.

      or

      sub comparifier {
          no warnings 'uninitialized'; # because blessed() and reftype() can return undef.
          blessed($_[0]).reftype($_[0]).$_[0] eq blessed($_[1]).reftype($_[1]).$_[1]
      }
        That can result in false positives.
        my $r = \@a;
        print comparifier($r, "ARRAY$r") ?1:0,"\n"; # 1
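
One way to avoid that false positive is to compare the three components one by one instead of concatenating them. The sketch below assumes perl 5.10 or later (for the // operator) and that both arguments are defined; the name strict_eq is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );

# Compare class, underlying reference type, and string value separately,
# so that text from one component can never stand in for another.
sub strict_eq {
    my ( $x, $y ) = @_;
    my @x = ( blessed($x) // '', reftype($x) // '', "$x" );
    my @y = ( blessed($y) // '', reftype($y) // '', "$y" );
    for my $i ( 0 .. $#x ) {
        return 0 if $x[$i] ne $y[$i];
    }
    return 1;
}

my @data = ( 1, 2 );
my $r    = \@data;
my $re   = qr{a};

my @results = (
    strict_eq( $r,  "ARRAY$r" ),    # the false positive from above
    strict_eq( $r,  \@data ),       # same array, two references
    strict_eq( $re, "$re" ),        # regex vs. look-alike string
    strict_eq( 'x', 'x' ),          # plain strings
);
print "@results\n";
```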
Re: What is the best way to compare variables so that different types are non-equal?
by psini (Deacon) on Jul 19, 2009 at 15:33 UTC

    To answer points 1 and 4: AFAIK the 'eq' operator stringifies both operands before comparing them. So X eq Y is true if and only if 'print X' and 'print Y' produce the same output.

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re: What is the best way to compare variables so that different types are non-equal?
by moritz (Cardinal) on Jul 20, 2009 at 08:38 UTC
    To digress a bit in the Perl 6 direction: that's exactly the reason why Perl 6 has many more equality testing operators.

    In this case either eqv (which does the same as Test::More::is_deeply does) or === (which compares if two variables hold the same object) would be appropriate.

Re: What is the best way to compare variables so that different types are non-equal?
by Anonymous Monk on Jul 19, 2009 at 15:48 UTC
    What you are seeing are the effects of overload for Regexp objects. You can use == to compare:
    use strict;
    use warnings;
    my $re =qr{a};
    my $re2 =qr{a};
    warn $re == $re2;
    warn $re2 == $re2;
    warn 0+$re;
    warn 0+$re2;
    __END__
    Warning: something's wrong at - line 8.
    1 at - line 9.
    2252896 at - line 10.
    2252956 at - line 11.
    Regexp objects are a special case (isn't everything?): you can call methods on them, but you can't dereference them.

      I really don't understand your example. I substituted warns with prints:

      use strict;
      use warnings;
      my $re =qr{a};
      my $re2 =qr{a};
      print $re == $re2,"\n";
      print $re2 == $re2,"\n";
      print 0+$re,"\n";
      print 0+$re2,"\n";

      and this is the result:

      sini@ordinalfabetix:~$ ./x.pl

      1
      135589228
      135591376
      sini@ordinalfabetix:~$

      The last two lines are NOT equal, because they are references to two different scalars. As a consequence, $re != $re2 (and $re == $re, but that was expected).

      So numerical comparison doesn't tell you whether two regexes are equal, only whether they are the same reference.
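
The distinction between "same reference" and "same pattern" can be made explicit with a small sketch. Using Scalar::Util::refaddr instead of == is a choice made here, not something from the posts above; it gives the address even for objects that overload numeric operators.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

my $re  = qr{a};
my $re2 = qr{a};

# Identity: are these the very same regex object?
my $same_object = ( refaddr($re) == refaddr($re2) ) ? 1 : 0;

# Equivalence: do they stringify to the same pattern?
my $same_pattern = ( "$re" eq "$re2" ) ? 1 : 0;

# An object is always identical to itself.
my $self_same = ( refaddr($re) == refaddr($re) ) ? 1 : 0;

print "same object:  $same_object\n";
print "same pattern: $same_pattern\n";
print "self same:    $self_same\n";
```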


Re: What is the best way to compare variables so that different types are non-equal? (overloading "cmp")
by LanX (Saint) on Jul 20, 2009 at 01:12 UTC
    Is there a way to override this so that things belonging to different data types are always not equal? Or is the rather verbose (ref($x) eq ref($y)) and ($x eq $y) the only way to do this?

    Finally I fiddled it out, the perldoc for overload is not really the lightest weed to smoke. The Regexp object might be magical but still has an API to operate with.

    So you may wanna try something like this, but I certainly don't recommend it for production use:

    package Regexp;
    use overload
        q{cmp} => sub {
            return 1 if (ref($_[0]) ne ref($_[1]));
            # TODO: returning 1 is completely arbitrary
            # didn't know how to decide which ref is "bigger"
            return "$_[0]" cmp "$_[1]";
        },
        fallback => 1;

    package main;
    my $rx = qr/a/;
    my $ry = qr/a/;
    my $x  = "(?-xism:a)";

    use Test::More 'tests' => 2;
    cmp_ok( $rx, 'eq', $ry, '$rx eq $ry');
    cmp_ok( $rx, 'ne', $x, '$rx ne $x');

    Cheers Rolf

      Globally changing the behaviour of regex is very wrong. You don't even gain anything from it. The compare function could just as easily be outside of the class.
        Well, as I already said, "I certainly don't recommend it for production use"; a simple comparison function is much easier to maintain. But Beth asked for options with overloading, and that's the answer.

        If one really needs the behavior of a JS-like === very often this could be a way to achieve it. The old behavior of comparing only the stringified values could still be achieved by explicitly stringifying the arguments, e.g.:  "$rx" cmp "$ry".

        IMHO the problem in Perl 5 is not overloading but the lack of alternative or freely named operators. Actually, overloading results in compatibility problems.

        Anyway the fact that it's "a global change of regex" really surprises me, I expected it to be reduced only to the scope of the file...

        Cheers Rolf

Re: What is the best way to compare variables so that different types are non-equal?
by DrHyde (Prior) on Jul 20, 2009 at 09:20 UTC
    'eq' stringifies its arguments before comparing them. I suggest using Data::Compare, which also knows about data types.
Re: What is the best way to compare variables so that different types are non-equal?
by tilly (Archbishop) on Jul 20, 2009 at 13:33 UTC
    In addition to all the other suggestions, if you're walking through a fixed data structure and want to prevent going to the same node twice, it can be worthwhile to numify references to nodes. That is, instead of trying to look at, say, $foo{$bar} and see if you've been there, look at 0 + \$foo{$bar} instead.

    This tests whether the underlying scalar has been seen, and will not care if the value is, or looks like, one that you've seen before.
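
A minimal sketch of that idea, assuming the nodes live in the slots of a hash. The %node hash and its keys are made up for illustration, and refaddr( \$node{$key} ) is used in place of 0 + \$node{$key}; for a plain reference both yield the same address.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

my $re   = qr{a};
my %node = (
    first  => $re,
    second => "$re",    # a string that looks exactly like the regex
);

my @visit = ( 'first', 'second', 'first' );

# Track visits by the address of the slot holding each node.
my ( %seen_slot, @by_slot );
for my $key (@visit) {
    my $id = refaddr( \$node{$key} );
    next if $seen_slot{$id}++;
    push @by_slot, $key;
}

# For contrast: tracking by value stringifies the node, so the regex
# and its look-alike string collapse into one entry.
my %seen_value;
my @by_value = grep { !$seen_value{ $node{$_} }++ } @visit;

print "by slot:  @by_slot\n";
print "by value: @by_value\n";
```

The addresses stay valid only as long as the structure is not modified during the walk, which is why this fits a fixed data structure.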

      Explicitly numifying references never came to my mind, only stringifying... That's a very valuable trick on some occasions...

      Just voting once is not enough! 8)

      Thank you!

      Cheers Rolf

Re: What is the best way to compare variables so that different types are non-equal?
by QM (Parson) on Jul 21, 2009 at 21:28 UTC
    Do you need to compare nodes to nodes (references) only?

    If so, can you compare the refs directly with ==? (Maybe this is essentially the same as numifying in another response.)

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of