http://qs321.pair.com?node_id=428610

rkosai has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, This is a question that touches quite heavily on computing theory, but that can also be tied to Perl, semantics, and "Just Plain Cool". I was wondering if there was any way to determine how similar sections of code are. For example:
$x++; $x = $x + 1; $x += 1;
all do the same thing. Does there exist a way to identify functionally similar blocks of code, and if not, can anyone point me in the right direction on how to do this? Thanks in advance!

Replies are listed 'Best First'.
Re: Equivalency of Code
by Corion (Patriarch) on Feb 07, 2005 at 10:15 UTC

    As any C macro programmer will tell you, the three are not equivalent if you want to substitute any scalar expression for $x. The easy way of telling whether two pieces of code are identical is to force them into one canonical form, and then check if the two forms are identical. In the general case though, proving the functional identity of two pieces of code is equivalent to solving the halting problem, and thus infeasible (I think).

      In the general case though, proving the functional identity of two pieces of code is equivalent to solving the halting problem, and thus infeasible (I think).

      Indeed. If same($code1, $code2) could find out whether the $code1 and $code2 functions behave identically, then

      sub f1 { "different"; } sub f2 { same(\&f1, \&f2) ? "same" : "different"; } f2();
      would be a contradiction.

        To spell it out explicitly, there is nothing that f2() can return that would be right.

        • If f2() returns "same", that means f2() and f1() should return the same things. f1() always returns "different" so this is incorrect.
        • If f2() returns "different", that means f2() and f1() should return different things. Now they are both returning the same thing, so this is also incorrect.

        The logical conclusion is that function same() does not exist.

        That's a pretty big "in the general case".

        One can EASILY prove that the two following pieces of code are equivalent.
        my $foo = 1; my $foo = 1;
        So while you may not be able to work out if _any_ two pieces of code are equivalent, you can work out if _some_ two pieces of code are equivalent.

        As for the above example, it isn't really fair for the comparison function to have to compare itself.

        Generally most observes are given at least the countesy of being outside the situation being examined...

        So to summarise, there are ways to prove _some_ sorts of code are equivalent, just not _any_ code. And if you bias that process towards false negatives you stay pretty safe. (Accept false readings of "not equal" in exchange for certain results of "is equal")
        Actually, you'll hit a recursion limit before you run into your logical contradiction.

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Equivalency of Code
by borisz (Canon) on Feb 07, 2005 at 10:34 UTC
    Your examples are _not_ the same. But maybe give the same results in your context. If you really want to know how to do this very hard task, read the Dragon Book.
    And read how others do it gccsummit-2003-proceedings. Search for SSA or GIMPLE as a start.
    Boris
Re: Equivalency of Code
by stvn (Monsignor) on Feb 07, 2005 at 11:01 UTC

    You might want to give Perl::Compare a look.

    UPDATE

    Actually it seems that Perl::Compare might not be in the best state right now (it seems he is working on updating it to the latest PPI version). A simplistic replacement is using the normalized method of the PPI::Document object and then comparing the results (the results are actually PPI::Document::Normalized objects which have their == operators overloaded). Here is some code:

    use strict; use warnings; use PPI::Document; my $d1 = PPI::Document->new('$x++'); my $d2 = PPI::Document->new('$x += 1'); my $d3 = PPI::Document->new('$x = x + 1'); print((($d1->normalized() == $d2->normalized) && ($d2->normalized() == $d3->normalized) && ($d3->normalized() == $d1->normalized)) ? "they are all equivalent\n" : "they are not equvalient\n"); print(($d1->normalized() == $d2->normalized) ? "1 and 2 are equivalent\n" : "1 and 2 are not equvalient\n"); print(($d2->normalized() == $d3->normalized) ? "2 and 3 are equivalent\n" : "2 and 3 are not equvalient\n"); print(($d3->normalized() == $d1->normalized) ? "3 and 1 are equivalent\n" : "3 and 1 are not equvalient\n");
    And the output:
    they are not equvalient 1 and 2 are not equvalient 2 and 3 are not equvalient 3 and 1 are not equvalient

    -stvn
      You couldn't have picked a worse time to mention Perl::Compare really.

      The normalization overhaul has taken something that did about 10-15 normalization "things" and now only has one (removal of insignificant things like whitespace/pod/comments).

      The original (and still current to some degree) intent of PPI-style normalization is to check to see if changes to code "matter" or not.

      For example this:
      my $foo = 1; # comment
      Is "equivalent" to this:
      my $foo=1 ;
      As the library of normalization functions grows back towards (and past) what the original and now abandoned Perl::Compare implementation had, it is only really intended to factor out changes like:
      'foo' ---> "foo" "foo$bar" ---> "foo${bar}"; etc... etc...
      One idea was to let other people edit "your" code, while being certain that they haven't actually changed anything.

      I'm sure you can think of others.

      In any case, the three above example may well be different, especially if they are object with overloaded operators.
Re: Equivalency of Code
by ambrus (Abbot) on Feb 07, 2005 at 10:28 UTC

    perl -we '$x = "a"; $x++; print $x, "\n";' ==> b

    perl -we '$x = "a"; $x += 1; print $x, "\n";' ==> 1 with a warning.

Re: Equivalency of Code
by DrHyde (Prior) on Feb 07, 2005 at 11:09 UTC
    They are quite clearly not the same.
    $x = 3; print $x++;
    vs
    $x = 3; print $x += 1;
    Those two are only the same as regards their effect on $x. The expressions do not *evaluate* to be the same.
      No fair!

      HIS code didn't have print statements in it, he was using them as standalone statements. You are testing something different.
      $x = 3; $x++; print $x;
      vs
      $x = 3; $x += 1; print $x;
      ...would be more accurate.

      Of course, given problems with operator overloading, they stil arn't actually equal.
        The point is that $x++ and $x += 1 don't just fiddle with the value of $x. Those statements also have values themselves. If a bit of code is to be equivalent to another, then it needs to both have the same side-effects and the same value as the other does.

        I try not to think about operator overloading or tieing :-)

Re: Equivalency of Code
by blazar (Canon) on Feb 07, 2005 at 11:11 UTC
    In addition and not in contraddiction with what has already been said, you may be interested in -MO=Concise, see B::Concise.

      This was actually my first instinct too, here was the output I got from testing with B::Concise.

      stvn% perl -MO=Concise -e '$x++' 5 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) v ->3 4 <1> preinc[t2] vK/1 ->5 - <1> ex-rv2sv sKRM/1 ->4 3 <#> gvsv[*x] s ->4 -e syntax OK stvn% perl -MO=Concise -e '$x += 1' 6 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) v ->3 5 <2> add[t2] vKS/2 ->6 - <1> ex-rv2sv sKRM/1 ->4 3 <#> gvsv[*x] s ->4 4 <$> const[IV 1] s ->5 -e syntax OK stvn% perl -MO=Concise -e '$x = $x + 1' 8 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) v ->3 7 <2> sassign vKS/2 ->8 5 <2> add[t3] sK/2 ->6 - <1> ex-rv2sv sK/1 ->4 3 <#> gvsv[*x] s ->4 4 <$> const[IV 1] s ->5 - <1> ex-rv2sv sKRM*/1 ->7 6 <#> gvsv[*x] s ->7 -e syntax OK
      You can see that the x += 1 and x = x + 1 are similar, but not enough to call equivalent. Besides I am not sure that opcode equivalence would actually work since perl performs optimizations on the opcode tree during compilation which you would not want to include in your comparisons.

      -stvn
        This was actually my first instinct too, here was the output I got from testing with B::Concise.
        Indeed, it was just a suggestion along the lines of "just look there", you may be interested in it. Actually one of the most interesting points is exactly that it allows you to see the optimizations that the perl compiler makes...
Re: Equivalency of Code
by Anonymous Monk on Feb 07, 2005 at 11:46 UTC
    Does there exist a way to identify functionally similar blocks of code
    In general, you can't, as that will mean solving the halting problem.

    That doesn't mean that you will never be able to say "these two pieces of code will do the same thing" - sometimes you can. And sometimes, you will be able to say "these two pieces of code will not do the same thing". It just means that you can't write a program that for any pair of pieces of code can actually decide whether they will do the same thing or not.

    As for you example, those pieces of code do not do the same thing. For instance, if $x equals "foo", then $x++ will result in $x becoming "fop", while $x = $x + 1 results in $x becoming 1.

    One heuristics that you might try is to compile both pieces of code into bytecode, and compare the bytecodes. That will in a small amount of simple cases tell you that two pieces of code are equivalent.

      In general, you can't, as that will mean solving the halting problem.

      You can't solve this problem in general, but not for the reasons you state.

      The halting problem doesn't apply to finite state machines. Any computer that exists today is a finite state machine: it can only store a finite number of states, not an infinite number. Even quantum computers only probabilistically distinguish a finite number of probabilities, though they do enable massive parallelism.

      You can, in theory, determine whether any two finite state machines do the same thing.

      However, you still can't always do this in practice, because you may not have enough states in your machine: memory is costly, and a given computation might require more states than, say, the galaxy has particles. "Finite" can still mean "mind-bogglingly huge", at times.

      Sorry, but one of the things that annoys me is when people reason about modern computers as if they were Turing machines. It's sometimes very useful to think of them that way, but they're simply not the same.

      Unless you're an immortal with an infinite lifespan, with infinite resources to construct an infinitely large computer, you just can't build a Turing machine. And really, if you're an immortal, don't you have something better to do with your infinite lifespan than to site around waiting literally forever for your Turing machine to halt? I know I'd be out smiting people, or creating life, or something fun instead. :-)

      --
      Ytrew

        Ytrew~

        It turns out that after a couple trillion years you get bored of the smiting and creating. But oddly enough, exploring the limits of math remains interesting...

        Boots
        ---
        Computer science is merely the post-Turing decline of formal systems theory.
        --???
Re: Equivalency of Code
by paulbort (Hermit) on Feb 07, 2005 at 21:32 UTC

    If you want to know if two pieces of code are functionally the same, you could write a test framework and if they both pass all the same tests, they are similar to the extent of your tests.

    But that is impractical in the general case.

    If you want to see how similar two source files are, zip them. This is done in plagiarism detection.


    --
    Spring: Forces, Coiled Again!
Re: Equivalency of Code
by rkosai (Beadle) on Feb 07, 2005 at 20:25 UTC
    Thank you everybody! I didn't actually mean to say that they do the same thing; I actually meant that they do something similar; but I wasn't careful with my words.

    I was trying to identify similar blocks of code with a value that says "they are this similar:". :) But thank you very much, you've given me a ton to work with here.