Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: Perl Cannot Be Parsed: A Formal Proof (meh)

by tye (Sage)
on Jan 21, 2008 at 19:02 UTC ( [id://663416]=note: print w/replies, xml ) Need Help??


in reply to Perl Cannot Be Parsed: A Formal Proof

Sorry, I don't buy this proof. In the last code block:

sub halts { my $machine = shift; my $input = shift; is_whatever_nullary( qq{ run_turing_equivalent("\Q$machine\E", "\Q$input\E"); sub whatever() {}; } ) }

is_whatever_nullary() can certainly easily return a true value without having to execute run_turing_equivalent(). The whatever prototype is a compile-time property and the run-time impact of whatever run_turing_equivalent() does will not change the fact that whatever() is "nullary" at compile time.

But it is quite obvious that Perl code can't be reliably parsed without running Perl code by the simple example of:

BEGIN { if( 0.5 < rand() ) { eval "sub whatever() { }; 1" or die $@; } else { eval "sub whatever { }; 1" or die $@; } } whatever / 25 ; # / ; die "this dies!";

You can, of course, replace the above conditional with some high-falutin' comp sci construct of your choosing. But I find that such just detracts from the obviousness. You have to run "0.5 < rand()" in order to parse the last line of the above Perl code (which perl can parse differently each time that it is run, as shown below).

> perl -MO=Deparse above.pl # ... whatever / 25; - syntax OK > perl -MO=Deparse above.pl # ... whatever(/ 25 ; # /); die 'this dies!'; - syntax OK

- tye        

Replies are listed 'Best First'.
Re^2: Perl Cannot Be Parsed: A Formal Proof (meh)
by Jeffrey Kegler (Hermit) on Jan 21, 2008 at 23:51 UTC

    You spotted an unclarity in my presentation. I was vague about what "time" the code strings are to be run. Perhaps this is better:

    sub halts { my $machine = shift; my $input = shift; is_whatever_nullary( qq{ BEGIN { run_turing_equivalent("\Q$machine\E", "\Q$input\E"); sub whatever() {}; } } ) }

    Using rand() is indeed more obvious and it was how I tested my code snippets (I lent my Halting Oracle to a friend and she never returned it ... but don't get me started.) I did not use it in the presentation because while it makes the proof more obvious, it's a proof of a weaker theorem.

    If you interpret rand() as truly random (and not as pseudo-random), then we're dealing with non-deterministic programs. Someone might then say, "as long as there is no non-determinism while compiling, Perl 5 is statically parseable." But it's not and a proof using the Turing machine simulator shows it is not. That Perl 5 is unparseable even when the code is completely deterministic is a much stronger result, and the distinction makes a difference in practice.

    The Turing simulation in this case is brought in for very practical reasons. I can't claim the credit for that. Adam Kennedy's hint took me in this direction. I suspect he also knew the business about rand(). But with his many hours of real life experience trying to statically parse Perl, he focused on the stronger proof -- the one that would give him the most information about what he was up against in creating PPI.

      I suspect he also knew the business about rand(). But with his many hours of real life experience trying to statically parse Perl, he focused on the stronger proof -- the one that would give him the most information about what he was up against in creating PPI.

      You pretty much nailed it exactly.

      From a practical perspective there's about 5-10 things that, in various combinations, make Perl impossible to parse.

      However, there's a big difference between what we believe and what we can prove.

      And so from the perspective of knowing absolutely that Perl 5 can't be parsed, the Halting Problem approach was by far the most straight forward way to do it (at least to a layman like me).

      This realization let me finally put the possibility of a complete parser out of my mind, and once I wasn't distracted by completeness any more, I had the chance to evaluate and actually define what "good enough" would mean for a "Perl 5 parser".

      And that perspective on the problem was the one that took the code in the direction of the working solution.

      I don't really find "If you don't try to solve the halting problem, then you can parse Perl" to be a "stronger" conclusion than "If you don't do something random, then you can parse Perl", other than, of course, the presence of high-falutin' comp sci lingo in the former.

      It is a rather simple fact that you can't reliably parse Perl without running Perl code (arbitrary Perl code). It reminds me of showing my kids a clock face and asking them to list the properties of the numbers on it. They never mention that they are sorted, because that just seems too obvious.

      The fact that you can't parse Perl is obvious. Adding useless complication to the conjecture makes it sound more like a rigorous proof, but I don't believe it actually helped your proof.

      A proof that you can't parse Perl, is:

      BEGIN { my $proto= ""; $proto= "()" if run_some_perl_code(); eval "sub whatever$proto { }"; } whatever / 6; # /;

      The fact that run_some_perl_code() can be any arbitrary Perl code is quite clear. I guess this then boils down to the argument that somehow static analysis could be used to predict the results of run_some_perl_code() without actually running it. But the rand example clearly shows that not to be the case. As does using if @ARGV or if <STDIN> =~ /^y/i or if LWP::Simple::get("http://perlmonks.org/secretAlgorithm.pl") =~ /true/ or if unlink "/etc/passwd" or if find_prime_of_length( ~0 ) % 3 == 1.

      Sure, the fact that run_some_perl_code() could be trying to solve the halting problem also demonstrates something. Of course, if I'm writing Perl support for an IDE, I'm not seriously concerned that the declaration of function prototypes is being based on answers to instances of the halting problem, so I'm likely to just dismiss your proof (based on your own reaction to the rand case). I find "You might have to run arbitrary Perl code" to be both a stronger statement of the result and a more interesting and useful one.

      I also think there is a flaw in your proof in that it confuses solving the halting problem with running code that may or may not halt. So your conclusion would need to be more like "Here is some code that will either determine that whatever() is nullary or that will never terminate." Since just because the code has been taking 400 years doesn't mean that you can determine that it will never halt, so your code doesn't even have the potential of solving the halting problem.

      So the lemma that is needed here is that "Static analysis of a program in a Turing-complete language (along with its 'input') can be insufficient to determine its output". Frankly, this is where it becomes uninteresting to me.

      (Minor updates applied during the 15 minutes after first posting.)

      - tye        

        I don't really find "If you don't try to solve the halting problem, then you can parse Perl" to be a "stronger" conclusion than "If you don't do something random, then you can parse Perl", other than, of course, the presence of high-falutin' comp sci lingo in the former.

        You're kind of right. It's called minimalization. If I can solve this base problem, I can solve this more complex problem. Ex: If I can prove I can sort, I can sort in reverse, I can sort using other means etc etc etc. If I can do those things and have proven those things, I can do even more, chaining all the way.

        You're minimalizing directly to undeterministic behaviour, as opposed to the halting problem. Most people chain minimalized behaviour to save time. You don't have to prove the halting problem since it's already proven. Your intuition is right. It's only awkward by the convention of, "most people don't do it like that."

        I'm amused the conversation is taking place though. You're both arguing cherry pie or apple pie to be tasty. :)

Re^2: Perl Cannot Be Parsed: A Formal Proof (meh)
by BrowserUk (Patriarch) on Jan 21, 2008 at 19:19 UTC
      eval $ARGV[ 0 ];

      The output from perl -MO=Deparse disagrees with you. eval isn't the "problem", BEGIN is.

      - tye        

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://663416]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (3)
As of 2024-04-18 23:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found