Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: Perl not BNF-able??

by jonadab (Parson)
on Jul 01, 2005 at 11:10 UTC ( [id://471654]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Perl not BNF-able??
in thread Perl not BNF-able??

such ambiguities ... do not stop us from expressing whichever of the choices perl actually makes in BNF.

Yeah? Try it.

The problem is that in order to resolve the ambiguity (i.e., in order to select the choice perl actually makes, as you put it) you have to have access to information about the surrounding context, information that gets lost when reducing the surrounding syntax to BNF.

s ;; split / , / ;; s ,$, ; , ; print eval;

Don't make me turn this into a self-modifying quine.

Incidentally, a lot of Lisp or Scheme programmers recoil in horror when they find out that Perl's grammar is not context-free. "If it's not possible to determine what a given syntactic construct means independent of context, how could anyone ever keep track of what a piece of a program is doing?" Well, that's the problem BNF has. In practice, this is not generally a problem for programmers who know the language, but it gives overly simple parsers fits.

Replies are listed 'Best First'.
Re^2: Perl not BNF-able??
by anonymized user 468275 (Curate) on Jul 01, 2005 at 11:45 UTC
    The problem with 'Try it' is that it's an all or nothing proposition. For lisp, the BNF is small enough not to worry about that. I can see that you can't wrap up a piece of non-context-free syntax in isolation. However, what I expected was that it might be possible to describe different syntax-elements for each type of contextual variation.

    I had a bad feeling about eval from the start, but I will still need to see where the process breaks down when worked through from first principles before I understand exactly what will go wrong (assuming the context-free issue is insurmountable, that is).

    One world, one people

      A little warning. The yacc parser config in perl (perly.y) is 763 lines long, while the lexer (toke.c) is 10998 lines long. That should be a clue as to how much special-case code perl needs that the parser can't handle itself.

      Dave.

        I was in a conversation the other day where Larry said that the reason he did that was to be able to say that Perl 5 had a smaller parser than did Perl 4. Now that he's refactoring the tokenizer, I think he regrets that choice somewhat.

      For lisp, the BNF is small enough not to worry about that

      Well, yeah, lisp barely even *has* syntax. That's the opposite end of the spectrum.

      I can see that you can't wrap up a piece of non-context-free syntax in isolation. However, what I expected was that it might be possible to describe different syntax-elements for each type of contextual variation.

      Yes, in principle. Perl can be parsed, obviously, because perl does parse it, QED. But the question is, what is required to make that happen? Perl has both left-associative and right-associative operators, for instance (although Perl5 is less whole-hog in this regard than Perl6 will be). Perhaps the larger issue, though, is that there are various things that can change the rules for how to parse other things, and they don't all come right before the thing whose interpretation they change. Just for one little example, consider the /x modifier, which changes the interpretation of whitespace and number signs that occur in the middle of the regex, before it. For a Turing-complete parser, this is not a problem, because you just scan ahead and see what's coming. I'm not a BNF guru, but if it has a concept of lookahead, I'm not aware of it.

      Another problem is that the way BNF likes to handle things is to lump cases together and treat them as the same, but this presents serious problems when parsing Perl code, especially as regards the various quoting constructs, *especially* the quoting constructs related to pattern matching, because of the number of special cases. In the example above, a typical BNF approach would be to lump the allowed regular expression modifiers together (by defining a rule that can match any of them and then using that rule at given points in various larger rules), but as the example above illustrates, you can't do that, because a regular expression modifier can change the rules for parsing the regular expression that precedes it. It also can change the interpretation of what is parsed; /s and /m do that. BNF is not troubled by that wrinkle, since it is not concerned as much with semantics as it is with syntax -- but that is BNF's whole problem when it comes to Perl, because the syntax and the semantics are not fully separable: the parser needs (some) knowledge of the semantics in order to sort out the syntax.


      "In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."  — Pratico & Van Pelt, BBHG, p68
        Well, you can claim that Perl 6 is going whole hogger on context sensitivity than Perl 5, but the fact is that Perl 6 is cleaning up all those silly post-declarational switches you're carping about, and forcing them to be predeclarations (except for /x, which is going away entirely because it's mandatory). Left-to-right parsing is darn near mandatory in Perl 6. Lookahead dependencies are generally limited to one token. About the only exception I can think of offhand is deciding whether curlies are composing a hash or declaring a closure, and this is handled by thinking of both of them as closures, but then forcing an evaluation-time call on the closure if semantic analysis determines it's a hash composer.

        As for /e, that's also gone in favor of generalized expression interpolation.

        This 471967 is the best explanation so far of why perl defies BNF although some other posts arguably provide better insight into the problems of parsing perl, which of course is also an important motivation for my OP.

        Yes I chose lisp in my comment as a contrary example because its BNF is about the simplest. Here is one version of it:

        http://cui.unige.ch/db-research/Enseignement/analyseinfo/LISP/BNFlisp.html

        One world, one people

      Oh, one other thing...

      I had a bad feeling about eval from the start

      If you think eval is problematic, consider the /e substitution modifier, which may be combined with nonstandard delimiters...

      s ((foo))[{ some_possibly_lengthy_chunk_of_code_here(); somefunc($1) }]ex;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://471654]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (2)
As of 2024-04-19 22:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found