http://qs321.pair.com?node_id=471598

anonymized user 468275 has asked for the wisdom of the Perl Monks concerning the following question:

(extract from perlfaq7): 'In the words of Chaim Frenkel: "Perl's grammar can not be reduced to BNF. The work of parsing perl is distributed between yacc, the lexer, smoke and mirrors."'

I write a lot of parsing tools for client development projects so the need could arise soon. Anyone know what aspects of perl should make it impossible to define it in BNF?

One world, one people

Replies are listed 'Best First'.
Re: Perl not BNF-able??
by dave_the_m (Monsignor) on Jul 01, 2005 at 09:38 UTC
    Some parts of the Perl grammar are inherently ambiguous, and the lexer resolves them by using dodgy look-ahead heuristics (which don't always work), eg indirect object syntax:
    print $f +$g print $f+$g
    is that writing the positive value $g to the filehandle $f, or printing ($f+$g) to STDOUT?

    sub newhashref { { a => 1, b => 3 } } sub innerblock { { local $a => 1; ... } print "old a=$a" }
    is an opening brace the start of a block or the start of an anonymous hash generator?

    Dave.

      True, but although such ambiguities become apparent to the programmer, they do not stop us from expressing whichever of the choices perl actually makes in BNF.

      One world, one people

        BNF is by definition for context free grammars. Perl's grammar is not context free.
        such ambiguities ... do not stop us from expressing whichever of the choices perl actually makes in BNF.

        Yeah? Try it.

        The problem is that in order to resolve the ambiguity (i.e., in order to select the choice perl actually makes, as you put it) you have to have access to information about the surrounding context, information that gets lost when reducing the surrounding syntax to BNF.

        s ;; split / , / ;; s ,$, ; , ; print eval;

        Don't make me turn this into a self-modifying quine.

        Incidentally, a lot of Lisp or Scheme programmers recoil in horror when they find out that Perl's grammar is not context-free. "If it's not possible to determine what a given syntactic construct means independent of context, how could anyone ever keep track of what a piece of a program is doing?" Well, that's the problem BNF has. In practice, this is not generally a problem for programmers who know the language, but it gives overly simple parsers fits.

Re: Perl not BNF-able??
by domm (Chaplain) on Jul 01, 2005 at 10:26 UTC
    You may be interested in PPI.
    -- #!/usr/bin/perl for(ref bless{},just'another'perl'hacker){s-:+-$"-g&&print$_.$/}
      Yes, thank you for that - PPI looks like the best approach so far and PPI::Dumper seems very useful for identifying the context interpretations previous posts show concern about.

      One world, one people

        But keep in mind that PPI is just guessing, some of the time, using the most likely parsing, rather than an absolute parsing. So, it's still not formal or official, it's merely workable on most programs. But there's no way to know in the code if it's ever done the wrong thing... you simply get a wrong parse and no indication.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

        Since I wrote PPI, I probably have struggled against more anti-BNF things that most. So here's a few.

        1. Source Filters
        use Acme::Buffy; BuFFy BUFfy bufFy Buffy...
        You can't handle source filters in BNF for obvious reasons.

        2. BEGIN blocks

        You can't (code) parse Perl without also executing it.
        BEGIN { die if is_christmas($today); }
        That code is legal Perl every day of the year except for Christmas, when it is not legal Perl code.

        3. Cannot be parsed when isolated
        use Some 'function'; function 'foo', 'bar';
        That code could mean either function('foo'), 'bar' or function( 'foo', 'bar' ) depending on what is in the Some module.

        If Some.pm isn't installed on your machine, how are you to know.

        PPI does what it does using more evil than Perl itself, and certainly more heuristics. It makes up for it's guesses by promising that what it writes out is ALWAYS character for character identical to what it read in, unless you change something.

        So if it has a problem, it won't break as long as you don't change that part of the tree.

        There's probably more reasons, but for me those are the big three.
Re: Perl not BNF-able??
by jonadab (Parson) on Jul 01, 2005 at 10:00 UTC
    Anyone know what aspects of perl should make it impossible to define it in BNF?

    Perl's grammar is overall just too general for BNF. I don't think there's an exhaustive list of all the linguistic features in Perl that make it too general for BNF, but I can tell you that it's not just a couple of things, syntactically. If there's one overall general thing to blame, it's context. To parse Perl, you have to keep track of what context any given bit of code that you're parsing is in. A given syntactic construct can have a different semantic meaning (and possibly different syntactic implications for the surrounding code) if it is encountered in a different context. BNF doesn't deal well with that.

    FWIW, many overly simplistic syntax highlighting engines can't handle Perl, either, for roughly the same reason: Perl is very expressive, and therefore very complicated to parse. Not as complicated (or as expressive) as, say, English, but complicated nonetheless.


    "In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."  — Pratico & Van Pelt, BBHG, p68
Re: Perl not BNF-able??
by merlyn (Sage) on Jul 01, 2005 at 13:46 UTC
      Are you allowed to describe something you wrote yourself as seminal? I'm not disputing it's seminality you understand, just wondering about your qualifications to describe it thus.
        Are you allowed to describe something you wrote yourself as seminal?

        If enough other people (or the *right* other people) describe it that way first, and if you're bringing it up in a situation where it is clearly relevant (or someone else brought it up first), then yes. Not that it doesn't represent a certain amount of hubris, of course...

        (And no, I don't happen to know in this particular instance who else may or may not have so described the item in question first. I was just answering your question in the general manner in which it was stated.)

        Well, it's seminal in the sense that he was the first person who could really be bothered to lay out a few of the main reasons in a readable way in a location that could easily be URL-referenced.

        I certainly refer people to it on a regular basis.

        For something to be seminal it only has to be first, and with a thorough literature review one can claim seminality for one's own work.

        On a related note, not pertaining to the article in question; it is interesting to note how often "seminal" is confused with "good". There is great value in firsts, but they are frequently not good paragons of their form.

Re: Perl not BNF-able??
by tphyahoo (Vicar) on Jul 01, 2005 at 13:39 UTC
    On that note, I would like to ask a related question.

    Will Perl6 be BNF-able?

    I seem to recall reading somewhere that eventually all perl6 parsing will be implemented in perl6, using one (monster) perl6 rule.

    Is that right or borked?

      In a recent conversation with TheDamian, he admitted that Perl6's goal of being able to be lexed and parsed separately was unrealistic. Perl6 will therefore inherit Perl5's necessity of a tight coupling between the lexer and the parser. Oh well.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        Perl6 will therefore inherit Perl5's necessity of a tight coupling between the lexer and the parser. Oh well.

        We do at least have an explicit parser/lexer system to play with in Perl 6, which should enable us to more easily write Perl programs that parse Perl.

      Will Perl6 be BNF-able?

      Not on your life. In Perl6 you can rewrite the very grammar of the language ("All's fair if you predeclare"), and there's absolutely no way BNF can handle that.


      "In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."  — Pratico & Van Pelt, BBHG, p68
Re: Perl not BNF-able??
by educated_foo (Vicar) on Jul 03, 2005 at 09:12 UTC
    Please see S_intuit_more() in toke.c in the perl source, and remember this is the *lexer* doing this weird "weighing of evidence".
Re: Perl not BNF-able??
by ysth (Canon) on Jul 05, 2005 at 00:21 UTC
    Here's one problem:
    $ perl -we'sub my_print ($) { print @_ } my_print 0,1' 0 $ perl -we'sub my_print { print @_ } my_print 0,1' 01
Re: Perl not BNF-able??
by anonymized user 468275 (Curate) on Jul 05, 2005 at 14:51 UTC
    In reply to some responses which pick example code and argue whether or not they can be expressed in BNF, it seems to me that any single example can be always be expressed in BNF.

    The main idea of my OP was to address the ability of the language as a whole to be expressed in BNF, e.g. to parse a set of project code to extract certain references and actual parameter values.

    One world, one people

Re: Perl not BNF-able??
by Anonymous Monk on Jul 04, 2005 at 13:07 UTC
    Note that something simple as:
    sub hello(); sub hello() {print "world"}
    isn't context-free (assuming the programmer is free to pick the subroutine names, and the grammer isn't going to list all possible subroutine names), and hence certainly not BNF-able.

    Not even a much simpler (with respect to the language's grammar) language like C is context-free, for the same reason: forward declarations. In general, anything you have a language construct where you're forced to repeat something that wasn't matched by a terminal in the grammar (kind of like using \1 in a regex) your grammar isn't context-free - and hence, BNF will not be powerful enough.

      Why do you say that snippet is not BNF-able? Parts of Perl cannot be expressed by BNF, but your snippet is not one of them as far as I can tell.

      statement : sub_proto statement : sub_def term : anon_sub sub_proto : ws? "sub" ws? sym_name proto? ws? ";" sub_def : ws? "sub" ws? sym_name proto? ws? block anon_sub : ws? "sub" block sym_name : ident "::" sym_name sym_name : ident
        So, where's the requirement that both the proto and the definition have the same signature? It's trivial to write BNF for a proto, it's trivial BNF for the definition (given the BNF for the block). It's impossible to use BNF for the requirement that the signatures (that is, name and prototype) are the same.

        Your BNF is ok if you're allowed to write:

        sub hello($); sub hello() {print("Hello, world");}
        But you aren't. The signatures should be the same - and that's unexpressable with BNF.
      Yes, the context-free idea isn't a logical necessity for expressing in BNF, because that alone can be conquered by expressing the different context types as alternative BNF definitions.

      Rather, the most compelling arguments that have been made in reply here come from those perl syntax elements which overpower the nearby context, such as (amongst other examples given by various people) the e-regexp modifier.

      One world, one people

        Yes, the context-free idea isn't a logical necessity for expressing in BNF, because that alone can be conquered by expressing the different context types as alternative BNF definitions.
        How so? Could you give a BNF snippet that would express how to generate a forward declaration of a subroutine that matches the signature of the actual subroutine definition?
        Rather, the most compelling arguments that have been made in reply here come from those perl syntax elements which overpower the nearby context, such as (amongst other examples given by various people) the e-regexp modifier.
        The /e regexp modifier? In which way does that "overpower" nearby context? Couldn't that construct be parsed with the following BNF snippet:
        non-a-slash: # string that doesn't contain a non escaped slash SUBSTITION: non-eval-substition | eval-substitution non-eval-substitution: 's' '/' not-a-slash '/' not-a-slash '/' non-e-modifiers +(*) eval-substitution: 's' '/' not-a-slash '/' slashless-code '/' e-modifiers( +*) non-e-modifiers: 'x' | 's' | 'm' | 'g' | 'x' | 'i' e-modifiers: 'e' | non-e-modifiers
        Assuming you could create BNF for Perl code (and then you can create BNF for Perl code that doesn't contain a slash).

        Grammars expressed in BNF can be parsed with recursive decent parsers (which do not have a fixed lookahead) - and if you can create a parser for all other Perl constructs, I can parse the difference between s/// and s///e.