Re: Perl not BNF-able??
by dave_the_m (Monsignor) on Jul 01, 2005 at 09:38 UTC
|
Some parts of the Perl grammar are inherently ambiguous, and the lexer resolves them by using dodgy look-ahead heuristics (which don't always work), eg indirect object syntax:
print $f +$g
print $f+$g
is that writing the positive value $g to the filehandle $f, or printing ($f+$g) to STDOUT?
sub newhashref { { a => 1, b => 3 } }
sub innerblock { { local $a => 1; ... } print "old a=$a" }
is an opening brace the start of a block or the start of an anonymous hash generator?
Dave. | [reply] [d/l] [select] |
|
True, but although such ambiguities become apparent to the programmer, they do not stop us from expressing whichever of the choices perl actually makes in BNF.
| [reply] |
|
BNF is by definition for context free grammars. Perl's grammar is not context free.
| [reply] |
|
such ambiguities ... do not stop us from expressing whichever of the choices perl actually makes in BNF.
Yeah? Try it.
The problem is that in order to resolve the ambiguity
(i.e., in order to select the choice perl actually
makes, as you put it) you have to have access to
information about the surrounding context, information
that gets lost when reducing the surrounding syntax
to BNF.
s ;; split / , / ;; s ,$, ; , ; print eval;
Don't make me turn this into a self-modifying quine.
Incidentally, a lot of Lisp or Scheme programmers
recoil in horror when they find out that Perl's
grammar is not context-free. "If it's not possible
to determine what a given syntactic construct means
independent of context, how could anyone ever
keep track of what a piece of a program is doing?"
Well, that's the problem BNF has. In practice,
this is not generally a problem for programmers who
know the language, but it gives overly simple
parsers fits.
| [reply] [d/l] |
|
|
|
|
|
|
Re: Perl not BNF-able??
by domm (Chaplain) on Jul 01, 2005 at 10:26 UTC
|
You may be interested in PPI.
--
#!/usr/bin/perl
for(ref bless{},just'another'perl'hacker){s-:+-$"-g&&print$_.$/}
| [reply] [d/l] |
|
Yes, thank you for that - PPI looks like the best approach so far and PPI::Dumper seems very useful for identifying the context interpretations previous posts show concern about.
| [reply] |
|
But keep in mind that PPI is just guessing, some of the time, using the most likely parsing, rather than an absolute parsing. So, it's still not formal or official, it's merely workable on most programs. But there's no way to know in the code if it's ever done the wrong thing... you simply get a wrong parse and no indication.
| [reply] |
|
Since I wrote PPI, I probably have struggled against more anti-BNF things that most. So here's a few.
1. Source Filters
use Acme::Buffy;
BuFFy BUFfy bufFy Buffy...
You can't handle source filters in BNF for obvious reasons.
2. BEGIN blocks
You can't (code) parse Perl without also executing it.
BEGIN {
die if is_christmas($today);
}
That code is legal Perl every day of the year except for Christmas, when it is not legal Perl code.
3. Cannot be parsed when isolated
use Some 'function';
function 'foo', 'bar';
That code could mean either function('foo'), 'bar' or function( 'foo', 'bar' ) depending on what is in the Some module.
If Some.pm isn't installed on your machine, how are you to know.
PPI does what it does using more evil than Perl itself, and certainly more heuristics. It makes up for it's guesses by promising that what it writes out is ALWAYS character for character identical to what it read in, unless you change something.
So if it has a problem, it won't break as long as you don't change that part of the tree.
There's probably more reasons, but for me those are the big three.
| [reply] [d/l] [select] |
Re: Perl not BNF-able??
by jonadab (Parson) on Jul 01, 2005 at 10:00 UTC
|
Anyone know what aspects of perl should make it impossible to define it in BNF?
Perl's grammar is overall just too general for BNF.
I don't think there's an exhaustive list of all the
linguistic features in Perl that make it too general
for BNF, but I can tell you that it's not just a
couple of things, syntactically. If there's one
overall general thing to blame, it's context. To
parse Perl, you have to keep track of what context
any given bit of code that you're parsing is in.
A given syntactic construct can have a different
semantic meaning (and possibly different syntactic
implications for the surrounding code) if it is
encountered in a different context. BNF doesn't
deal well with that.
FWIW, many overly simplistic syntax highlighting
engines can't handle Perl, either, for roughly
the same reason: Perl is very expressive, and
therefore very complicated to parse. Not as
complicated (or as expressive) as, say, English,
but complicated nonetheless.
"In adjectives, with the addition of inflectional endings, a changeable long vowel (Qamets or Tsere) in an open, propretonic syllable will reduce to Vocal Shewa. This type of change occurs when the open, pretonic syllable of the masculine singular adjective becomes propretonic with the addition of inflectional endings."
— Pratico & Van Pelt, BBHG, p68
| [reply] |
Re: Perl not BNF-able??
by merlyn (Sage) on Jul 01, 2005 at 13:46 UTC
|
Correct. You cannot have a Perl "parser" that analyzes separately from the lexing phase. The lexer needs to know the parse state. See my seminal note on that in On Parsing Perl.
| [reply] |
|
Are you allowed to describe something you wrote yourself as seminal?
I'm not disputing it's seminality you understand, just wondering about your qualifications to describe it thus.
| [reply] |
|
Are you allowed to describe something you wrote yourself as seminal?
If enough other people (or the *right* other people)
describe it that way first, and if you're bringing
it up in a situation where it is clearly relevant
(or someone else brought it up first), then yes.
Not that it doesn't represent a certain amount of
hubris, of course...
(And no, I don't happen to know in this particular
instance who else may or may not have so described
the item in question first. I was just answering
your question in the general manner in which it
was stated.)
| [reply] |
|
|
Well, it's seminal in the sense that he was the first person who could really be bothered to lay out a few of the main reasons in a readable way in a location that could easily be URL-referenced.
I certainly refer people to it on a regular basis.
| [reply] |
|
For something to be seminal it only has to be first, and with a thorough literature review one can claim seminality for one's own work.
On a related note, not pertaining to the article in question; it is interesting to note how often "seminal" is confused with "good". There is great value in firsts, but they are frequently not good paragons of their form.
| [reply] |
Re: Perl not BNF-able??
by tphyahoo (Vicar) on Jul 01, 2005 at 13:39 UTC
|
On that note, I would like to ask a related question.
Will Perl6 be BNF-able?
I seem to recall reading somewhere that eventually all perl6 parsing will be implemented in perl6, using one (monster) perl6 rule.
Is that right or borked?
| [reply] |
|
In a recent conversation with TheDamian, he admitted that Perl6's goal of being able to be lexed and parsed separately was unrealistic. Perl6 will therefore inherit Perl5's necessity of a tight coupling between the lexer and the parser. Oh well.
| [reply] |
|
| [reply] |
|
| [reply] |
Re: Perl not BNF-able??
by educated_foo (Vicar) on Jul 03, 2005 at 09:12 UTC
|
Please see S_intuit_more() in toke.c in the perl source, and remember this is the *lexer* doing this weird "weighing of evidence". | [reply] |
Re: Perl not BNF-able??
by ysth (Canon) on Jul 05, 2005 at 00:21 UTC
|
$ perl -we'sub my_print ($) { print @_ } my_print 0,1'
0
$ perl -we'sub my_print { print @_ } my_print 0,1'
01
| [reply] [d/l] |
Re: Perl not BNF-able??
by anonymized user 468275 (Curate) on Jul 05, 2005 at 14:51 UTC
|
In reply to some responses which pick example code and argue whether or not they can be expressed in BNF, it seems to me that any single example can be always be expressed in BNF.The main idea of my OP was to address the ability of the language as a whole to be expressed in BNF, e.g. to parse a set of project code to extract certain references and actual parameter values.
| [reply] |
Re: Perl not BNF-able??
by Anonymous Monk on Jul 04, 2005 at 13:07 UTC
|
Note that something simple as:
sub hello();
sub hello() {print "world"}
isn't context-free (assuming the programmer is free to pick the subroutine names, and the grammer isn't going to list all possible subroutine names), and hence certainly not BNF-able.
Not even a much simpler (with respect to the language's grammar) language like C is context-free, for the same reason: forward declarations. In general, anything you have a language construct where you're forced to repeat something that wasn't matched by a terminal in the grammar (kind of like using \1 in a regex) your grammar isn't context-free - and hence, BNF will not be powerful enough. | [reply] [d/l] [select] |
|
statement : sub_proto
statement : sub_def
term : anon_sub
sub_proto : ws? "sub" ws? sym_name proto? ws? ";"
sub_def : ws? "sub" ws? sym_name proto? ws? block
anon_sub : ws? "sub" block
sym_name : ident "::" sym_name
sym_name : ident
| [reply] [d/l] |
|
sub hello($);
sub hello() {print("Hello, world");}
But you aren't. The signatures should be the same - and that's unexpressable with BNF. | [reply] [d/l] |
|
|
Yes, the context-free idea isn't a logical necessity for expressing in BNF, because that alone can be conquered by expressing the different context types as alternative BNF definitions.Rather, the most compelling arguments that have been made in reply here come from those perl syntax elements which overpower the nearby context, such as (amongst other examples given by various people) the e-regexp modifier.
| [reply] |
|
Yes, the context-free idea isn't a logical necessity for expressing in BNF, because that alone can be conquered by expressing the different context types as alternative BNF definitions.
How so? Could you give a BNF snippet that would express how to generate a forward declaration of a subroutine that matches the signature of the actual subroutine definition?
Rather, the most compelling arguments that have been made in reply here come from those perl syntax elements which overpower the nearby context, such as (amongst other examples given by various people) the e-regexp modifier.
The /e regexp modifier? In which way does that "overpower" nearby context? Couldn't that construct be parsed with the following BNF snippet:
non-a-slash: # string that doesn't contain a non escaped slash
SUBSTITION: non-eval-substition | eval-substitution
non-eval-substitution:
's' '/' not-a-slash '/' not-a-slash '/' non-e-modifiers
+(*)
eval-substitution:
's' '/' not-a-slash '/' slashless-code '/' e-modifiers(
+*)
non-e-modifiers: 'x' | 's' | 'm' | 'g' | 'x' | 'i'
e-modifiers: 'e' | non-e-modifiers
Assuming you could create BNF for Perl code (and then you can create BNF for Perl code that doesn't contain a slash).
Grammars expressed in BNF can be parsed with recursive decent parsers (which do not have a fixed lookahead) - and if you can create a parser for all other Perl constructs, I can parse the difference between s/// and s///e. | [reply] [d/l] [select] |