Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

original definition vs final language

by dystrophy (Monk)
on Oct 25, 2002 at 05:37 UTC ( [id://207916]=perlmeditation: print w/replies, xml ) Need Help??

This evening in my Theory of Languages and Computability class we were talking about context free languages (CFL) and doing some proofs using the pumping lemma. One of the given examples (by my brilliant math professor who knows nothing of real programming languages) was Pascal. According to the definition and original documentation of the language, it can not be a CFL. C and C++ are supposedly like this as well.

So... what about Perl? Just from what I know (Perl is such a practical and down to earth language) it probably rarely encountered this during it's creation/evolution. I know precious little of it's roots, but I'm very interested. I'm not exactly looking for a reference to the original documentation, but rather people's opinions or a quote here and there.
- dystrophy -

Replies are listed 'Best First'.
Re: original definition vs final language
by Zaxo (Archbishop) on Oct 25, 2002 at 07:09 UTC

    Perl is by no means context-free. Many Perl constructs depend on the surrounding elements of the language to inform them what to produce. The most obvious example is the dual behavior of arrays in list vs. scalar context.

    Perl6 is apparently planned to have a formal top-down definition, but Perl5 and its ancestors evolved through natural selection in a sequence of visions, wishes, experiments and mistakes. The result is astonishingly robust and handy.

    That said, Perl5 does have a grammar. You may be interested to look at perly.y and toke.c in the perl source tree.

    After Compline,
    Zaxo

      The most obvious example is the dual behavior of arrays in list vs. scalar context.

      Note that this isnt normally what is considered to be context when discussing grammars. But its right nontheless. And a good example. ++zaxo

      --- demerphq
      my friends call me, usually because I'm late....

      I don't want to sound pedantic, but the very fact that a thing such as perly.y exists actually proves that Perl is a CFL. Tools such as YACC, Bison and the like are used to create parsers for context free languages. This means that Perl's grammar defines a CFL.

      The fact that something depends on what's next to it doesn't imply that it is context sensitive in the Chomsky hierarchy, it all depends on how one can write the grammar.

      In general you don't want a programming language to be context sensitive since parsers for CSLs are much less efficient than those for CFLs.

      Don't get me wrong, both Perl and Pascal (as well as about any programming language you can think of) are Turing ocmplete, i.e. one can compute any computable problem in each of those languages (if the Church-Turing thesis holds).

      Hope this helps, -gjb-

      Update: Notice I was talking about grammer, semantics is specified in some programming language, and hence is specified recursively enumerable.

        I don't want to sound pedantic, but the very fact that a thing such as perly.y exists actually proves that Perl is a CFL. Tools such as YACC, Bison and the like are used to create parsers for context free languages. This means that Perl's grammar defines a CFL.

        With all due respect this is utterly and completely wrong. Please review Red Dragon for a thorough explanation.

        A CFL/CFG cannot express the requirement that a variable must be declared before it is used. Furthermore a CFL/CFG cannot express the issue of arbitrarily nested constructs. All of these can be definitively proved using proper regular expressions (not perls irregular expressions) which are actually context free.

        A YACC BISON grammer express the context free components of a language specification. The symbol table manipulation and extraneous such code that are attached to the various grammatical rules handle the non context free part of the language.

        The fact is that most languages are primarily context free, but that they almost always have some aspect of context involved. Tools like Bison / YACC are used to simplify (and optimise) the process of constructing the context free part of the grammar. (And to be honest hand constucting LR parsers is very very difficult. LR parsers are preffered because they dont suffer from the problem that most recursive descent parsers do, that of not being able to handle left recursion.)

        To the OP: There are very few useful GPPL languages (general purpose programming langauges) that are truly context free. In fact to my knowledge there are none. But I expect that im wrong on that. Nevertheless ANY language that uses arbitrarily nested parenthesis (as in to any level of nesting), or that requires that identifiers are declared before being used (both of which Perl requires) is NOT context free. Thus all of the following are not context free: Perl, C, C++, Java, Turing, Ada, Pascal, Fortran, Cobol, Modula-2...

        UPDATED:As thraxil and thelenm corrected me, the struck out text is wrong. I got carried away and mixed up proper regular expressions and context free grammars. Thanks and sorry.

        --- demerphq
        my friends call me, usually because I'm late....
        I think I need a vacation

Re: original definition vs final language
by blokhead (Monsignor) on Oct 25, 2002 at 13:29 UTC
    I agree with demerphq's argument about programming languages generally not being context-free. The loose structure and syntax are all roughly context-free, but the example of requiring variables to be predefined before their use shows that any CFG wouldn't work. There is a difference between syntactically well-formed inputs to the parser/compiler and programs that actually run/compile.

    Now I have a more general question for the theoriticians: If a programming language were indeed a CFL, would it have the power to be as expressive as a Turing machine? In other words, can a programming language that is represented completely via a CFG be able to express all computable programs?

    This dilemma is a little beyond my expertise in the area. At first glance seems as though the specification for all possible Turing machines could not be expressed with a CFG -- there must be all-but-infinitely-many states, and the transition function must be well-defined. This doesn't seem possible in a CFG, due to the same type of problem with predefining variables, as demerphq correctly points out above. I'm actually proctoring a test this morning in our Theory of Computing class, so I will ask the prof what he thinks and get back to you on it ;)

    Anyway, great question dystrophy, and enjoy taking Theory of Computation. Every programmer can use a good dose of theory. Just remember to be nice to your theory TA's, OK? ;)

    Update: Well, the good professor seems to recall a Princeton research project that was a compiler that would compile and execute any input at all, including garbage. So there's a CFG-definable program specification which was Turing-expressible. He also thought the set of all Algol 66 or 68 programs might have been context-free. So it would seem that the set of all programs of a language need not be context-sensitive.

    blokhead

Re: original definition vs final language
by Lexicon (Chaplain) on Oct 28, 2002 at 23:36 UTC
    The class you're taking, known as Automata Theory at my University, is a definate precursor to the Compilers class.

    We wrote a Java compiler (in Java, ha ha). We broke the problem into 4 steps: lexical analysis - turning identifies, parens, +-*/, etc... into 'tokens', syntactic analysis - turning tokens into phrases, semantic analysis - matching referents in the symbol table (making all the identifiers point to the same thing), and code generation - writing java byte code.

    It turns out that (for Java, and I think most other languages) lex is context-free, while syn requires a turring machine. Perl is kinda weird though. I imagine it's context free at the lex level, but I'm not sure. We put a lot of things into our languages to keep them CF (mostly keywords, ; statement delimiters, and {} block delimiters) and Perl has no shortage of it's own: Funny Characters $@%&. I recall that 1 token lookahead in the syn phase is equivalent to N token lookahead (where N is a fixed constant), so as long as Perl's oddities (like disambiguating that $x is a scalar and $x['bob'] is an array element) cannot be arbitrarily nested beyond what a simple stack can handle.

    OK, enough Perl meditation.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://207916]
Approved by rob_au
Front-paged by hsmyers
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (7)
As of 2024-04-18 06:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found