Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Learning about Lexing/Parsing in general form?

by r.joseph (Hermit)
on Dec 06, 2001 at 16:00 UTC ( #129891=perlquestion: print w/replies, xml ) Need Help??

r.joseph has asked for the wisdom of the Perl Monks concerning the following question:

In my never ending search for knowledge (and in my also never ending search for something to relive my boredom), I decided a while back that I wanted to learn how to write a programming language. Well, actually, I really just want to learn the process of lexing and parsing grammer.

However, I have actually found it very hard to find general descriptions of the process online - sure, there is the docs to FLEX and YACC and BISON and whatnot, but I am really looking for more of a generic introduction to the processes as a whole.

I ask this here because, even though not actually a Perl-centric question, I have learned over my time here that PM has one of the best signal-to-noise ratios on the web when it comes to the users actually KNOWING the field in which they are jabbering on about. Anyway, so basically, I know that asking the question here will glean good results - so please, being resulting! :-)

r. j o s e p h
"Violence is a last resort of the incompetent" - Salvor Hardin, Foundation by Issac Asimov
  • Comment on Learning about Lexing/Parsing in general form?

Replies are listed 'Best First'.
Re: Learning about Lexing/Parsing in general form?
by Masem (Monsignor) on Dec 06, 2001 at 17:20 UTC
    There is the ORA book lex & yacc, 2nd ed, which while I've not read through, looks like it has a good bit of intro into the parsing processing. While it was last published in 1992, lex and yacc (and the associated Gnu tools) haven't changed much since then either, so the book's quite valid.

    But as the other poster put it, the only place outside of the docs on the net are books regarding compiler programming, and these are typically limited to textbooks at the college level. And since compile programming is typically a optional CS course, the number of books here are few and far between.

    The other problem is that while using lex/yacc and other tools isn't in themselves a difficult task, developing a usable grammer, and implementing that grammar with those tools *is* one. And no one has really written anything 'good' in terms of a thought process, critical review, or similar, of grammar design as much as there is of OOP, programming style, etc. I would not be surprised to find a better trove of info in older USENET archives (pre1995) than what is on the net today for grammars.

    Dr. Michael K. Neylon - || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

      compile programming is typically a optional CS course

      This is really a shame.

      It was compulsary in my program, back in the mid 80's and I am really glad it was: I remember my first thought when I read the description of the class: "what a waste of time, only Mr C and Mss Fortran need to know about this, why on Earth should I care, I use compilers, I don't write them!".

      During the class I still thought it was useless but at least it was fun... unlike some other classes (COBOL anyone?). Then when I started working I soon find myself in the middle of a huge project that essentially consisted in writing 4 compilers (including one with incremental compiling, fun!) and a run time environment for an advanced test bench, then using lex/yacc to generate tests for an other project, then writing an interpreter for an SGML transformation language. Boy was I happy not to have cut that compiler class!

      And I don't even mention the number of time I had to write quick and dirty interpreters for very simple languages (subsets of XPath, DB queries...). In this case I generally use regexps but at least I know what they are good for and where to stop adding features to the language because i would then need to switch to lex/yacc.

      In essence any software processes input that can be defined as a language, so knowing when it makes sense to define this language and how to process it is really a great asset for a programer.

      Plus it's really fun!

Re: Learning about Lexing/Parsing in general form?
by mirod (Canon) on Dec 06, 2001 at 16:40 UTC

    There was an interesting thread about it on the Perl 6 language list.

    I agree with the first comment, the best introductory book I found is certainly the Dragon Book.

      There is also "Writing Compilers in C" by Allen I Holub. It walks you through building a lexer, a parser, and a compiler for a subset of C. Very informative.


      PS: I've also got a book called something like "Creating Compilers For Little Languages". I can't remember who it's by, though. It goes through the process of creating three different little languages, and has the code on a disk.

      PPS: I just found a link for it:
      Constructing Language Processors for Little Languages

Re: Learning about Lexing/Parsing in general form?
by demerphq (Chancellor) on Dec 06, 2001 at 18:55 UTC
    Buy the Red Dragon which can be obtained at fatbrain and PM gets a cut too.

    It is an excellent introduction into grammers lexing, parsing, regexes, finite automata. Hmm, to be honest IMO it is one of the most interesting and generally informative comp sci books I have read. Definately worth the money to buy and read. And yes I know you said online material, but I seriously doubt you will find anything online that is of as high a quality and usefulness as this book.

    Doh! Mirod already suggested this and I didnt even notice. Well, two recommendations for a book this good is probably not such a bad thing.

    Yves / DeMerphq
    This space for rent.

Re: Learning about Lexing/Parsing in general form?
by Fletch (Bishop) on Dec 06, 2001 at 19:19 UTC

    You might also look for Crafting a Compiler With C (ISBN 0805321667). It's out of print, but it was the textbook for the compilers class I took (of course I had Leblanc as the prof for the class, which may have influenced his book choice somewhat :). Or you could wait until September of 2002 when amazon says the second edition should be out (ISBN 0201385937).

Re: Learning about Lexing/Parsing in general form?
by danger (Priest) on Dec 06, 2001 at 20:17 UTC

    The aforementioned Dragon book is the classic compiler book, but since you mention online resources --- one book you might consider is: Parsing Techniques -- A Practical Guide (Dick Grune and Ceriel J.H. Jacobs, 1990) Not a "compiler" book per se, but quite a bit of general and specific parsing theory in a relatively easy to swallow presentation. Published in 1990, it went out of print and the authors regained the copyright and have made it available in postscript or pdf formats at this site

Re: Learning about Lexing/Parsing in general form?
by hsmyers (Canon) on Dec 07, 2001 at 03:37 UTC
    There is a perlish connection to at least two of the best books on parsing and language design in general—though it is in-direct. If you remember the origins of Perl as being a combination of Awk, Sed and the kitchen sink, you might also remember that the 'A' in Awk stands for Aho. As in Alfred V. Aho, one of authors of Compilers: Principles, Techniques, and Tools, by Alfred V. Aho and Ravi Sethi and Jeffrey D. Ullman. Stanford CA., Addison-Wesley Publishing Company, 1985. Also The Theory of Parsing Translation and Compiling, by Alfred V. Aho and Jeffrey D. Ullman. Englewood Cliffs, NJ. Prentice-Hall, 1973.

    The first of course is the 'Dragon' book, already suggested. The second is a two volume treatment that was the best on the subject in the years before the dragon came. Of course on of the problems with being the best is that rank doesn't necessarily mean anything about it being understandable! For that I recommend something a bit more like The Anatomy of a Compiler, by John A. N. Lee. New York, NY. D. Van Nostrand Company, 1974. If you prefer a 'step-by-step' approach with a bit more hands on, then there is A Retargetable C Compiler: Design and Implementation, by Christopher Fraser and David Hanson. Menlo Park, CA., Addison-Wesley Publishing Company, 1995. This last is one of the few published works written as an example of Literate Programming and is worth the look for that alone.


Re: Learning about Lexing/Parsing in general form?
by NicS (Scribe) on Dec 07, 2001 at 15:37 UTC
    The o'rielly lex and yacc book isn't a bad place to start.
    I can only get so much from books though. I find the best place to learn is to look at real code, only then do you start to pick up the tricks and tips that most books tend to miss.

    So go grab some code and start reading, and good luck!



Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://129891]
Approved by root
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (3)
As of 2023-12-10 06:52 GMT
Find Nodes?
    Voting Booth?
    What's your preferred 'use VERSION' for new CPAN modules in 2023?

    Results (38 votes). Check out past polls.