e_bachmann has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse CCL commands like:
dylan "bob dylan" dylan or zimmerman bob and (dylan or zimmerman)
I need to seach a database for each token and combine the results. Any surgestions?

:-) Erik Bachmann

Examle grammar:

CCL-Find ::= CCL-Find Op Elements | Elements. Op ::= "and" | "or" | "not" -- The above means that Elements are separated by boolean operators. Elements ::= '(' CCL-Find ')' | Set | Terms | Qualifiers Relation Terms | Qualifiers Relation '(' CCL-Find ')' | Qualifiers '=' string '-' string -- Elements is either a recursive definition, a result set reference +, a -- list of terms, qualifiers followed by terms, qualifiers followed -- by a recursive definition or qualifiers in a range (lower - upper +). Set ::= 'set' = string -- Reference to a result set Terms ::= Terms Prox Term | Term -- Proximity of terms. Term ::= Term string | string -- This basically means that a term may include a blank Qualifiers ::= Qualifiers ',' string | string -- Qualifiers is a list of strings separated by comma Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<' -- Relational operators. This really doesn't follow the ISO8777 -- standard. Prox ::= '%' | '!' -- Proximity operator

Replies are listed 'Best First'.
Re: Parsing CCL (Common Command Language) commands
by mugwumpjism (Hermit) on Oct 17, 2002 at 11:46 UTC

    Left logical grammars such as you have listed are very mechanically converted to `recursive descent' parsers.

    You might want to check out Parse::RecDescent. You should be able to go straight from your grammar to a working parser.

    Otherwise, make yourself a next_token(), an eat_token() function, and to make things easier, a save_position() and restore_position() pair to allow you to easily put already eaten tokens back onto the front of the stream (not necessary with some grammars, see books on recursive descent and LL grammars for more) and write code like this:

    sub expect_Elements { my $stream = shift; my $expression; $p = save_position($stream); if (next_token() eq "(") { eat_token($stream); $expression = [ expect_CCL_Find($stream) ]; # It might make more sense to die() here if (eat_token() ne ")") { $expression = undef; } } else { if ($expression = expect_Set($stream)) { } elsif ($expression = expect_Terms($stream)) { } elsif ($expression = expect_Qualifiers($stream)) { if ($rel = expect_Relation($stream)) { if (next_token($stream) eq "(") { eat_token(); $expression = [ expect_CCL_Find($stream) ]; (eat_token() eq ")") or $expression = undef; } elsif ( my $terms = expect_Terms($stream) ) { $expression = [ $rel, $expression, $terms ]; } else { $expression = undef; } } elsif (next_token($stream) eq "=") { eat_token(); $string1 = expect_string($stream); $op = next_token($stream); $string2 = expect_string($stream); if (!defined $string1 or !defined $string2 or !defined $op or $op ne "-") { $expression = undef; } else { $expression = [ "=", $expression, $string1, "-", $ +string2 ]; } } else { $expression = undef; } } else { $expression = undef; } } if (defined $expression) { return $expression; } else { restore_position($stream, $pointer); return undef; } }

    I think it should be fairly obvious why you'd want a module (or flex/yacc) to write this for you for larger grammers, but every programmer should write at least one recursive descent parser :-).

Re: Parsing CCL (Common Command Language) commands
by perlcgi (Hermit) on Oct 17, 2002 at 12:01 UTC
    The best approach IMHO is not to reinvent the wheel. As CCL is a standard, (ANSI/NISO Z39.58-1992), it is hard to believe that code is not already available somewhere.
    That said, the many implementations in library catalogs for example, are commercially sensitive and you may well have to roll your own.
    On the positive side, much of the work has been done for you, given that you have the grammar. I would check out Damian Conway's module Parse::RecDescent. The following are useful starting sources:
    Data Munging with Perl by davorg - I found this treatment of Parse::RecDescent particularly easy to digest
    The FAQ
    Damian Conway's TPJ article
    Best of luck,

      Just keep in mind that Parse::RecDescent is painfully slow for anything beyond a toy. It's not for nothing that Damian Conway is planning on writing a Parse::FastDescent.

      In the meantime, you'll get good mileage out of Parse::Yapp, which basically redoes yacc(1) in Perl.

      print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
        Thank you for alle the surgestions!

        print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
        This REALLY says all about my relations to parsers! :-(

        I've gone for the YAPP and taken the Calp.yp example and replaced the grammer with the CCL grammer (leaving out the "Set" statement for now). Compiled it using:

        yapp -v myParser.yp
        with one message:
        1 shift/reduce conflict
        Then I've written a test:
        use MyParser; $parser=new MyParser(); $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub +); $nberr=$parser->YYNberr(); $parser->YYData->{DATA}= [ 'a and ( b or c)' ]; $data=$parser->YYData->{DATA}[0];
        All this gives is a:
        Syntax error. Can't locate object method "new" via package "MyParser" (perhaps you f +orgot to l oad "MyParser"?) at line 2, <STDIN> line 1.
        • How do I really activate this parser?
        • How do I add my functions to process the searching?
        "If there are two or more ways to do something and one of them can lead to a disaster, then one or more persons will do it"
        Edgar A. Murphy, Captain in U.S. Air Force (1949)
Re: Parsing CCL (Common Command Language) commands
by JaWi (Hermit) on Oct 17, 2002 at 11:57 UTC
    Well, you could use Yapp or YALALR to create a LALR parser for you according to a specific (Yacc like) grammar. Since yours is already clearly defined in BNF there shouldn't be much trouble converting it.

    Since you didn't specify a particular database, I assume you're free to choose. In that case I would suggest MySQL, since it has this nice RegExp feature, where you can query according to some (simplified) regular expression.
    This would allow you to convert your parse tree into a valid (MySQL) regexp. Simply select everything which looks like your regexp and your finished.

    Hope it helps (a bit),

    -- JaWi

    "A chicken is an egg's way of producing more eggs."