Parse syntactically analyzed sentence

by nido203 (Novice)
on May 07, 2016 at 22:09 UTC

nido203 has asked for the wisdom of the Perl Monks concerning the following question:

Hello everybody, I'm new to this, so excuse me if I wrote something wrong. My task is to write a grammar for this syntactically analyzed sentence: (SBARQ (WHNP (WP What))(SQ (VBZ is)(NP (NNP Head)(NNP Start)))(. ?)). I've made something that works, but only partway through the sentence: it parses just as far as the word "What". If somebody knows how I can modify these regular expressions, it would really help. Thank you for your time! This is my code:

use Parse::RecDescent;
use Data::Dumper;

$::RD_AUTOACTION = q { [@item] };

my $grammar = q {
    start : seq
    seq   : '(' TAG (seq | word) | ("." "?") ')'
    TAG   : /[A-Z]+/
    word  : m(\w*)
};

my $parser = Parse::RecDescent->new($grammar);
my $result = $parser->start("(SBARQ (WHNP (WP What))(SQ (VBZ is)(NP (NNP Head)(NNP Start)))(. ?))");
print Dumper($result);

Replies are listed 'Best First'.
Re: Parse syntactically analyzed sentence
by graff (Chancellor) on May 08, 2016 at 01:15 UTC
    It has been a very long time since I last did anything with Parse::RecDescent, so it took me longer than I'd like to admit to come up with a grammar that works. (It's especially humbling for me in this case, because I recognize, and have often played with, the sort of data you've got here: Penn Treebank.)

    So here's a grammar that does what you seem to want:

    start: tree
    tree: '(' treestr(s) ')'
    treestr: tree | tagstr
    tagstr: TAG ( tree | word )
    TAG: /[A-Z.]+ /
    word: /[\w?]+/

    Note the "(s)" modifier on the first mention of the "treestr" rule -- the start contains one tree (one set of parens will bound the entire string), but within that one tree you can find one or more subtrees. The OP grammar stopped at the end of the first subtree because it couldn't handle the sister tree that followed it.
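
To make the effect of the "(s)" modifier concrete, here is a small sketch that builds the grammar above twice, once with "treestr(s)" and once with a bare "treestr", and runs both on the test string. The helper name build_parser and the __REP__ placeholder are made up for this illustration; only the grammar rules come from the post.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The grammar from the post, with a placeholder where the repetition
# specifier goes (the placeholder is an illustration device only).
my $template = <<'GRAMMAR';
start:   tree
tree:    '(' treestr__REP__ ')'
treestr: tree | tagstr
tagstr:  TAG ( tree | word )
TAG:     /[A-Z.]+ /
word:    /[\w?]+/
GRAMMAR

# Hypothetical helper: build a parser with the given specifier ('(s)' or '').
sub build_parser {
    my ($rep) = @_;
    ( my $grammar = $template ) =~ s/__REP__/$rep/;
    return Parse::RecDescent->new($grammar);
}

my $sentence = "(SBARQ (WHNP (WP What))(SQ (VBZ is)(NP (NNP Head)(NNP Start)))(. ?))";

# A failed parse returns undef, so "defined" is enough to tell the two apart.
my $with_s    = build_parser('(s)')->start($sentence);
my $without_s = build_parser('')->start($sentence);

print "with (s):    ", ( defined $with_s    ? "parsed" : "failed" ), "\n";
print "without (s): ", ( defined $without_s ? "parsed" : "failed" ), "\n";
```

Without the modifier, the tree rule expects its closing paren right after the first subtree, so the sister subtree "(SQ ...)" makes the whole parse fail.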

    There's probably something I'm not understanding just now about using parens (for grouping) and vertical bars (for alternations) in the grammar spec, and it's likely that there are other (less cumbersome) ways to define the grammar for data of this type.

    Anyway, the grammar above does work its way to the end of your test string (though perhaps you want a different sort of data structure as the result, in which case, I apologize -- good luck with that).
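
If a more compact result is wanted, one option is to drop the autoaction and give the node rule an explicit action. The following is only a sketch under that assumption: the rule names node and child and the [ tag, children... ] layout are invented here and are not graff's grammar; the word character class is borrowed from the updated version of his rule.

```perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;

# Alternative grammar (an assumption, not the one from the post):
# every node returns an array ref of the form [ tag, child, child, ... ].
my $grammar = <<'GRAMMAR';
start: node
node:  '(' TAG child(s) ')'   { [ $item[2], @{ $item[3] } ] }
child: node | word
TAG:   /[A-Z.]+/
word:  /[\w?!.]+/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar);
my $tree   = $parser->start(
    "(SBARQ (WHNP (WP What))(SQ (VBZ is)(NP (NNP Head)(NNP Start)))(. ?))"
);

# Rules without an action return the value of their last item, so
# "start" and "child" simply pass the node or word value upward.
print Dumper($tree);
```

Here $item[2] is the TAG text and $item[3] is the array ref collected by child(s), so leaves come out as pairs such as [ 'WP', 'What' ].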

    I also noticed from the P::RD man page that you can pass a reference to a scalar containing the string to be parsed. Portions of the string will be removed as the parser works through it, so if you get back less of a structure than you expect, you can look at the string to see where the parsing stopped (due to failure to match any rules). Here's my version of your code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parse::RecDescent;
    use Data::Dumper;

    $::RD_AUTOACTION = q { [@item] };
    # $::RD_HINT = 1;

    my $grammar = q {
        start: tree
        tree: '(' treestr(s) ')'
        treestr: tree | tagstr
        tagstr: TAG ( tree | word )
        TAG: /[A-Z.]+ /
        word: /[\w?!.]+/
    };

    my $parser = Parse::RecDescent->new($grammar);
    my $text = "(SBARQ (WHNP (WP What))(SQ (VBZ is)(NP (NNP Head)(NNP Start)))(. ?))";
    my $result = $parser->start( \$text );
    print $text, "\n";
    print Dumper($result);

    (UPDATE: I have the "HINT" setting commented out because it wasn't all that helpful.)
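
As a further illustration of the scalar-reference trick, here is a sketch that feeds the same grammar two bracketed trees back to back. The input string is made up for this example. Since "start" only asks for one tree, whatever the parser did not consume should still be sitting in $text afterwards.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
start:   tree
tree:    '(' treestr(s) ')'
treestr: tree | tagstr
tagstr:  TAG ( tree | word )
TAG:     /[A-Z.]+ /
word:    /[\w?!.]+/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar);

# Made-up input: two trees in one string.
my $text   = "(NP (NNP Head)(NNP Start))(VP (VBZ is))";
my $before = length $text;

# Passing a reference lets the parser remove what it matched from $text.
my $result = $parser->start( \$text );

print "parse ", ( defined $result ? "succeeded" : "failed" ), "\n";
print "consumed ", $before - length($text), " of $before characters\n";
print "left over: '$text'\n";
```

Comparing the length before and after is a cheap way to notice that a parse "succeeded" without reaching the end of the input.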

    Another update: you probably would have figured this out, but the ". ?" string really should be treated as a "TAG word" pair, which is what my version of the grammar does. The "." is a generic "TAG" label for (strings of?) punctuation, and the "?" in this case represents the actual token that occurred in the text. Other sentences, ending with other punctuation marks, would have ". ." or ". !", etc. The rule for TAG also absorbs the space that must follow the TAG token.

    Added "!." to the rule for "word" - might need to add more punctuation once you start getting into more varied sentences.
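
The "TAG word" reading of leaves like ". ?" can be checked with plain regexes, outside the parser. This is a sketch only: split_leaf is a hypothetical helper, not part of the grammar, and it reuses the character classes from the TAG and word rules as updated above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one leaf using the same character classes
# as the TAG and word rules. The TAG rule also matches the single space
# after the tag; here that space sits between the two capture groups.
sub split_leaf {
    my ($leaf) = @_;
    if ( $leaf =~ /\A([A-Z.]+) ([\w?!.]+)\z/ ) {
        return ( $1, $2 );
    }
    return;    # empty list: not a "TAG word" pair under these rules
}

for my $leaf ( "NNP Head", ". ?", ". !", ", ," ) {
    my ( $tag, $word ) = split_leaf($leaf);
    if ( defined $tag ) {
        print "'$leaf' -> tag '$tag', word '$word'\n";
    }
    else {
        print "'$leaf' -> no match\n";
    }
}
```

The last input fails because a comma is in neither character class, which is the kind of punctuation that would have to be added for more varied sentences.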

      Wow, awesome! This is exactly what I was looking for. Thank you sir very much. Yes it came to my mind that ".?" should be treated as a "TAG word" but couldn't make it work somehow.

Re: Parse syntactically analyzed sentence
by Anonymous Monk on May 08, 2016 at 01:27 UTC

    my debuggery grammar first, "final" grammar second

    Basically: name every part, keep each rule under six "words", give anything that repeats a quantifier of (s), and give anything optional a quantifier of (?).
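
The grammars from this reply are not shown here, so the following is only a guess at the style being described, not Anonymous Monk's actual code. Every rule name (sentence, node, lparen, rparen, tag, leaf) is invented for the sketch. It uses (?) for the optional leaf and (s?) for zero or more child nodes.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Every part has a name, each rule stays under six items, and the
# optional / repeating parts carry a quantifier.
my $grammar = <<'GRAMMAR';
sentence: node
node:     lparen tag leaf(?) node(s?) rparen
lparen:   '('
rparen:   ')'
tag:      /[A-Z.]+/
leaf:     /[\w?!.]+/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar);

my $good = "(SBARQ (WHNP (WP What))(SQ (VBZ is)(NP (NNP Head)(NNP Start)))(. ?))";
my $bad  = "(NP (NNP Head)";    # made-up input with a missing closing paren

my $good_result = $parser->sentence($good);
my $bad_result  = $parser->sentence($bad);

print "balanced input:   ", ( defined $good_result ? "parsed" : "failed" ), "\n";
print "unbalanced input: ", ( defined $bad_result  ? "parsed" : "failed" ), "\n";
```

A node here is a tag, then at most one leaf, then any number of nested nodes, so "(WP What)" and "(NP (NNP Head)(NNP Start))" go through the same rule.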

      Thank you Anonymous Monk. This is very helpful and it will help me a lot in future work.
