PerlMonks
Object Query Languages and Parsers
by TedYoung (Deacon) on Apr 21, 2006 at 20:10 UTC ([id://544977]=perlquestion)
TedYoung has asked for the wisdom of the Perl Monks concerning the following question:

Good Day Fellow Monks,

I am considering writing a parser and wanted some advice. Before I propose my main questions, I want to avoid a potential XY Problem and tell you what I need to accomplish.

For the last five years, I have developed and maintained a Perl object persistence layer. It is one of those APIs that let you define fields in a Perl object and then takes care of all the database stuff. I also developed a query language for it. The language is quite powerful with a minimum amount of syntax.

So, all this is working great for us. However, I want to extend the query language (QL) and its current implementation doesn't lend itself well to that. :-)

First, I started looking for existing Object Query Language standards and/or implementations that, perhaps, I could use instead of remaking my own. Well, this area of the industry is still pretty underdeveloped. I only found a few standards, and they had rather limited functionality in contrast to our current implementation (e.g. JDO and J2EE's CMP). So, my first question: does anyone know of any good language standards or (better) actual implementations that I should look at?

Back to enhancing our current QL: the current QL compiler basically runs regexes against a QL string, looking for field navigation instances (e.g. "department.managers.firstName") and replacing them with the corresponding database fields, joining in related tables where necessary. The regexes work fine, but the current code is not very maintainable. Either way, I am interested in doing a rewrite of the compiler before I start adding the additional functionality.

So, I am looking hard at this and thinking: "Do I want to continue to use plain regexes, or completely parse the QL?" If I am going to parse, what API/technique should I use? My prioritized requirements are:
This is when I bought Higher Order Perl for some inspiration. I like the lexing techniques proposed in the book. However, a lot of the stream-oriented stuff is overkill here since I am dealing with small strings. When it comes to parsing, the book implements a Left Hand Recursive Descent Parser. This book is a lot of fun BTW!

When I saw that, I thought of Parse::RecDescent, which would work wonders. But I have heard many times that Parse::RecDescent is slow. Has anyone used Parse::RecDescent in a CGI context to compile an average of 3 short strings per request? If so, how was performance?

I have started on an initial re-implementation. I lex up the strings nicely; there is no problem here. As I am parsing the tokens and converting them to SQL, I am finding myself writing a lot of recursion. So, this is why I ask whether Parse::RecDescent is slow because of its features or because of its recursive implementation.

I also checked out Parse::Yapp. This is basically a Perl version of yacc. I thought that there might be a performance benefit in generating a parser as opposed to having the API re-interpret a grammar definition each time. However, the docs suggest that the closest Parse::Yapp gets to making a stand-alone parser is copying the interpreter code into the generated module along with the grammar definition. So, on that ground it is equal to Parse::RecDescent. Does anyone have any experience with the performance of either of these?

Normally, benchmarking would be a simple answer to all of this. But because writing a parser is a non-trivial task, it would be very difficult to implement different parsers in Parse::RecDescent, Parse::Yapp and plain regexes just to test them. I am hoping that our community can share their ideas with me so that I can make a more informed decision.

Thanks for reading this far!

Update: dragonchild asked to see an example of the QL. Here are a few examples of some of the supported syntax. They should be pretty self-explanatory.
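For readers weighing the hand-written route discussed above, here is a minimal sketch of the lex-then-recurse approach. It is not the QL or the compiler described in this post: the toy grammar (dotted paths, comparisons, and/or, parentheses), the alias naming scheme, the 'root' alias, and every function name below are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Split a short query string into [TYPE, value] tokens using \G and /gc.
sub lex {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ /\G\s+/gc)              { next }
        elsif ($src =~ /\G(and|or)\b/gci)      { push @tokens, [ OP   => lc $1 ] }
        elsif ($src =~ /\G([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)/gc)
                                               { push @tokens, [ PATH => $1 ] }
        elsif ($src =~ /\G'([^']*)'/gc)        { push @tokens, [ STR  => $1 ] }
        elsif ($src =~ /\G(\d+)/gc)            { push @tokens, [ NUM  => $1 ] }
        elsif ($src =~ /\G(<=|>=|!=|=|<|>)/gc) { push @tokens, [ CMP  => $1 ] }
        elsif ($src =~ /\G([()])/gc)           { push @tokens, [ $1   => $1 ] }
        else { die "Unexpected input at offset " . pos($src) . "\n" }
    }
    return \@tokens;
}

# Return the current token without consuming it (EOF marker past the end).
sub peek { my ($s) = @_; return $s->{tokens}[ $s->{i} ] || [ EOF => '' ] }

# Consume a token of the given type and return its value, or die.
sub take {
    my ($s, $type) = @_;
    my $tok = peek($s);
    die "Expected $type but found $tok->[0]\n" unless $tok->[0] eq $type;
    $s->{i}++;
    return $tok->[1];
}

sub is_op {
    my ($s, $op) = @_;
    my $tok = peek($s);
    return $tok->[0] eq 'OP' && $tok->[1] eq $op;
}

# expr := term ('or' term)*
sub parse_or {
    my ($s) = @_;
    my @parts = (parse_and($s));
    while (is_op($s, 'or')) { take($s, 'OP'); push @parts, parse_and($s) }
    return @parts > 1 ? '(' . join(' OR ', @parts) . ')' : $parts[0];
}

# term := factor ('and' factor)*
sub parse_and {
    my ($s) = @_;
    my @parts = (parse_factor($s));
    while (is_op($s, 'and')) { take($s, 'OP'); push @parts, parse_factor($s) }
    return join ' AND ', @parts;
}

# factor := '(' expr ')' | PATH CMP (STR | NUM)
sub parse_factor {
    my ($s) = @_;
    if (peek($s)->[0] eq '(') {
        take($s, '(');
        my $inner = parse_or($s);
        take($s, ')');
        return $inner;
    }
    my $column = resolve_path($s, take($s, 'PATH'));
    my $cmp    = take($s, 'CMP');
    my $tok    = peek($s);
    die "Expected a literal\n" unless $tok->[0] eq 'STR' || $tok->[0] eq 'NUM';
    push @{ $s->{binds} }, take($s, $tok->[0]);
    return "$column $cmp ?";
}

# Hypothetical mapping: the last segment is the column, earlier segments
# are relations to join; the alias is the relation names joined by '_'.
sub resolve_path {
    my ($s, $path) = @_;
    my @seg    = split /\./, $path;
    my $column = pop @seg;
    my @seen;
    for my $rel (@seg) {
        push @seen, $rel;
        $s->{joins}{ join '.', @seen } = 1;
    }
    my $alias = @seg ? join('_', @seg) : 'root';
    return "$alias.$column";
}

# Compile a query string into a WHERE fragment, join paths and bind values.
sub compile_ql {
    my ($src) = @_;
    my %state = (tokens => lex($src), i => 0, joins => {}, binds => []);
    my $where = parse_or(\%state);
    die "Trailing tokens\n" if $state{i} < @{ $state{tokens} };
    return {
        where => $where,
        joins => [ sort keys %{ $state{joins} } ],
        binds => $state{binds},
    };
}

my $q = compile_ql("department.managers.firstName = 'Ted' and (age > 30 or age < 20)");
print "$q->{where}\n";
print "joins: @{ $q->{joins} }\n";
print "binds: @{ $q->{binds} }\n";
```

Each grammar rule becomes one subroutine, so the recursion depth follows the nesting of parentheses in the query rather than its length. For strings this short, a sketch like this does all its work with a handful of regex matches and sub calls, which gives a rough baseline to compare a generated or grammar-driven parser against.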
Ted Young ($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)