Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: Clunky parsing problem, looking for simple solution

by Anonymous Monk
on Jun 17, 2002 at 05:14 UTC ( [id://175025]=note: print w/replies, xml ) Need Help??


in reply to Clunky parsing problem, looking for simple solution

*looks dumbfounded*

Erm. The art of language design hasn't been around for 25+ years without having come up with solutions to this kind of thing. I don't want to sound negative but did you do any research into language design before embarking on this project?

I completely agree with the concept of Doing for Oneself before simply grabbing other peoples solutions when doing your own projects, however usually at the point where you start needing to ask for help, it is usually best to go read up on general industry practice first.

In this case, the very first thing you should have is a grammar based parser. As mentioned above, grammars are extremely effective at solving this particular type of problem. Indeed perl5 has a module called Parse::RecDecent or something which contains some excellent documentation on how to develop a proper grammar using it, and it plugs right into perl beautifully.

The alternative course of action is, of course, to simply translate the basic language into valid perl and make perl do the hard work. I'm not sure why you chose not to follow this path but I assume you have your reasons.

Anyway, my recommendation would be to consider re-writing the interpreter using RecDecent. Once the initial learning curve is over you should find it almost trivial to implement the language. I understand that you said "Don't worry about the structure of *my* program" but the essence of a good solution is understanding where the problem lies, and it lies in your program structure :)

Good luck!

Replies are listed 'Best First'.
Parser
by Anonymous Monk on Jun 17, 2002 at 11:15 UTC
    My apologies, I didn't have time to clean it up, I was in a hurry:

    Missing all of the functions
    Missing some operators
    Missing post-processor that puts the lines back together

    its essentially a recursive decent parser for basic, which does a translation to write out an if-then-else construct as a series of if-then and gotos using labels. It prefixes all the lines of the result with the original line number, with the aim of helping a post-processor then go through and correct the original GOTO lines and labels to point to the correct places.

    it isn't well tested, but it will handle nested if/then/elses correctly. It could be expanded to handle compound statements with one line of code (mind you, in perl, thats not saying much).

    The penalty is that it has to know everything about the language. Its basically half way to a basic compiler for Parrot once you add the functions and ops.

    #!/usr/bin/perl use Data::Dumper; sub generic_node { my ($re, $tree) = @_; $current =~ s/^\s*//; if ($current !~ /$re/i) { return 0; } $current =~ s/$re//i; push @{$tree}, $1; return 1; }; sub generic_node_create { my ($re, $tree, $name) = @_; $current =~ s/^\s*//; if ($current !~ /$re/i) { return 0; } $current =~ s/$re//i; push @{$tree}, [$name, $1]; return 1; }; sub generic_check { my ($re, $tree) = @_; $current =~ s/^\s*//; return $current =~ s/$re//i; }; sub operator { generic_node_create('^([+-=*\/])',shift,"OPERATOR"); }; sub number { generic_node_create('^(\d+)',shift,"NUMBER"); }; sub string { generic_node_create('^\"([^\"]+)\"',shift,"STRING"); }; sub numeric_variable { generic_node_create('^([A-Z]\d)',shift,"NUMERIC_VAR"); }; sub string_variable { generic_node_create('^([A-Z]\$)',shift,"STRING_VAR"); }; sub equals { generic_node_create('^([=])',shift,"EQUALS"); }; sub expression { my $tree = shift; my $t = [EXPRESSION]; string_variable($t) || numeric_variable($t) || string($t) || number($t) || (generic_check('^[\(]',$t) && expression($t) && generic_check( +'^[\)]',$t)) || return 0; if (operator($t)) { expression($t) || return 0; } push @{$tree},$t; return 1; }; sub assignment { my $tree = shift; my $t = [ASSIGNMENT]; numeric_variable($t) || string_variable($t) || return 0; equals($t) || return 0; expression($t) || return 0; push @{$tree},$t; return 1; }; sub comment { if (generic_check('^REM')) { return 1; } }; sub if_condition { my $tree = shift; my $t = [IF_CONDITION]; generic_check('^IF',$t) || return 0; expression($t) || die "Invalid IF statement, expected expressi +on"; generic_check('^THEN',$t) || die "Invalid IF statement, expect +ed THEN"; statement($t) || die "Invalid IF construct, expected statement + after THEN"; if (generic_check('^ELSE',$t)) { statement($t) || die "Invalid IF/ELSE construct, expec +ted statement after ELSE"; } push @{$tree},$t; return 1; }; sub goto_statement { my $tree = shift; my $t = [GOTO]; generic_check('^GOTO',$t) || return 0; number($t) || die 'GOTO must be followed by a number'; push @{$tree},$t; }; sub statement { my $tree = shift; my $t = [STATEMENT]; assignment($t) || comment($t) || if_condition($t) || goto_statement($t) || return 0; push @{$tree},$t; }; # # Pretty-print routines # sub print_expression { my ($subelement) = @_; if ($subelement->[0] eq "EXPRESSION") { if ($subelement->[2][0] ne "OPERATOR") { return "(".print_expression($subelement->[1]). +")"; } else { return "(".print_expression($subelement->[1]). +" ".$subelement->[2][1]." ".prin t_expression($subelement->[3]).")"; } } if ($subelement->[0] eq "NUMBER") { return $subelement->[1]; } if ($subelement->[0] eq "STRING") { return '"'.$subelement->[1].'"'; } if ($subelement->[0] eq "STRING_VAR") { return $subelement->[1]; } if ($subelement->[0] eq "NUMBER_VAR") { return $subelement->[1]; } }; sub print_statement { my ($statement) = @_; my $s = ""; $level++; my $subelement = $statement->[1]; if ($subelement->[0] eq "IF_CONDITION") { add_line("IF ".print_expression($subelement->[1])." TH +EN GOTO THEN_$level"); if ($subelement->[3]) { print_statement($subelement->[3]); } add_line("GOTO END_$level"); add_line("THEN_$level:"); print_statement($subelement->[2]); + add_line("END_$level:"); } if ($subelement->[0] eq "GOTO") { add_line("GOTO ".$subelement->[1][1]); } if ($subelement->[0] eq "ASSIGNMENT") { add_line($subelement->[1][1]." = ".print_expression($s +ubelement->[3])); } if ($subelement->[0] eq "COMMENT") { add_line("REM - comment here"); } $level--; }; sub add_line { push @lines, [$current_line, shift]; }; sub pretty_print { my $tree = shift; $ln = shift(@{$tree}); print_statement($tree->[0]); }; while (<>) { chomp($_); $current = $_; $tree = []; generic_node('^(\d+)', $tree) || die "Line number invalid"; $current_line = $tree->[0]; statement($tree) || die "Invalid statement"; print Dumper($tree); $level = 0; pretty_print($tree); }; for (@lines) { print $_->[0]." ".$_->[1]."\n"; }
      Wow. If you put that much work in, you should make yourself an account so you can feel the goodness of peer appreciation. That and you can give clintp a name to hate, so he doesn't have to keep referring to an anonymous monk :)

      ____________________
      Jeremy
      I didn't believe in evil until I dated it.

        Yes well you see the problem is that I was feeling slightly guilty. When I read the article originally, i had no idea what Parrot was, so the term "Parrot Basic" suggested a version of basic called Parrot, not, as it turned out, a version of basic that was supposed to run on an alpha-version virtual machine.

        Thus, you can understand that in my view, the fact that the basic interpreter (which I figured to be in perl) could not handle nested ELSE suggested that its internals were designed incorrectly (since perl is more than capable of recursion with little effort), and therefore it was in the best interests of the questioning monk to re-write his code in a manner more suited to the problem. I have no problem being blunt about this if necessary.

        The issue of course, being that it was not just another basic interpreter, nor was it written in perl. Instead it was written by hand in a register/op-code language that wasn't close to finished. While I looked into it myself and determined that a recursive decent parser is practical on the VM as it stands, it is by no means as trivial as the perl variant.

        Thus, by way of apology, I wrote code that would parse the Basic language the proper way and then do the necessary transformations to output a simplified Basic that would be easier to write Parrot code for until such time as it becomes easy to write a grammar within Parrot.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://175025]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2024-03-29 14:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found