Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: Flex, converting regexes, and other Interesting Stuff.

by Crulx (Monk)
on Feb 04, 2001 at 16:15 UTC ( #56306=note: print w/replies, xml ) Need Help??


in reply to Flex, converting regexes, and other Interesting Stuff.

To the best of my knowledge...
The problem is not just that Perl uses a NFA algorithm for its regular expressions and Flex uses a DFA one. It is that flex maintains considerable special state information and processes all of the rules in parallel. Consider the following flex code....
%x comment %% int line_num = 1; "/*" BEGIN(comment); <comment>[^*\n]* /* eat anything that's not a '*' */ <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */ <comment>\n ++line_num; <comment>"*"+"/" BEGIN(INITIAL);
Now this will eat anything in a comment... while maintaing that line count var.... But implied in that code is
  1. A system for maintaining which runstate we are in. That is what remembers that we are in the Comment state.
  2. A system for matching the greediest RE in the list. That is what keeps
    <comment>[^*\n]*
    from matching the * in */. The existance of that "*" + "/" rule.
Of course nothing is stopping you from writing a program that does all of those things in perl. But you cannot just use the regular expressions cart blanch. There are slight differences in perl's re and flex's use of yacc's re system. Ie
"/*" BEGIN(comment);
is perl's
if ($bob =~ /^\/\*/) { $comment = 1; }
But again.. you could write something that properly escaped these things.

Here is some perl code to do that flex code.

#!/usr/bin/perl use strict; my $bob = "/* THis ** comment** is\n\nMy\ngreatest\n Comment** */"; my $comment = 0; my $line_num = 1; while($bob) { my @match_list; print "STARTLOOP matching $bob\n"; if ($bob =~ /^\/\*/) { push @match_list,{rest=>$',code=>"\$comment=1",matched=>$&}; print "start comment matched: $&\n"; } if($comment && $bob =~ /^[^*\n]*/) { push @match_list,{rest=>$',code=>"",matched=>$&}; print "not star matched: $&\n"; } if($comment && $bob =~ /^\*[^*\n]*/) { push @match_list,{rest=>$',code=>"",matched=>$&}; print "star with no slash matched: $&\n"; } if($comment && $bob =~ /^\n/) { push @match_list,{rest=>$',code=>"++\$line_num;",matched=>$&}; print "newline matched: $&\n"; } if($comment && $bob =~ /^\*\//) { push @match_list,{rest=>$',code=>"\$comment=0",matched=>$&}; print "end comment matched: $&\n"; } my $max_length = 0; my $match; #should be priority queue.. but I'm too tired to # use that module now... foreach $match (@match_list) { if(length($match->{matched}) > $max_length) { $max_length = length($match->{matched}); } } foreach $match (@match_list) { if(length($match->{matched}) == $max_length) { eval $match->{code}; $bob = $match->{rest}; } } } print "line num is $line_num\n";
You should be able to see how you could make a program that took the flex code and made perl from it. It could just leave stub functions or "" in the code key of that hash.

DFA's are just NFA's that just don't allow multiple moves from one state to another via the same character. It is hard to get an NFA in DFA form, but the converse is not true.

Hope this makes sense.
---
Crulx
crulx@iaxs.net

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://56306]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (5)
As of 2022-10-02 07:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My preferred way to holiday/vacation is:











    Results (8 votes). Check out past polls.

    Notices?