note
tilly
Here be [http://www.tuxedo.org/~esr/jargon/html/entry/Dragon-Book.html|dragons]. :-)<P>
The fundamental problem here lies in the differing NFA/DFA
semantics. No, there is no easy way to convert one to the
other.<P>
As [id://21667] explains so well, NFAs are a
brute-force recursive search. If there is a combinatorial
explosion of possible ways to start trying to match pattern
to text (think about <tt>/^(\s*yada\s*)+$/</tt> for a
second) you get a situation where an NFA will hit
exponential failure modes. (No, Perl didn't fail to see
that there was no match, you just didn't let it work for
enough years.) But since an NFA is a brute force search,
just adjusting the search pattern lets you specify which
alternatives come first, and whether this wildcard pattern
is greedy while that one is not. Plus things like
backreferences are easy to implement.<P>
By contrast DFAs walk the string once, keeping track of
all of the places they might be in the pattern at once.
This is guaranteed to finish quickly, but good luck if you
want to keep track of things like backreferences. (Well
actually there are quadratic algorithms that will capture
backreferences, but you cannot use them within the pattern.)
By an implementation quirk, a DFA will always find the
longest overall match. (They can also be designed the
find the shortest, the Unix grep utility does this for
speed.)<P>
So NFAs and DFAs find different matches by default. NFAs
offer more features and finer control. DFAs better
guaranteed performance. If you understand all of that
then you should be able to follow
[http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-05/msg00909.html|this]
explanation of the dangers that lie in wait for someone
attempting a native conversion of flex to using Perl's RE
engine.<P>
Now in addition to the other ideas you have had (most of
which will be workable, but will involve some learning)
if you don't mind trying something which is possibly
insane you could investigate whether [cpan://Inline] makes
it feasible to use the Flex specification directly in
Perl...<P>
<B>UPDATE</B><BR>
[danger] pointed out that I had said that a DFA was a brute
force search when I clearly meant that an NFA was. Fixed.
56225
56225