Don't ask to ask, just ask | |
PerlMonks |
Re (tilly) 1: Donuts and DFA'sby tilly (Archbishop) |
on Feb 20, 2001 at 23:26 UTC ( [id://59741]=note: print w/replies, xml ) | Need Help?? |
Sort of, only not so complicated. :-) An NFA is already compiled internally into a series of states with transitions between them. What I am proposing to do is take that representation and rewrite part of it using new states that are ordered subsets of the original. There are two tricks here. The more obvious one is summed up by the phrase, "Put off figuring out why you have done this work as long as possible." That is, if you are going to have to backtrack and then go forward with identical logic, why not make it so that you go forward following common logic as long as possible? This is next character discrimination, for instance an RE to match all of the states would be faster because if you saw an A first you would have already figured out that you just need to look at states whose name starts with A. The more subtle and important one is that if you are aggressive about it, you can eliminate a lot of repeated following of the same reasoning. Basically if ever you arrive at the same place in the string, with the same state information relevant to your match, at the same position in your RE, then the outcome is going to be the same. With a traditional NFA you might arrive at a state, go forward, fail, backtrack. Then you follow a different line of reasoning forward, reach the same place, and of course you will do the same work to figure out that you fail again. This is all repeated useless work, you already did this stuff. Alternately if it succeeds the first time, it will not backtrack and again it will be irrelevant that there is another way to the same state. Either way, the *second* way to the same combination is irrelevant. So in the process of optimizing if you see that you can reach the same state in multiple lines of reasoning can safely drop the second way of reaching it. By explicitly parallelizing the reasoning process that the NFA will use you can cause it to realize that 2 chains of reasoning will come back to the same state, and so the ordered sets of states can omit duplicates. And that is why you can avoid the combinatorial backtracking disaster! OTOH there is no requirement to optimize the whole RE in this way. In fact when you are capturing a backreference used elsewhere in the RE, then state information relevant to the global match is being captured and you explicitly cannot apply this optimization. You might still optimize bits and pieces, but you won't be able to find a simple and efficient way to match crazy things like /(.*).*\1/...
In Section
Seekers of Perl Wisdom
|
|