Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Sort of, only not so complicated. :-)

An NFA is already compiled internally into a series of states with transitions between them. What I am proposing to do is take that representation and rewrite part of it using new states that are ordered subsets of the original.

There are two tricks here.

The more obvious one is summed up by the phrase, "Put off figuring out why you have done this work as long as possible." That is, if you are going to have to backtrack and then go forward with identical logic, why not make it so that you go forward following common logic as long as possible? This is next character discrimination, for instance an RE to match all of the states would be faster because if you saw an A first you would have already figured out that you just need to look at states whose name starts with A.

The more subtle and important one is that if you are aggressive about it, you can eliminate a lot of repeated following of the same reasoning. Basically if ever you arrive at the same place in the string, with the same state information relevant to your match, at the same position in your RE, then the outcome is going to be the same. With a traditional NFA you might arrive at a state, go forward, fail, backtrack. Then you follow a different line of reasoning forward, reach the same place, and of course you will do the same work to figure out that you fail again. This is all repeated useless work, you already did this stuff.

Alternately if it succeeds the first time, it will not backtrack and again it will be irrelevant that there is another way to the same state.

Either way, the *second* way to the same combination is irrelevant. So in the process of optimizing if you see that you can reach the same state in multiple lines of reasoning can safely drop the second way of reaching it. By explicitly parallelizing the reasoning process that the NFA will use you can cause it to realize that 2 chains of reasoning will come back to the same state, and so the ordered sets of states can omit duplicates. And that is why you can avoid the combinatorial backtracking disaster!

OTOH there is no requirement to optimize the whole RE in this way. In fact when you are capturing a backreference used elsewhere in the RE, then state information relevant to the global match is being captured and you explicitly cannot apply this optimization. You might still optimize bits and pieces, but you won't be able to find a simple and efficient way to match crazy things like /(.*).*\1/...


In reply to Re (tilly) 1: Donuts and DFA's by tilly
in thread Research ideas by Wodin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (2)
As of 2024-04-26 00:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found