http://qs321.pair.com?node_id=11115586


in reply to Growing strings in search

b4swine:

You can't do it using just the Perl regex engine. You might be able to turn your regex into a state machine, though, and then you could feed your strings through it character by character to find which one(s) match and which one(s) fail.

Building a state machine from a regex isn't *terribly* difficult, but there are a couple of sticking points. The main problem is that you'll have a tradeoff of speed vs. memory consumption. For example, if your regex has branches and multiple capture groups in it, then you may have a *lot* of bookkeeping to track the possible starts/stops of capture groups. You may be able to simplify things considerably by having your state machine only recognize *when* a match happens, without tracking your capture groups. Then you can turn the normal Perl regex engine loose on the full string to get your capture groups or do a more refined match.
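A minimal sketch of that last idea, for illustration only (it is not the demo mentioned in this post). The pattern /^a(b+)c/, the hand-built %dfa table, and the helpers new_matcher, status and feed are all hypothetical names picked for the example; a real regex would need its table generated rather than typed in by hand.

```perl
use strict;
use warnings;

# Hand-built state table for the hypothetical pattern /^a(b+)c/.
# Each state maps an input character to the next state.
# Any character that is not listed sends the machine to 'fail'.
my %dfa = (
    start  => { a => 'seen_a' },
    seen_a => { b => 'seen_b' },
    seen_b => { b => 'seen_b', c => 'accept' },
);

# One matcher per growing string: it remembers where it got to.
sub new_matcher {
    return { state => 'start', text => '' };
}

# Report 'accept', 'fail', or 'pending' (still undecided).
sub status {
    my ($m) = @_;
    return $m->{state} =~ /^(?:accept|fail)$/ ? $m->{state} : 'pending';
}

# Feed a single character. Once a matcher has accepted or failed
# it stays that way and ignores further input.
sub feed {
    my ($m, $ch) = @_;
    return status($m) if status($m) ne 'pending';
    $m->{text} .= $ch;
    $m->{state} = $dfa{ $m->{state} }{$ch} // 'fail';
    return status($m);
}

# Grow a string one character at a time.
my $m = new_matcher();
for my $ch (split //, 'abbc') {
    my $result = feed($m, $ch);
    print "fed '$ch': $result\n";
}

# The cheap machine only says *that* a match happened. Now let the
# real regex engine extract the capture group from the text so far.
if (status($m) eq 'accept' && $m->{text} =~ /^a(b+)c/) {
    print "captured: $1\n";
}
```

The machine keeps no capture bookkeeping at all, so each matcher needs only its current state plus the text seen so far. The cost is that the regex engine rescans the string once after the machine accepts.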

I got distracted by your question and put together a simplified demo of the technique for fun. I'm still tuning it a little bit and I'll post it as a secondary reply later when I'm happy enough with it. If you post the regex you're using, I can tweak my demo accordingly.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: Growing strings in search
by NERDVANA (Beadle) on Apr 16, 2020 at 02:01 UTC
    A particularly good tool for this is Ragel http://www.colm.net/open-source/ragel/

    It compiles character-fed state machines and is a more modern and flexible design than lex or flex.

Re^2: Growing strings in search
by tybalt89 (Prior) on Apr 15, 2020 at 21:28 UTC

    Sounds like you are describing "flex" or "lex". That makes this problem simple, just run a "flex" process against each individual string doing incremental reads -> problem solved :)

      tybalt89:

      Yes, it's very *much* like that.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

      I've never done anything significant with f?lex and it's been a long, long time since I've even read anything about its/their capabilities, but don't they have the same inherent "time-forward", if you will, orientation that regex compilers have?

      IOW, if you had a flex process that parsed backwards, it would be easy to use, but isn't the big trick to design such a process in the first place? Can you expand on the capabilities of flex in this regard?


      Give a man a fish:  <%-{-{-{-<