PerlMonks  

Re: Growing strings in search

by AnomalousMonk (Archbishop)
on Apr 15, 2020 at 19:27 UTC


in reply to Growing strings in search

If one could run the match engine "backwards" and match a string in reverse, i.e., from end to beginning, that might solve your problem. Unfortunately, there's no  m//r modifier that I'm aware of.

One might do a match against a reversed string:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = '';
 my $r = '';
 my @add = qw(a b c X d e Y f g h);
 ;;
 my $xeger = qr{ \A Y .* X }xms;
 ;;
 while (@add){
   print qq{'$s' }, $r =~ $xeger ? 'MATCH' : \"no match\";
   $r = qq{$add[0]$r};
   $s .= shift @add;
   }
 "
'' no match
'a' no match
'ab' no match
'abc' no match
'abcX' no match
'abcXd' no match
'abcXde' no match
'abcXdeY' MATCH
'abcXdeYf' no match
'abcXdeYfg' no match

Incrementally building and maintaining a reversed string for each "forward" string might not be too expensive even for hundreds of strings of thousands of characters. Unfortunately, you now have the problem of finding some way to persuade the "software" to generate an arbitrary regex backwards, so that, e.g.,  X .* Y becomes  Y .* X (but that's just a small matter of programming, right? :).
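As a rough illustration of that "small matter of programming", here is a minimal sketch that only works for a very restricted class of patterns: whitespace-separated atoms as written under /x, with no groups, alternation, lookarounds, or backreferences. The helper name reverse_simple_pattern is hypothetical and not part of any module. The approach is only meant to answer "does it match?"; captures and greedy/non-greedy behavior would not carry over to the reversed pattern unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reverse a pattern written
# as whitespace-separated atoms, as one would write it under /x.
# Groups, alternation, lookarounds and backreferences are NOT handled.
sub reverse_simple_pattern {
    my ($pattern) = @_;
    my %swap = ('\A' => '\z', '\z' => '\A');   # anchors change ends
    my @atoms = reverse split ' ', $pattern;
    return join ' ', map { exists $swap{$_} ? $swap{$_} : $_ } @atoms;
}

# Forward intent: "X ... Y" with Y as the last character of the string.
my $reversed_src = reverse_simple_pattern('X .* Y \z');   # '\A Y .* X'
my $xeger        = qr{$reversed_src}xms;

my ($s, $r) = ('', '');
for my $chunk (qw(a b c X d e Y f)) {
    $s .= $chunk;                        # forward string grows at the end
    $r  = scalar(reverse $chunk) . $r;   # reversed string grows at the front
    printf "'%s' %s\n", $s, $r =~ $xeger ? 'MATCH' : 'no match';
}
```

Because the reversed pattern is anchored with \A, the engine only has to examine the newest characters at the front of the reversed string before it can fail, which is the point of the exercise.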

BTW: Is it possible for you to suppress re-compilation of the regex on every match, perhaps with the  m//o modifier? That might speed up matching in general and at least alleviate your problem.


Give a man a fish:  <%-{-{-{-<

Re^2: Growing strings in search
by belg4mit (Prior) on Apr 20, 2020 at 13:33 UTC
    One precompiles an expression using qr// to avoid repeated compilations later.
    my $RE = qr/foo.+?bar/;
    foreach (@str) {
        # $_ holds the current element; the original interpolated an undeclared $str
        warn "Huzzah! $_" if m/$RE/;
    }

    --
    In Bob We Trust, All Others Bring Data.

      What I had in mind with my comment about the  m//o modifier is that b4swine might be receiving a complex regex and then interpolating it a la  m/...$re.../ during actual matching (as, indeed, you show in your example). This might be done if some additional match condition(s) were to be imposed on every received regex, e.g., adding a  \z anchor at the end to force end-of-string matching. Of course, I don't know that anything like this is actually happening; it was just a thought.
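To make that scenario concrete, here is a minimal sketch that assumes a received regex is given an extra \z anchor before matching. The variable names and the sub ends_with_match are hypothetical. Rather than relying on /o, it builds the combined pattern once with qr// outside the loop, which avoids repeated interpolation and does not lock in a stale pattern if the received regex later changes.

```perl
use strict;
use warnings;

# Hypothetical "received" regex; in the real application it would come
# from elsewhere.
my $received = qr{ X .* Y }xms;

# Build the combined pattern once, outside the loop, instead of
# interpolating $received into m// on every iteration.
my $anchored = qr{ $received \z }xms;

# Returns 1 if the string ends with a match of the received regex, else 0.
sub ends_with_match {
    my ($string) = @_;
    return $string =~ $anchored ? 1 : 0;
}

for my $s (qw(abcXde abcXdeY abcXdeYf)) {
    printf "'%s' %s\n", $s, ends_with_match($s) ? 'MATCH' : 'no match';
}
```

With m/ $received \z /xmso the pattern would likewise be compiled only once, but it would then stay fixed for the life of the program, even if $received were reassigned.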

      For regexes interpolated into  m// matches, use of the  /o modifier seems beneficial in some cases, apparently depending on Perl version. (I know that work is being done continuously to refine and optimize regex matching.) For instance, under two older Perl versions that I have access to ATM (ActiveState 5.8.9 and Strawberry 5.14.4.1), benchmarking shows a significant benefit in the newer Perl for using  /o when interpolating into an  m// match (assuming this is a valid benchmark, and benchmarks can be tricky :).

      use strict;
      use warnings;

      use Benchmark qw(cmpthese);

      print "perl version $] \n";

      my $rx = qr{X}xms;

      cmpthese(-1, {
          'qr_bound'    => sub { 'Y' =~   $rx   ; },
          'qr_interp'   => sub { 'Y' =~ / $rx / ; },
          'qr_interp_o' => sub { 'Y' =~ / $rx /o; },
          'm_empty'     => sub { 'Y' =~ //;  },
          'm_empty_o'   => sub { 'Y' =~ //o; },
          });
      Output:
      c:\@Work\Perl\monks\belg4mit>perl cmp_qr_compilation_1.pl
      perl version 5.008009
                       Rate qr_interp qr_interp_o qr_bound m_empty m_empty_o
      qr_interp   2880477/s        --         -9%     -43%    -55%      -67%
      qr_interp_o 3181791/s       10%          --     -37%    -51%      -64%
      qr_bound    5078627/s       76%         60%       --    -21%      -42%
      m_empty     6452035/s      124%        103%      27%      --      -26%
      m_empty_o   8775008/s      205%        176%      73%     36%        --

      c:\@Work\Perl\monks\belg4mit>perl cmp_qr_compilation_1.pl
      perl version 5.014004
                        Rate qr_bound qr_interp m_empty m_empty_o qr_interp_o
      qr_bound     1004609/s       --      -70%    -76%      -80%        -92%
      qr_interp    3378700/s     236%        --    -20%      -33%        -74%
      m_empty      4237856/s     322%       25%      --      -16%        -68%
      m_empty_o    5068698/s     405%       50%     20%        --        -61%
      qr_interp_o 13135439/s    1208%      289%    210%      159%          --
      The  qr// comparisons are between failing matches because that's what I imagine would happen most often in the actual application. I just threw in some successful  // null matches out of curiosity.


      Give a man a fish:  <%-{-{-{-<
