PerlMonks  

Re: mysteries of regex substring matching (updated)

by AnomalousMonk (Archbishop)
on Jan 15, 2021 at 23:14 UTC


in reply to mysteries of regex substring matching

I think unpack is better for this as haukex has suggested, but here's a pure-regex solution:

    Win8 Strawberry 5.8.9.5 (32) Fri 01/15/2021 18:04:55
    C:\@Work\Perl\monks
    >perl -Mstrict -Mwarnings -MData::Dump=dd
    my $s = q(AAAAbbbbCCCCddddXeeeeeeeeeeX);

    my @caps = $s =~ m{ (?<! \A .{16}) \G .{4} | .* }xmsg;
    dd \@caps;
    ^Z
    ["AAAA", "bbbb", "CCCC", "dddd", "XeeeeeeeeeeX", ""]

The trick is to have an unambiguous look-around anchor: here, \A inside the negative look-behind pins the 16-character count to the start of the string.

Update: Another variation:

    Win8 Strawberry 5.8.9.5 (32) Fri 01/15/2021 18:19:20
    C:\@Work\Perl\monks
    >perl -Mstrict -Mwarnings -MData::Dump=dd
    my $s = q(AAAAbbbbCCCCddddXeeeeeeeeeeX);

    my $n = 4;
    my $m = 3;

    my @caps = $s =~ m{ (?<! \A (?: .{$n}){$m}) \G .{$n} | .* }xmsg;
    dd \@caps;
    ^Z
    ["AAAA", "bbbb", "CCCC", "ddddXeeeeeeeeeeX", ""]
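
For comparison, here is a minimal sketch of how the unpack approach mentioned at the top might look next to the regex. haukex's actual code is not shown in this node, so the template "(a$n)$m a*" and the variable names are assumptions for illustration only. The grep is also an addition: it just drops the trailing empty string that .* produces.

```perl
use strict;
use warnings;

my $s = q(AAAAbbbbCCCCddddXeeeeeeeeeeX);
my ($n, $m) = (4, 4);    # chunk width and chunk count (illustrative values)

# Regex approach from this node; grep removes the empty final match of .*
my @by_regex = grep { length }
    $s =~ m{ (?<! \A (?: .{$n}){$m}) \G .{$n} | .* }xmsg;

# Assumed unpack equivalent: $m fields of width $n, then the remainder.
# 'a' is used instead of 'A' so trailing spaces would be preserved.
my @by_unpack = unpack "(a$n)$m a*", $s;

print join(',', @by_regex),  "\n";
print join(',', @by_unpack), "\n";
```

Both lists should come out the same for this input; note that unpack's a* yields an empty final field when nothing is left over, so the two can differ on strings with no remainder.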


Give a man a fish:  <%-{-{-{-<

Re^2: mysteries of regex substring matching (explanation)
by LanX (Saint) on Jan 16, 2021 at 11:43 UTC
    Hi AnomalousMonk,

    I'm sure we discussed this technique before, but I can't find it in the archives.

    Do you remember a thread? :)

    update

    Well, I deciphered it in the meantime: it's not operating on match groups but relying on the /g ° modifier.

    With /g, each search resumes where the previous match ended, and in list context the match returns all results until it fails.

        DB<139> p
        0123456789abcdefghijklmnopqrstuvwxyz
        DB<139> x m{ .... }xg
        0  0123
        1  4567
        2  '89ab'
        3  'cdef'
        4  'ghij'
        5  'klmn'
        6  'opqr'
        7  'stuv'
        8  'wxyz'
        DB<140>
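
The "resumes where the last match ended" behaviour can be watched directly in scalar context, where each m//g call finds one match and moves pos(). This is only a sketch for illustration: the variable names and the `chunk@offset` output format are made up here, not taken from the thread.

```perl
use strict;
use warnings;

# Same 36-character test string as in the debugger session
my $str = join '', 0 .. 9, 'a' .. 'z';

my @seen;
# Scalar-context m//g: one match per iteration, starting at the current pos()
while ($str =~ m{ (....) }xg) {
    push @seen, sprintf '%s@%d', $1, pos($str);   # chunk and the offset after it
}
print "@seen\n";
```

Each recorded offset is where the next iteration starts, which is the same bookkeeping the list-context form does implicitly.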

    With a negative look-behind (?<! ) we can filter out all matches after the initial 4:

        DB<140> x m{ (?<! (?: .... ){4} ) .... }xg
        0  0123
        1  4567
        2  '89ab'
        3  'cdef'
        DB<141>

    An alternation | then matches whatever remains once the first branch fails.

        DB<142> x m{ (?<! (?: .... ){4} ) .... | .+ }xg
        0  0123
        1  4567
        2  '89ab'
        3  'cdef'
        4  'ghijklmnopqrstuvwxyz'
        DB<143>
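
One visible difference between this version and the one in the parent post is the fallback branch: .* can match the empty string at the end of the input, while .+ cannot. A small sketch to check that on the parent post's data (the variable names @star and @plus are just for illustration):

```perl
use strict;
use warnings;

my $s = q(AAAAbbbbCCCCddddXeeeeeeeeeeX);

# Fallback .* as in the parent post: one extra empty match at end of string
my @star = $s =~ m{ (?<! \A .{16}) \G .{4} | .* }xmsg;

# Fallback .+ as in this reply: needs at least one character, so no empty match
my @plus = $s =~ m{ (?<! \A .{16}) \G .{4} | .+ }xmsg;

printf "%d vs %d elements\n", scalar @star, scalar @plus;
```

So if the trailing "" is unwanted, switching the fallback to .+ appears to be enough for input like this, with no post-filtering needed.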

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) thanks AnomalousMonk++ for spotting :)
