Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Matching permutations with regex

by QM (Parson)
on Nov 20, 2018 at 11:29 UTC ( #1226058=perlquestion: print w/replies, xml ) Need Help??

QM has asked for the wisdom of the Perl Monks concerning the following question:

Today I learned that a regex can match permutations of a given string.

Update: Oops, I had the terms reversed on the regex, it should be $i =~ $x .... I've corrected it, but it doesn't really change the results.

For single characters without repeats, such as any permutation of 'abc', this seems the shortest, most readable method:

@x = glob('{a,b,c}'x3); # for testing: all permutations, plus many mor +e $x = '(?:([a-c])(?!.*\1)){3}'; # regex for $i (@x) { $i =~ $x and say $i; # was $x =~ $i }

which outputs:

abc acb bac bca cab cba

For single character, with limited repeats, such as any permutation of 'aabbcc', a slightly different approach is needed. I tried the following, but it matches almost everything. I suspect there's some unexpected behavior in the backtracking?

@x = glob('{a,b,c}'x6); $x = '(?:a()|a()|b()|b()|c()|c()){6}\1\2\3\4\5\6'; # limited repeats a +llowed for $i (@x) { $i =~ $x and say $i; # was $x =~ $i }

which outputs 540 of the 728 strings given (at least 1 of each char is present):

aaaabc aaaacb aaabac aaabbc aaabca ... cccbba cccbca ccccab ccccba

Is there some other magic to DWIM?

For nonrepeating, multichar permutations, even when individual chars are shared between tokens, this approach works:

@x = glob('{abc,bcd,cde}'x3); # permutations of 'abc/bcd/cde' $x = '(?:abc()|bcd()|cde()){3}\1\2\3'; for $i (@x) { $i =~ $x and say $i; # was $x =~ $i }

which outputs:

abcbcdcde abccdebcd bcdabccde bcdcdeabc cdeabcbcd cdebcdabc

I found this here on Stack Overflow.

-QM
--
Quantum Mechanics: The dreams stuff is made of

-QM
--
Quantum Mechanics: The dreams stuff is made of

Replies are listed 'Best First'.
Re: Matching permutations with regex
by Eily (Monsignor) on Nov 20, 2018 at 14:01 UTC

    Wow, I didn't know the empty capture and backreference trick, nice!

    I also didn't know the trick where you put the pattern on the left side of the =~ operator and still get results somehow ;-)

    I suspect there's some unexpected behavior in the backtracking?
    I tried:
    use v5.20; use Data::Dump "pp"; my @z = glob('{a,b,c}'x6); my $z = '(?:a()|a()|b()|b()|c()|c()){6}\1\2\3\4\5\6'; for my $j (@z) { $j =~ $z and say pp {$j => \@- }; }
    And I got a bunch of values likes:
    ... { acbcaa => [0, 6, 6, 3, 3, 4, 4] } { acbcab => [0, 5, 5, 6, 6, 4, 4] } { acbcac => [0, 5, 5, 3, 3, 6, 6] } { acbcba => [0, 6, 6, 5, 5, 4, 4] } { acbcca => [0, 6, 6, 3, 3, 5, 5] } { accaab => [0, 5, 5, 6, 6, 3, 3] } { accaba => [0, 6, 6, 5, 5, 3, 3] } ... { cccbab => [0, 5, 5, 6, 6, 3, 3] } { cccbac => [0, 5, 5, 4, 4, 6, 6] } { cccbba => [0, 6, 6, 5, 5, 3, 3] } { cccbca => [0, 6, 6, 4, 4, 5, 5] } { ccccab => [0, 5, 5, 6, 6, 4, 4] } { ccccba => [0, 6, 6, 5, 5, 4, 4] }
    Where each pair of alternative match exactly (eg \1 and \2) at the same place, no matter what. I'd suspect that the identical branches are actually merged by the optimizer.
    Is there some other magic to DWIM?
    There's this:
    my @y = glob('{a,b,c}'x6); my $y = '(?:(?!\1)a()|(?!\2)a()|(?!\3)b()|(?!\4)b()|(?!\5)c()|(?!\6)c( +)){6}\1\2\3\4\5\6'; for my $j (@y) { $j =~ $y and say $j; }
    aabbcc aabcbc aabccb aacbbc aacbcb aaccbb ababcc abacbc abaccb ... ccbaab ccbaba ccbbaa

    Edit: this also works actually (without \1\2\3\4\5\6 at the end):

    # edit reformatted as a multiline regex for clarity my $y = qr/(?: (?!\1) a () | (?!\2) a () | (?!\3) b () | (?!\4) b () | (?!\5) c () | (?!\6) c () ){6} /x;
    So TIL, (?!\x)XXX() is a pattern to only allow XXX to match once in the whole regex... Cool :-)

      I also didn't know the trick where you put the pattern on the left side of the =~ operator and still get results somehow ;-
      Ugh. Major brain fart. And it still works!

      The (?!\1)xxx(() element is great, thanks.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: Matching permutations with regex
by tybalt89 (Prior) on Nov 20, 2018 at 12:51 UTC
    #!/usr/bin/perl -l # https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographi +c_order use strict; use warnings; $_ = 'aabbcc'; 1 while print, s/.*\K # find the last (.) # char such that (.*) # there is a later (latest) (.)(??{$1 ge $3 and '(*F)'}) # char greater than it (.*) # and get rest # swap those two chars ( $1 & $3 ) # then reverse everything after the first swapped char / $3 . reverse $2.$1.$4 /xe
      OK. That's probably pretty fast too.

      But I think you misunderstood? I can generate them, but I wanted to _match_ any permutation of a given string.

      Update: Oh, I had the =~ backwards, and that led you astray.

      (Not shown in the OP is a programmatic way to generate the regex solution based on the input string, but that was left out for clarity.)

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        Like so ?

        #!/usr/bin/perl -l # https://perlmonks.org/?node_id=1226058 use strict; use warnings; $_ = 'aabbcc'; my $len = length; my $inner = join ',', split //, tr///csr; my $glob = "{$inner}" x $len; my $re = "^(?=.{$len}\$)"; while( /./ ) { my $count = s/$&//g; $re .= "(?=(?:.*$&){$count})"; } $_ =~ $re and print for glob $glob;

        Outputs the 90 valid permutations.

Re: Matching permutations with regex
by tybalt89 (Prior) on Nov 20, 2018 at 15:22 UTC
    #!/usr/bin/perl -l # https://perlmonks.org/?node_id=1226058 use strict; use warnings; $_ = 'abc'; /(.).*\1/ or print for glob "{@{[ s/\B/,/gr ]}}" x length;
Re: Matching permutations with regex
by rsFalse (Hermit) on Nov 21, 2018 at 14:10 UTC
    If @x can become huge (by generating all combinations with glob), another variant is - to try to match against multicated string, joined (or prepended) by some separator, which "blocks" backtracking to previous strings. In this case, all combinations will be tested, unless to use FAIL somewhere in a middle of search. In my opinion, this approach is not truly 'regex-like', because here a simple hash for counting occurrences is used.
    #!/usr/bin/perl # https://www.perlmonks.org/?node_id=1226058 use warnings; use re 'eval'; $\ = $/; for my $tc ( [ 'abc', 1 ], [ 'abc', 2 ], [ 'abc', 3 ], ){ my( $string, $times ) = @{ $tc }; $_ = ( ',' . reverse $string ) x ( $times * length $string ); print; my @permutations; my %occ; my $re = ', \w*(\w)\w* (?: (?{ $occ{ $^N } ++ }) | (?{ $occ{ $^N } -- }) (*F) ) ' x ( $times * length $string ); ; / $re $ (??{ ( grep $_ != $times, values %occ ) ? '(*F)' : '' }) (?{ push @permutations, join '', grep length, $1, $2, $3, $4, $5, $6, $7, $8, $9, }) (*F) /x; print for @permutations, ~~ @permutations ; }
    Almost the same code, except that I don't use '$1, $2, $3, $4, ...', but rather use a variable for saving current permutation string: And if I won't to check if my current generating permutation is bad, I include that test '(??{ ( grep $_ > $times, values %occ ) ? "(*F)" : "" })' inside '$re'. Maybe in some cases it can save searching time:
    my $re = ', \w*(\w)\w* (?: (?{ $occ{ $^N } ++ }) | (?{ $occ{ $^N } -- }) (*F) ) (??{ ( grep $_ > $times, values %occ ) ? "(*F)" : "" }) ' x ( $times * length $string ); ;
      I had the regex terms backwards. And the glob was just for testing purposes -- I wasn't trying to generate permutations, but to match one permutation of a known string.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1226058]
Front-paged by haukex
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (5)
As of 2021-01-21 18:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?