Hello,
I want to solve this problem using regexes:
Given a "sorted" target string and a sequence of group sizes, where each group is a run of equal consecutive characters, find and output the characters of the string that correspond to each group.
I wrote code that glues smaller regexes into a bigger one and uses re-eval ((?{ ... })). I use a "stack" array to save the captured characters, and push/pop to manipulate it.
I also used the $^N variable, placing it inside double-quoted strings. The code fails to run without "use re 'eval'". How can I avoid that? Are there any alternatives to $^N for accessing the last captured group outside of the regex?
Any ideas for alternative solutions to this problem?
Code:
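#!/usr/bin/perl
use warnings;
use strict;
use re 'eval';    # needed: the pattern with (?{ ... }) is interpolated at run time

$\ = $/;          # append a newline to every print

# DATA holds pairs of lines: a target string, then the group sizes.
while (<DATA>) {
    print '-' x 15;
    chomp;
    my $target = $_;
    my @groups = split ' ', <DATA>;
    print "target:[$target]";
    print "groups:[@groups]";

    my @chars_seq;    # "stack" of captured runs

    # One sub-regex per group, glued with "next char differs, then skip lazily".
    my $re = join " (?!\\g{-1}).*? \n",
        map { sprintf
            "(?: " .
            "( (.)\\g{-1}{%d} ) " .                            # a run of $_ equal characters
            "(?{ push \@chars_seq, \$\^N; }) " .               # save the run just captured
            "(?: (?=) | (?{ pop \@chars_seq }) (*FAIL) ) " .   # undo the push on backtracking
            "(?{ print join ' ', \@chars_seq; })" .            # trace the current stack
            ")"
            , $_ - 1
        } @groups;

    print for "regex:[",
        $re,
        "]";
    $re =~ s/\n//g;

    $target =~ /
        $re
    /x or print "FAIL!";
    print "character sequence:[@chars_seq]";
}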
__DATA__
cbaa
1 2
cba
1
bbaa
1 2
cccqrrtaaa
2 2 2
cccqrrtaaa
1 1 2 2
cccqrrtaaa
1 1 1 1 3
OUTPUT:
---------------
target:[cbaa]
groups:[1 2]
regex:[
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{1} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; }))
]
c
c aa
character sequence:[c aa]
---------------
target:[cba]
groups:[1]
regex:[
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; }))
]
c
character sequence:[c]
---------------
target:[bbaa]
groups:[1 2]
regex:[
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{1} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; }))
]
b
b
b aa
character sequence:[b aa]
---------------
target:[cccqrrtaaa]
groups:[2 2 2]
regex:[
(?: ( (.)\g{-1}{1} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{1} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{1} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; }))
]
cc
cc
cc rr
cc rr aa
character sequence:[cc rr aa]
---------------
target:[cccqrrtaaa]
groups:[1 1 2 2]
regex:[
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{1} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{1} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; }))
]
c
c
c
c q
c q rr
c q rr aa
character sequence:[c q rr aa]
---------------
target:[cccqrrtaaa]
groups:[1 1 1 1 3]
regex:[
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{0} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; })) (?!\g{-1}).*?
(?: ( (.)\g{-1}{2} ) (?{ push @chars_seq, $^N; }) (?: (?=) | (?{ pop @chars_seq }) (*FAIL) ) (?{ print join ' ', @chars_seq; }))
]
c
c
c
c q
c q r
c q r
c q r t
c q r t aaa
character sequence:[c q r t aaa]
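Regarding the question about avoiding "use re 'eval'": one possible direction, offered only as a minimal sketch, is to drop the (?{ ... }) blocks entirely and read the captures from a list-context match. It assumes the same per-group pattern as in the code above; find_groups is a hypothetical helper name that is not part of the original code, and the sketch does not reproduce the intermediate stack trace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): returns the matched run for
# each group size, or an empty list when the target cannot satisfy the sizes.
sub find_groups {
    my ($target, @sizes) = @_;

    # Each piece has two capture groups: the whole run and its first character.
    # \g{-1} refers to the nearest preceding group, so pieces can be glued freely.
    my $re = join '(?!\g{-1}).*?',
             map { sprintf '((.)\g{-1}{%d})', $_ - 1 } @sizes;

    # No code blocks in the pattern, so plain interpolation needs no "use re 'eval'".
    my @captures = $target =~ /$re/ or return;

    # Even-indexed captures are the whole runs; odd-indexed ones are single characters.
    return @captures[ grep { $_ % 2 == 0 } 0 .. $#captures ];
}

print join(' ', find_groups('cccqrrtaaa', 1, 1, 2, 2)), "\n";   # c q rr aa
```

Because the regex engine restores capture groups by itself when it backtracks, the manual push/pop bookkeeping is not needed in this variant.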