http://qs321.pair.com?node_id=1108196


in reply to Regular expression for a comma separated string

Hi,

Hopefully this does the trick for you. It uses a negative lookahead to make sure no character repeats before the comma (or the end of the string):

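    my @tst = qw( A,G AG,CT TC,CA GAT,CGA CGAT,TG ,G ACGT X,A AA,G AC,GGC
                  ATGA,TGG ATCXG,AAC );
    for (@tst) {
        # One side: one or more of A/C/G/T, none of which appears again
        # before the comma (or the end of the string).
        my $side = qr/(?:([ACGT])(?![^,]*\g{-1}))+/;
        # $/ is the input record separator, "\n" by default.
        print $_ . (/^$side,$side$/ ? ' good' : ' bad') . $/;
    }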
Prints:

    A,G good
    AG,CT good
    TC,CA good
    GAT,CGA good
    CGAT,TG good
    ,G bad
    ACGT bad
    X,A bad
    AA,G bad
    AC,GGC bad
    ATGA,TGG bad
    ATCXG,AAC bad

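Oops, more test coverage showed an issue. I needed a relative backreference (\g{-1}) instead of the "\1" I originally posted: because $side is interpolated twice, \1 in the right-hand copy still referred to the left side's capture group, so "AC,AC" was wrongly reported as bad. Fixed.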
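A small sketch that puts the two versions side by side on a few inputs. The helper names matches_abs and matches_rel are hypothetical; the patterns are the \1 and \g{-1} variants of $side described above. With \1, "AC,AC" is rejected because the right-hand copy checks for the left side's last letter (C); with \g{-1} it is accepted.

```perl
#!/usr/bin/perl
# Sketch: why \1 breaks when one qr// is interpolated twice, while \g{-1}
# works. matches_abs and matches_rel are hypothetical helper names.
use strict;
use warnings;

# One side: one or more of A/C/G/T, none repeated before the next comma
# (or the end of the string).
my $side_abs = qr/(?:([ACGT])(?![^,]*\1))+/;       # absolute backreference
my $side_rel = qr/(?:([ACGT])(?![^,]*\g{-1}))+/;   # relative backreference

# Interpolating a side twice yields capture groups 1 and 2. \1 always
# means group 1, so in the right-hand copy it still holds the last letter
# captured on the LEFT side. \g{-1} means the closest capture group opened
# before it, i.e. group 2 in the right-hand copy.
sub matches_abs { my ($s) = @_; return $s =~ /^$side_abs,$side_abs$/ ? 1 : 0 }
sub matches_rel { my ($s) = @_; return $s =~ /^$side_rel,$side_rel$/ ? 1 : 0 }

for my $s ('AC,AC', 'AG,CT', 'AA,G') {
    printf "%-6s  \\1: %-4s  \\g{-1}: %s\n", $s,
        (matches_abs($s) ? 'good' : 'bad'),
        (matches_rel($s) ? 'good' : 'bad');
}
```

Inputs like "AG,CT" pass with either pattern, which is why the \1 version looked fine until a case with the same letter at the end of the left side and later in the right side turned up.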

Re^2: Regular expression for a comma separated string
by naderra (Novice) on Nov 25, 2014 at 14:25 UTC

    Loops, thank you kindly for your solution.

    For testing, the following code generates all possible permutations with repetition of ACGT, up to length 4; inputs containing invalid characters can be added by hand later.
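    A minimal sketch of such a generator, not necessarily the script used here. The hypothetical helper words_up_to builds every string of length 1 to $max_len over A, C, G, T (4 + 16 + 64 + 256 = 340 strings for length 4). As an assumption about the input format, every pair of those strings is printed as one "left,right" line, giving 340 x 340 = 115,600 lines.

```perl
#!/usr/bin/perl
# Hypothetical generator sketch (assumption: output is one "left,right"
# pair per line, each side a string over A, C, G, T of length 1..$max_len).
use strict;
use warnings;

my $max_len = 4;    # longest side to generate, as in the description

# Return every string of length 1..$max over @alphabet, i.e. all
# permutations with repetition, shortest first.
sub words_up_to {
    my ($max, @alphabet) = @_;
    my @all;
    my @current = ('');    # strings of the previous length
    for my $length (1 .. $max) {    # one pass per extra letter
        my @next;
        for my $prefix (@current) {
            push @next, $prefix . $_ for @alphabet;
        }
        @current = @next;
        push @all, @current;
    }
    return @all;
}

my @words = words_up_to($max_len, qw(A C G T));

# Print every combination of two generated words around a comma.
for my $left (@words) {
    print "$left,$_\n" for @words;
}
```

    Building each length from the previous one avoids recursion and keeps the output ordered by length, which makes the result file easy to scan.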

    test program:
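    A minimal sketch of such a test filter, assuming it reads one candidate per line on standard input (as the "cat y |" pipeline suggests) and prints each line followed by good or bad, in the same style as Loops's output. As an addition not in the original, the hypothetical reference_ok re-implements the rule without the lookahead trick, and any line where the two checks disagree is flagged with MISMATCH.

```perl
#!/usr/bin/perl
# Hypothetical test-filter sketch: reads one candidate per line on STDIN,
# classifies it with the regex, and cross-checks a plain-Perl version.
use strict;
use warnings;

my $side = qr/(?:([ACGT])(?![^,]*\g{-1}))+/;

sub regex_ok { my ($s) = @_; return $s =~ /^$side,$side$/ ? 1 : 0 }

# Same rule written without the regex trick: exactly one comma, each side
# non-empty, only A/C/G/T, and no letter repeated within a side.
sub reference_ok {
    my ($s) = @_;
    my @sides = split /,/, $s, -1;    # -1 keeps empty trailing fields
    return 0 unless @sides == 2;
    for my $part (@sides) {
        return 0 unless $part =~ /^[ACGT]+$/;
        my %seen;
        return 0 if grep { $seen{$_}++ } split //, $part;
    }
    return 1;
}

while (my $line = <STDIN>) {
    chomp $line;
    my $got  = regex_ok($line);
    my $flag = $got == reference_ok($line) ? '' : '  MISMATCH';
    print $line, ($got ? ' good' : ' bad'), $flag, "\n";
}
```

    Running the generated file through this and searching the result for MISMATCH gives a quick way to spot cases where the regex and the intended rule diverge.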

    and then:

        $ ./generate_ACGT_01.pl > y
        $ cat y | ./test_ACGT_01.pl > x
        $ less x

    Robert