http://qs321.pair.com?node_id=653869


in reply to How to expand a string

This sounds like homework. People will be glad to help you when you show that you have thought a little about this problem.

Re^2: How to expand a string
by Anonymous Monk on Nov 29, 2007 at 15:15 UTC
    Really, it's not. I'm a postdoc trying to write a quick program to analyse DNA sequences! There are a heck of a lot of them to do, and this was just a simple example. In some cases in DNA notation, any of 3 bases can be represented by 1 letter, so it's not as simple as it first looks (at least to me!)
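      use strict;
      use warnings;
      use Data::Dumper;

      # Each ambiguous letter becomes a brace group that glob can expand.
      # glob splits its argument on whitespace, so spaces are swapped for '_' first.
      my %replace = (
          R   => '{A,G}',
          S   => '{C,G}',
          K   => '{G,T}',
          ' ' => '_',
      );

      my $s = 'KAG GTR CAG CTG AAG SAG TCA GG';
      my @results;

      for my $base (keys %replace) {
          $s =~ s/$base/$replace{$base}/g;
      }

      # In scalar context glob returns one expansion per call.
      push @results, $_ while glob $s;

      # Restore the spaces between codons.
      @results = map { s/_/ /g; $_ } @results;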
      Update: BrowserUK's solution. Credits to him.
        Genius!! Anything I'd have written would have been considerably longer! Thanks very much.
      This isn't a complete solution, but I think it will get you started on the right path. If you need more detailed help, I'll need a more detailed description of the problem (namely, expected inputs and outputs). What you want to do is iterate over the string and replace the current character with possible replacement characters. You could use recursion as mentioned above, but that's probably overkill for what you're looking for.
      use strict;
      use warnings;

      # the substitution possibilities
      my @a = ('a');
      my @b = ('d', 'e');
      my @c = ('f', 'g', 'h');

      my $string = shift @ARGV;   # this is an argument passed to the script on the command line
      #my $string = "abc";        # if you want it hardcoded
      my @strings = ();

      foreach my $first (@a) {
          foreach my $second (@b) {
              foreach my $third (@c) {
                  my $expanded_string = "$first$second$third";
                  push(@strings, $expanded_string);   # to store them all
                  print "$expanded_string\n";         # to print this particular one
              }
          }
      }
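The nested loops above only cover a three-character string. As a rough sketch of the same "replace the current character with its possible replacements" idea for a string of any length, without recursion, something like the following might work. The helper name `expand_string` and the `%alternatives` map are made up for illustration; the map simply restates the a/b/c possibilities from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a string given a map of character => alternatives.
sub expand_string {
    my ($string, $alternatives) = @_;
    my @expanded = ('');    # start with one empty partial string
    for my $char (split //, $string) {
        # Characters with no entry in the map stand for themselves.
        my @choices = exists $alternatives->{$char}
                    ? @{ $alternatives->{$char} }
                    : ($char);
        # Append every choice to every partial string built so far.
        @expanded = map { my $prefix = $_; map { $prefix . $_ } @choices } @expanded;
    }
    return @expanded;
}

# Assumed mapping, mirroring @a, @b and @c from the loops above.
my %alternatives = (
    a => ['a'],
    b => ['d', 'e'],
    c => ['f', 'g', 'h'],
);

my @strings = expand_string('abc', \%alternatives);
print "$_\n" for @strings;
```

For 'abc' this produces the same six strings, in the same order, as the three nested loops. The list of partial strings grows by a factor of the number of choices at each position, so memory use is proportional to the number of results.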
        Thanks, all the replies are helping me get there but it's still a bit of a nightmare to code. Here's some example inputs etc.

        R = A or G
        S = C or G
        K = G or T
        All the other letters stay constant.

        my $seq1 = "CAG GTR CAG CTG AAG SAG TCA GG";
        my $seq2 = "GAK GTG CAG CTT CAG CAG TCR GG";

        The gaps between sets of 3 letters aren't important; they just mark the DNA codons.

        So, both seq 1 and seq 2 have 4 possible resulting sequences. I need to store the 4 seqs associated with seq 1 separately e.g. in a different array to those of seq 2.

        If you have any ideas about the best way to do this I'd be very grateful! Speed isn't a big consideration, as long as it works!

        Thanks for your help.
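One possible way to meet the requirements described above (R, S and K expand to two bases each, every other character stays constant, and each input keeps its own list of results) is sketched below. It is only a sketch under those stated rules: the helper `expand_sequence` and the hash `%expanded_for` are hypothetical names, and only the three codes listed in the post are included.

```perl
use strict;
use warnings;

# Ambiguity codes exactly as listed in the post; all other characters stay constant.
my %ambiguity = (
    R => [qw(A G)],
    S => [qw(C G)],
    K => [qw(G T)],
);

# Hypothetical helper: return every concrete sequence for one ambiguous sequence.
sub expand_sequence {
    my ($seq) = @_;
    my @expanded = ('');
    for my $base (split //, $seq) {
        my @choices = $ambiguity{$base} ? @{ $ambiguity{$base} } : ($base);
        # Extend every partial sequence with every possible base at this position.
        @expanded = map { my $prefix = $_; map { $prefix . $_ } @choices } @expanded;
    }
    return @expanded;
}

my %input = (
    seq1 => "CAG GTR CAG CTG AAG SAG TCA GG",
    seq2 => "GAK GTG CAG CTT CAG CAG TCR GG",
);

# One separate array of results per input sequence, keyed by its name.
my %expanded_for;
for my $name (sort keys %input) {
    $expanded_for{$name} = [ expand_sequence($input{$name}) ];
}

for my $name (sort keys %expanded_for) {
    print "$name:\n";
    print "  $_\n" for @{ $expanded_for{$name} };
}
```

Each entry of `%expanded_for` is an array reference, so the four sequences derived from seq1 are kept apart from the four derived from seq2. Spaces pass through unchanged because they have no entry in `%ambiguity`.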