Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Trying to construct all possible strings from 4 bases [ATCG]

by monkfan (Curate)
on Feb 21, 2005 at 10:31 UTC ( [id://433004]=perlquestion: print w/replies, xml ) Need Help??

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I am trying to construct all possible strings from 4 bases (ATCG), there are 4 ^ L possible strings of such.

E . g with L = 2 we have AA, AT, AC, AG, .. GC, GA, GT, GG as many as 4 ^ 2 = 16 strings,
with L = 3 we have as many as 4 ^ 3 = 64 strings

What's wrong with my code below ? Such that it doesnt give what I want ?
Thanks so much before hand .

#!/usr/bin/perl -w use strict; use warnings; use Data::Dumper; my $l = 2; #Motif Length my @cand_motif; my $all_a = 'A' x $l; my $all_t = 'T' x $l; my $all_c = 'C' x $l; my $all_g = 'G' x $l; my @initial_motifs = ( $all_a, $all_t, $all_c, $all_g ); my @enumerated_motifs; my (@nucleotides) = ( 'a', 't', 'c', 'g' ); foreach my $cand_motif (@initial_motifs) { for ( my $i = 0 ; $i < $l ; $i++ ) { foreach my $nucl (@nucleotides) { substr( $cand_motif, $i, 1 ) = $nucl; push @enumerated_motifs, $cand_motif; #print "$cand_motif\n"; } } } my $count = @enumerated_motifs; print "Count = $count\n"; print Dumper \@enumerated_motifs;
Regards,
Edward

Replies are listed 'Best First'.
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Molt (Chaplain) on Feb 21, 2005 at 10:57 UTC

    On the basis that you're more interested in getting the results rather than doing it yourself you may want to use the String::Combination module on CPAN which does what you need.

    The following code seems to do the job..

    #!/usr/bin/perl use strict; use warnings; use String::Combination; # Get all the combinations of bases which have a prescribed length. my $length=2; my @combinations=String::Combination::combination('atcg',$length); # Output the results. print "$_\n" for @combinations;
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Taulmarill (Deacon) on Feb 21, 2005 at 10:52 UTC
    this seemes to be a case for recursive programming. if i understood you correctly, this code should do what you want.
    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $l = 2; #Motif Length my @nucs = qw/A T C G/; my @enum = enum( $l, \@nucs ); print Dumper \@enum; sub enum { return @{$_[1]} unless --$_[0]; map { my $nuc = $_; map { $nuc . $_ } @{$_[1]} } enum( $_[0], $_[1 +]); }
      Hi Taumaril,
      Thanks so much for answering me.

      Using recursive along with map is way tooooo.. high for me.
      Would you mind explain it with "for" loop instead?

      As I don't have a strong background algorithmic way of thinking,
      I would like to understand recursion in more basic way.
      I hope this is not too much to ask ;-)

      Regards,
      Edward
        hi,
        it's not too mutch to ask. in fact i'm relay glad to explain some code, 'cause it shows, that you are willing to learn. so here it comes:

        ### my recursive subroutine ########################### sub enum { ### return the list of elements (aka nucleotides) if this is the l +ast ### recursion. ### also note that --$_[0] decrements the counter _before_ it is t +ested ################################################################## +##### return @{$_[1]} unless --$_[0]; ### define a variable which holds my return values ################################################## my @return; ### iterate over the return values of the next recursion ######################################################## for ( enum( $_[0], $_[1] ) ){ my $nuc = $_; ### iterate over all nucleotides ################################ for ( @{$_[1]} ) { push @return, $nuc . $_; } } ### return the list so far to the prior recursion ################################################# return @return; }

        if you have problems to understand how it works, try to think about what happens, when $_[0] = 1 then think about what happens, when $_[0] = 2 and then you should get it.
        if you don't understand this at all, it could be because my english is not even close to perfect. so maybe i just didn't explain it correct, which means "don't hesitate to ask questions" :-)
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Aristotle (Chancellor) on Feb 21, 2005 at 10:54 UTC

    You didn't say in what way the code is behaving differently from your expectation but it's a fairly simple problem to solve correctly; even without recursion.

    #!/usr/bin/perl -w use strict; use warnings; use Data::Dumper; my $length = 2; my @motif = ( '' ); for my $i ( 1 .. $length ) { @motif = map { my $motif = $_; map $motif . $_, qw( a t c g ); } @motif; } print Dumper \@motif;

    Ie we start out with a list with a single empty string and then on each round, we create four new strings for every string we already have.

    (The shard-eyed will notice that this code does the same work as the recursive variant, only explicitly — well, and in different order.)

    Makeshifts last the longest.

Re: Trying to construct all possible strings from 4 bases [ATCG]
by gaal (Parson) on Feb 21, 2005 at 11:04 UTC
    The approach I like for this kind of problem is to treat each string as a base-4 representation of a number. You just need a base-4 convertor (with this specific alphabet), and then your code is just an enumeration from 0 .. 4^L - 1.

    Using Math::BaseCalc, here's a simple solution (untested):

    use Math::BaseCalc; $calc4 = new Math::BaseCalc(digits=>[qw/A T C G/]); print($calc4->to_base($_), "\n") for 0 .. 4 ** $L - 1;

    (You probably need to left-pad the string with 'A's to length L. Here 'A' maps to 0.)

    Update: Good catch, ++jdalbec (fixed off-by-one bug).

Re: Trying to construct all possible strings from 4 bases [ATCG] (use a closure to generate an iterator)
by grinder (Bishop) on Feb 21, 2005 at 11:58 UTC

    Your solution, and the solutions offered so far, all appear to generate an array with the proposed results. This is okay, but will chew up memory the longer the string of bases. *

    What you really want is a lazy iterator that will fetch the next sequence in the solution space. The following does what you want, at a greatly reduced cost in memory

    #! /usr/local/bin/perl -w use strict; my @nuc = qw/ A T C G /; sub inc { $_[0] eq 'A' and return 'T'; $_[0] eq 'T' and return 'C'; return 'G'; } sub gen { my @current = @_; my $done = 0; return sub { return if $done; my @res = @current; my $i; ITER: for ($i = 0; $i < scalar @current; ++$i) { if( $current[$i] ne $nuc[-1] ) { $current[$i] = inc($current[$i]); last; } else { $current[$i] = $nuc[0]; } } $done = 1 if $i >= scalar @res; return @res; } } my @set = ('A') x (shift || 3); my $iter = gen( @set ); while( my @iter = $iter->() ) { print "@iter\n"; }

    The inc kluge to move to the next base is a bit too ugly, there are better ways of doing this, but I leave that as an exercise to the reader.

    If you are interested in learning more about iterators, I draw your attention to Dominus' forthcoming book, Higher Order Perl.


    * I thought gaal's for 0 .. n constructed a list. It used to be the case on old perls, but gaal assures me this hasn't been the case in years. Which just goes to show that you learn something every day.

    - another intruder with the mooring in the heart of the Perl

Re: Trying to construct all possible strings from 4 bases [ATCG]
by brian_d_foy (Abbot) on Feb 21, 2005 at 12:44 UTC

    I wrote the Set::CrossProduct module to do this sort of thing. It doesn't do anything recursive, and you can get back the entire set of tuples all at once or one tuple at a time. Getting them one at a time is really handy when you end up generating thousands of tuples. :)

    --
    brian d foy <bdfoy@cpan.org>
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Chady (Priest) on Feb 21, 2005 at 13:49 UTC

    okay, and here's another way to do it: not so efficient probably, but fun nevertheless

    my $base = join ",", qw/A T C G/; my $L = 4; my $string = "{$base}" x $L; my @combinations = glob $string; print join ":", @combinations;

    He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.
    Chady | http://chady.net/
    Are you a Linux user in Lebanon? join the Lebanese Linux User Group.
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Anonymous Monk on Feb 21, 2005 at 12:29 UTC
    Nothing a good regex can't solve for you. No need for time consuming recursive approaches that blow up the stack. No need for off-by-one nested loops (are you sure you have enough of them?). All way to complicated. A regex can do all.

    my $bases = "ATCG"; my $length = 4; my $q = join ";", ($bases) x $length; my $r = join ";", ("[^;]*([^;])[^;]*") x $length; use re 'eval'; $q =~ /^$r$(?{print "$1$2$3$4\n"})(?!)/;
Re: Trying to construct all possible strings from 4 bases [ATCG] (Algorithm::Loops)
by tye (Sage) on Feb 21, 2005 at 17:14 UTC

    Obligatory Algorithm::Loops version:

    use Algorithm::Loops qw( NestedLoops ); my $minLen= 2; my $maxLen= 4; my $iter= NestedLoops( [ ( [qw( A T C G )] ) x $maxLen ], { OnlyWhen => sub { $minLen <= @_ && @_ <= $maxLen } }, ); my @bases; while( @bases= $iter->() ) { print @bases, $/; }

    - tye        

Re: Trying to construct all possible strings from 4 bases [ATCG]
by jdalbec (Deacon) on Feb 21, 2005 at 15:46 UTC
    What's wrong with my code below ?
    You appear to be trying to use what I think of as a "meta-for loop" to get the effect of $l nested for loops. No programming language I know of implements this directly.
    for ( my $i = 0 ; $i < $l ; $i++ ) { foreach my $nucl (@nucleotides) { substr( $cand_motif, $i, 1 ) = $nucl; push @enumerated_motifs, $cand_motif; #print "$cand_motif\n"; } }
    In a language that implemented the "meta-for" construct you could write something like
    meta_for ( my $i = 0 ; $i < $l ; $i++ ) { foreach my $nucl (@nucleotides) { substr( $cand_motif, $i, 1 ) = $nucl; } # meta_for push @enumerated_motifs, $cand_motif; #print "$cand_motif\n"; meta_end_for ( my $i = 0 ; $i < $l ; $i++ ) { } # foreach } # meta_end_for
    Of course, parsing the above code as written presents serious difficulties. Probably a different syntax would be required.

    Update: Probably the best way to implement this (Update2: i.e. the "meta-for" construct) in Perl would be to use eval.

    my $code = "xyzzy"; for ( my $i = 0 ; $i < $l ; $i++ ) { $code =~ s/xyzzy/ foreach my \$nucl (\@nucleotides) { substr( \$cand_motif, $i, 1 ) = \$nucl; xyzzy } /; } $code =~ s/xyzzy/ push \@enumerated_motifs, \$cand_motif; #print "\$cand_motif\\n"; /; eval $code;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://433004]
Approved by castaway
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2024-04-24 15:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found