monkfan has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I am trying to construct all possible strings from 4 bases (ATCG), there are 4 ^ L possible strings of such.
E . g with L = 2 we have AA, AT, AC, AG, .. GC, GA, GT, GG as many as 4 ^ 2 = 16 strings,
with L = 3 we have as many as 4 ^ 3 = 64 strings
What's wrong with my code below ? Such that it doesnt give what I want ?
Thanks so much before hand .
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $l = 2; #Motif Length
my @cand_motif;
my $all_a = 'A' x $l;
my $all_t = 'T' x $l;
my $all_c = 'C' x $l;
my $all_g = 'G' x $l;
my @initial_motifs = ( $all_a, $all_t, $all_c, $all_g );
my @enumerated_motifs;
my (@nucleotides) = ( 'a', 't', 'c', 'g' );
foreach my $cand_motif (@initial_motifs) {
for ( my $i = 0 ; $i < $l ; $i++ ) {
foreach my $nucl (@nucleotides) {
substr( $cand_motif, $i, 1 ) = $nucl;
push @enumerated_motifs, $cand_motif;
#print "$cand_motif\n";
}
}
}
my $count = @enumerated_motifs;
print "Count = $count\n";
print Dumper \@enumerated_motifs;
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Molt (Chaplain) on Feb 21, 2005 at 10:57 UTC
|
On the basis that you're more interested in getting the results rather than doing it yourself you may want to use the String::Combination module on CPAN which does what you need.
The following code seems to do the job..
#!/usr/bin/perl
use strict;
use warnings;
use String::Combination;
# Get all the combinations of bases which have a prescribed length.
my $length=2;
my @combinations=String::Combination::combination('atcg',$length);
# Output the results.
print "$_\n" for @combinations;
| [reply] [d/l] |
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Taulmarill (Deacon) on Feb 21, 2005 at 10:52 UTC
|
this seemes to be a case for recursive programming. if i understood you correctly, this code should do what you want.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $l = 2; #Motif Length
my @nucs = qw/A T C G/;
my @enum = enum( $l, \@nucs );
print Dumper \@enum;
sub enum {
return @{$_[1]} unless --$_[0];
map { my $nuc = $_; map { $nuc . $_ } @{$_[1]} } enum( $_[0], $_[1
+]);
}
| [reply] [d/l] |
|
Hi Taumaril,
Thanks so much for answering me.
Using recursive along with map is way tooooo.. high for me.
Would you mind explain it with "for" loop instead?
As I don't have a strong background algorithmic way of thinking,
I would like to understand recursion in more basic way.
I hope this is not too much to ask ;-)
| [reply] |
|
hi,
it's not too mutch to ask. in fact i'm relay glad to explain some code, 'cause it shows, that you are willing to learn. so here it comes:
### my recursive subroutine
###########################
sub enum {
### return the list of elements (aka nucleotides) if this is the l
+ast
### recursion.
### also note that --$_[0] decrements the counter _before_ it is t
+ested
##################################################################
+#####
return @{$_[1]} unless --$_[0];
### define a variable which holds my return values
##################################################
my @return;
### iterate over the return values of the next recursion
########################################################
for ( enum( $_[0], $_[1] ) ){
my $nuc = $_;
### iterate over all nucleotides
################################
for ( @{$_[1]} ) {
push @return, $nuc . $_;
}
}
### return the list so far to the prior recursion
#################################################
return @return;
}
if you have problems to understand how it works, try to think about what happens, when $_[0] = 1 then think about what happens, when $_[0] = 2 and then you should get it.
if you don't understand this at all, it could be because my english is not even close to perfect. so maybe i just didn't explain it correct, which means "don't hesitate to ask questions" :-) | [reply] [d/l] [select] |
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Aristotle (Chancellor) on Feb 21, 2005 at 10:54 UTC
|
You didn't say in what way the code is behaving differently from your expectation but it's a fairly simple problem to solve correctly; even without recursion.
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $length = 2;
my @motif = ( '' );
for my $i ( 1 .. $length ) {
@motif = map {
my $motif = $_;
map $motif . $_, qw( a t c g );
} @motif;
}
print Dumper \@motif;
Ie we start out with a list with a single empty string and then on each round, we create four new strings for every string we already have.
(The shard-eyed will notice that this code does the same work as the recursive variant, only explicitly — well, and in different order.)
Makeshifts last the longest. | [reply] [d/l] |
Re: Trying to construct all possible strings from 4 bases [ATCG]
by gaal (Parson) on Feb 21, 2005 at 11:04 UTC
|
The approach I like for this kind of problem is to treat each string as a base-4 representation of a number. You just need a base-4 convertor (with this specific alphabet), and then your code is just an enumeration from 0 .. 4^L - 1.
Using Math::BaseCalc, here's a simple solution (untested):
use Math::BaseCalc;
$calc4 = new Math::BaseCalc(digits=>[qw/A T C G/]);
print($calc4->to_base($_), "\n") for 0 .. 4 ** $L - 1;
(You probably need to left-pad the string with 'A's to length L. Here 'A' maps to 0.)
Update: Good catch, ++jdalbec (fixed off-by-one bug). | [reply] [d/l] |
Re: Trying to construct all possible strings from 4 bases [ATCG] (use a closure to generate an iterator)
by grinder (Bishop) on Feb 21, 2005 at 11:58 UTC
|
Your solution, and the solutions offered so far, all appear to generate an array with the proposed results. This is okay, but will chew up memory the longer the string of bases. *
What you really want is a lazy iterator that will fetch the next sequence in the solution space. The following does what you want, at a greatly reduced cost in memory
#! /usr/local/bin/perl -w
use strict;
my @nuc = qw/ A T C G /;
sub inc {
$_[0] eq 'A' and return 'T';
$_[0] eq 'T' and return 'C';
return 'G';
}
sub gen {
my @current = @_;
my $done = 0;
return sub {
return if $done;
my @res = @current;
my $i;
ITER: for ($i = 0; $i < scalar @current; ++$i) {
if( $current[$i] ne $nuc[-1] ) {
$current[$i] = inc($current[$i]);
last;
}
else {
$current[$i] = $nuc[0];
}
}
$done = 1 if $i >= scalar @res;
return @res;
}
}
my @set = ('A') x (shift || 3);
my $iter = gen( @set );
while( my @iter = $iter->() ) {
print "@iter\n";
}
The inc kluge to move to the next base is a bit too ugly, there are better ways of doing this, but I leave that as an exercise to the reader.
If you are interested in learning more about iterators, I draw your attention to Dominus' forthcoming book, Higher Order Perl.
* I thought gaal's for 0 .. n constructed a list. It used to be the case on old perls, but gaal assures me this hasn't been the case in years. Which just goes to show that you learn something every day.
- another intruder with the mooring in the heart of the Perl
| [reply] [d/l] |
Re: Trying to construct all possible strings from 4 bases [ATCG]
by brian_d_foy (Abbot) on Feb 21, 2005 at 12:44 UTC
|
I wrote the Set::CrossProduct module to do this sort of thing. It doesn't do anything recursive, and you can get back the entire set of tuples all at once or one tuple at a time. Getting them one at a time is really handy when you end up generating thousands of tuples. :)
--
brian d foy <bdfoy@cpan.org>
| [reply] |
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Chady (Priest) on Feb 21, 2005 at 13:49 UTC
|
okay, and here's another way to do it: not so efficient probably, but fun nevertheless
my $base = join ",", qw/A T C G/;
my $L = 4;
my $string = "{$base}" x $L;
my @combinations = glob $string;
print join ":", @combinations;
He who asks will be a fool for five minutes, but he who doesn't ask will
remain a fool for life.
Chady | http://chady.net/
| [reply] [d/l] |
Re: Trying to construct all possible strings from 4 bases [ATCG]
by Anonymous Monk on Feb 21, 2005 at 12:29 UTC
|
Nothing a good regex can't solve for you. No need for time consuming recursive approaches that blow up the stack. No need for off-by-one nested loops (are you sure you have enough of them?). All way to complicated. A regex can do all.
my $bases = "ATCG";
my $length = 4;
my $q = join ";", ($bases) x $length;
my $r = join ";", ("[^;]*([^;])[^;]*") x $length;
use re 'eval';
$q =~ /^$r$(?{print "$1$2$3$4\n"})(?!)/;
| [reply] [d/l] |
Re: Trying to construct all possible strings from 4 bases [ATCG] (Algorithm::Loops)
by tye (Sage) on Feb 21, 2005 at 17:14 UTC
|
use Algorithm::Loops qw( NestedLoops );
my $minLen= 2;
my $maxLen= 4;
my $iter= NestedLoops(
[ ( [qw( A T C G )] ) x $maxLen ],
{ OnlyWhen => sub { $minLen <= @_ && @_ <= $maxLen } },
);
my @bases;
while( @bases= $iter->() ) {
print @bases, $/;
}
| [reply] [d/l] |
Re: Trying to construct all possible strings from 4 bases [ATCG]
by jdalbec (Deacon) on Feb 21, 2005 at 15:46 UTC
|
What's wrong with my code below ?
You appear to be trying to use what I think of as a "meta-for loop" to get the effect of $l nested for loops. No programming language I know of implements this directly.
for ( my $i = 0 ; $i < $l ; $i++ ) {
foreach my $nucl (@nucleotides) {
substr( $cand_motif, $i, 1 ) = $nucl;
push @enumerated_motifs, $cand_motif;
#print "$cand_motif\n";
}
}
In a language that implemented the "meta-for" construct you could write something like
meta_for ( my $i = 0 ; $i < $l ; $i++ ) {
foreach my $nucl (@nucleotides) {
substr( $cand_motif, $i, 1 ) = $nucl;
} # meta_for
push @enumerated_motifs, $cand_motif;
#print "$cand_motif\n";
meta_end_for ( my $i = 0 ; $i < $l ; $i++ ) {
} # foreach
} # meta_end_for
Of course, parsing the above code as written presents serious difficulties. Probably a different syntax would be required.
Update:
Probably the best way to implement this (Update2: i.e. the "meta-for" construct) in Perl would be to use eval.
my $code = "xyzzy";
for ( my $i = 0 ; $i < $l ; $i++ ) {
$code =~ s/xyzzy/
foreach my \$nucl (\@nucleotides) {
substr( \$cand_motif, $i, 1 ) = \$nucl;
xyzzy
}
/;
}
$code =~ s/xyzzy/
push \@enumerated_motifs, \$cand_motif;
#print "\$cand_motif\\n";
/;
eval $code;
| [reply] [d/l] [select] |
|
|