baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:
Hi,
A question. Actually 2. Given a weight matrix :
id 0 1 2 3 4 5
|-------------------
0| 4,-1,-2,-2, 0,-1
1| -1, 5, 0, 2,-3, 1
2| -2, 0, 6, 1,-3, 5
3| -2, 2, 1, 7,-3, 0
4| 0,-3,-3,-3, 8,-3
5| -1, 1, 5, 0,-3, 9
I wish to extract all combinations of k-mers (k-id segments) which have a total score larger than let say 17.
let k = 3 then
3,4,5 -> score 7+8+9 = 24 > 17
0,1,2 -> score 4+5+6 = 15 < 17
and all those simmilar enough with the same criterion. wher similar means ler a k-mer be 2,3,4 -> 6+7+8 = 21 and similar to that one is 5,3,4
2->5 = 5
3->3 = 7
4->4 = 8
so 2,3,4 -> 5,3,4 = 5+7+8 = 20 > 17
so positional similarity is only relevant.
the naive approach for k = 3 would be :
#!/usr/bin/perl
use strict;
my $MTX = [
[4,-1,-2,-2, 0,-1],
[-1,5, 0, 2,-3, 1],
[-2,0, 6, 1,-3, 5],
[-2,2, 1, 7,-3, 0],
[0,-3,-3,-3, 8,-3],
[-1,1, 5, 0,-3, 9],
];
for (my $i =0; $i<6; $i++){
for(my $j = 0; $j< 6; $j++){
for(my $k = 0; $k< 6; $k++){
my $sc = ($MTX->[$i]->[$i] + $MTX->[$j]->[$j] + $MTX->[$k]->[
+$k]);
if ($sc < 17){
next;
}else{
for (my $ii =0; $ii<6; $ii++){
for(my $jj = 0; $jj< 6; $jj++){
for(my $kk = 0; $kk< 6; $kk++){
my $score = ($MTX->[$i]->[$ii] + $MTX->[$j]->[$jj
+] + $MTX->[$k]->[$kk]);
if($score > 17){
print " $i,$j,$k -> $ii,$jj,$kk = $score\n";
}
}
}
}
}
}
}
}
also the rule is : if any id is changed, the score for a mismatch (2->5) cannot be greated than (2->2):
score(2->5) < score(2->2)
and the other qouestion is: what would be the best way to generalize this to any k, not just 3 as in my example above?
am I making any sense ?
thnx
PS: code not necessary.
thank you
UPDATE:
Ok , after thinking a bit about it. i guess the answer to the secong question is straight forward. sinc the alphabet size is constant (6) each k-mer can be encoded onto a unique position from 1 to 6^k. Thus the first three for-loops can be replaced by a singe one. (the sam goes for the second three) and therefore the loop problem dissapears and iterations can easaly be generalized to any k size k-mer. Which leaves the first question still open..
UPDATE:
To elaborate on my previous Update: this is what i ment :
my $k = 3;
for (my $i =0; $i<6**$k; $i++){
my $t = &decode($i, 6,$k);
print "@{$t}\n";
}
sub decode{
my ($int, $alpha, $k) = @_;
my $r = 0;
my $e = $int;
my $rq = 0;
my $rr = 0;
my $p = 0;
my @aa;
my $j = $k-1;
my $i = 0;
for($i=0; $i < $k-1 ;$i++, $j--){
$p = $alpha**$j;
$rr = $e%$p;
$rq = $e/$p;
$aa[$i] = int($rq);
$e = $rr;
}
$aa[$i] = $e;
return \@aa;
}
=pod
## for starting with 1 instead of 0
sub decode{
my ($int, $alpha, $k) = @_;
my $r = 0;
my $e = $int;
my $rq = 0;
my $rr = 0;
my $p = 0;
my @aa;
my $j = $k-1;
my $i = 0;
for($i=0; $i < $k-1 ;$i++, $j--){
$p = $alpha**$j;
$rr = $e%$p;
$rq = $e/$p;
if($rr == 0){
$rr = ($e-1)%$p;
$rq = ($e-1)/$p;
$r = 1;
}
$aa[$i] = int($rq);
$e = $rr;
}
$aa[$i] = $e+$r-1;
return \@aa;
}
=cut
Yes ofcourse this can be optimized and done in more perlish way. but this is just a conceptual problem requiring conceptual solution :) thnx
Re: Combinatorial problem (Algorithm::Loops)
by tye (Sage) on Apr 23, 2015 at 20:06 UTC
|
When wanting to nest for loops arbitrary levels deep, I use Algorithm::Loops::NestedLoops().
| [reply] [Watch: Dir/Any] [d/l] |
Re: Combinatorial problem
by Laurent_R (Canon) on Apr 23, 2015 at 17:43 UTC
|
I do not know what a k-mer is, so I may get your explanations wrong, but it seems to me that you're only using the diagonal of your matrix. Why don't you extract that diagonal into an array and the search will be easier and faster. Or maybe I did not understand what you are actually doing.
As a side note, this syntax:
for (my $i =0; $i<6; $i++){
could be replaced by something more perlish:
for my $i (0..5) {
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Hi, thnx for the reply. A k-mer or k-plet, or k-tuple ...(there are meny deiierent ways people call those). Basically (generaly speaking) it is a k-size constrict of individual elements. these elements are often chosen to be charracters (or letters, or symbols) from an alphabet ... depending upon the literature and the field of interest. Linguists and biologists will usually use the term k-mer (a k character string) (as far as my literature knowlage goes )
As far as your post goes. You perfectly understood the first part of the problem. I want to find all k-mers with score values above 17. And yes I could extract the diagonal and do everything faster. But I need the matrix because then in the second 3 for-loops I want to find all k-mers that , when transhormed into the initial high scoring k-mer the transformation score will not be below 17 (again). that way I am certain that all k-mers with score above 17 are found
the problem is if the k and the alfhabet (6) gets really big, itterating through all possible solutions (like I'm doing) is not feasible. so I'm trying to locate some type of heuristic solution to reduce the number of iterations to minimum .
thnx once more for a suggestiog
| [reply] [Watch: Dir/Any] |
Re: Combinatorial problem
by Laurent_R (Canon) on Apr 23, 2015 at 21:41 UTC
|
Hm, not sure to understand everything that you are doing, but this is my proposed solution:
use strict;
use warnings;
use Data::Dumper;
my $MTX = [
[4,-1,-2,-2, 0,-1],
[-1,5, 0, 2,-3, 1],
[-2,0, 6, 1,-3, 5],
[-2,2, 1, 7,-3, 0],
[0,-3,-3,-3, 8,-3],
[-1,1, 5, 0,-3, 9],
];
my (@diagonal, @result);
my $k = 3;
for my $i (0..5) {
push @diagonal, $MTX->[$i][$i];
}
for my $i (0..5) {
for my $j (0..5) {
for my $k (0..5) {
next if $i == $j or $i == $k or $j == $k;
my $sc;
$sc += $diagonal[$_] for ($i, $j, $k);
next unless defined $sc;
next if $sc < 17;
push @{$result[$sc]}, [$i, $j, $k];
}
}
}
print Dumper \@result;
The result is as follows:
I leave it up to you to check the results.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
use strict;
use warnings;
my $MTX = [
[4,-1,-2,-2, 0,-1],
[-1,5, 0, 2,-3, 1],
[-2,0, 6, 1,-3, 5],
[-2,2, 1, 7,-3, 0],
[0,-3,-3,-3, 8,-3],
[-1,1, 5, 0,-3, 9],
];
my @result;
my @diagonal = map { $MTX->[$_][$_] } 0..5;
for my $i (0..5) {
for my $j ($i+1..5) {
for my $k ($j+1..5) {
my $sc = 0;
$sc += $diagonal[$_] for ($i, $j, $k);
next unless $sc > 17;
push @result, [$i, $j, $k, $sc];
}
}
}
print "@$_\n" for @result;
| [reply] [Watch: Dir/Any] [d/l] |
|
You are creating a lot of duplicates.
Yes, I know, but I had the feeling the OP wanted to keep the duplicates. Well, I still don't know what the OP really wanted, I just tried to offer something simpler.
| [reply] [Watch: Dir/Any] |
Re: Combinatorial problem
by hdb (Monsignor) on Apr 23, 2015 at 21:32 UTC
|
From your description and from your code it is not clear to me whether or not duplicates are allowed but if not check this thread how to chose k out of n elements in turn. Then all you need is to calculate the sum of all combinations.
| [reply] [Watch: Dir/Any] |
Re: Combinatorial problem
by hdb (Monsignor) on Apr 24, 2015 at 11:31 UTC
|
Here is a recursive solution. It uses the diagonal sorted in descending order but keeps track of the original order. It stops if the remaining elements become too small. I have to admit that I only looked at your one test case and that I did not do any benchmarking.
use strict;
use warnings;
use List::Util qw/ sum min /;
sub find {
my( $array, $n, $k, $threshold, $start, $solution, $result ) = @_;
return if sum( @{$array}[$start..min($start+$k-1,$n-1)] ) <= $thre
+shold; # not enough meat anymore
if( 1==$k ) {
push @$result, [ @$solution, $array->[$_][1] ] for grep { $arr
+ay->[$_][0] > $threshold } $start..$n-1;
} else {
find( $array, $n, $k-1, $threshold-$array->[$_][0], $_+1, [ @$
+solution, $array->[$_][1] ], $result ) for $start..$n-$k;
}
}
my $MTX = [
[4,-1,-2,-2, 0,-1],
[-1,5, 0, 2,-3, 1],
[-2,0, 6, 1,-3, 5],
[-2,2, 1, 7,-3, 0],
[0,-3,-3,-3, 8,-3],
[-1,1, 5, 0,-3, 9],
];
my $k = 3;
my $threshold = 17;
my $n = @$MTX;
my @diag = sort {$b->[0]<=>$a->[0]} map { [$MTX->[$_][$_],$_] } 0..$n-
+1;
my @result;
find( \@diag, $n, $k, $threshold, 0, [], \@result );
print "@$_: ".sum(map{$MTX->[$_][$_]}@$_)."\n" for @result;
Update: Minor improvement. The line
push @$result, [ @$solution, $array->[$_][1] ] for grep { $array->[$_][0] > $threshold } $start..$n-1;
can be replaced with
push @$result, [ @$solution, $array->[$start++][1] ] while $start<$n and $array->[$start][0] > $threshold;
to avoid a few tests. | [reply] [Watch: Dir/Any] [d/l] [select] |
|
|