Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Combinatorial problem

by baxy77bax (Deacon)
on Apr 23, 2015 at 16:15 UTC ( [id://1124429]=perlquestion: print w/replies, xml ) Need Help??

baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi, A question. Actually 2. Given a weight matrix :
id 0 1 2 3 4 5 |------------------- 0| 4,-1,-2,-2, 0,-1 1| -1, 5, 0, 2,-3, 1 2| -2, 0, 6, 1,-3, 5 3| -2, 2, 1, 7,-3, 0 4| 0,-3,-3,-3, 8,-3 5| -1, 1, 5, 0,-3, 9
I wish to extract all combinations of k-mers (k-id segments) which have a total score larger than let say 17.
let k = 3 then 3,4,5 -> score 7+8+9 = 24 > 17 0,1,2 -> score 4+5+6 = 15 < 17
and all those simmilar enough with the same criterion. wher similar means ler a k-mer be 2,3,4 -> 6+7+8 = 21 and similar to that one is 5,3,4
2->5 = 5 3->3 = 7 4->4 = 8 so 2,3,4 -> 5,3,4 = 5+7+8 = 20 > 17
so positional similarity is only relevant. the naive approach for k = 3 would be :
#!/usr/bin/perl use strict; my $MTX = [ [4,-1,-2,-2, 0,-1], [-1,5, 0, 2,-3, 1], [-2,0, 6, 1,-3, 5], [-2,2, 1, 7,-3, 0], [0,-3,-3,-3, 8,-3], [-1,1, 5, 0,-3, 9], ]; for (my $i =0; $i<6; $i++){ for(my $j = 0; $j< 6; $j++){ for(my $k = 0; $k< 6; $k++){ my $sc = ($MTX->[$i]->[$i] + $MTX->[$j]->[$j] + $MTX->[$k]->[ +$k]); if ($sc < 17){ next; }else{ for (my $ii =0; $ii<6; $ii++){ for(my $jj = 0; $jj< 6; $jj++){ for(my $kk = 0; $kk< 6; $kk++){ my $score = ($MTX->[$i]->[$ii] + $MTX->[$j]->[$jj +] + $MTX->[$k]->[$kk]); if($score > 17){ print " $i,$j,$k -> $ii,$jj,$kk = $score\n"; } } } } } } } }
also the rule is : if any id is changed, the score for a mismatch (2->5) cannot be greated than (2->2):
score(2->5) < score(2->2)
and the other qouestion is: what would be the best way to generalize this to any k, not just 3 as in my example above?
am I making any sense ?

thnx PS: code not necessary.

thank you

UPDATE:

Ok , after thinking a bit about it. i guess the answer to the secong question is straight forward. sinc the alphabet size is constant (6) each k-mer can be encoded onto a unique position from 1 to 6^k. Thus the first three for-loops can be replaced by a singe one. (the sam goes for the second three) and therefore the loop problem dissapears and iterations can easaly be generalized to any k size k-mer. Which leaves the first question still open.. UPDATE: To elaborate on my previous Update: this is what i ment :

my $k = 3; for (my $i =0; $i<6**$k; $i++){ my $t = &decode($i, 6,$k); print "@{$t}\n"; } sub decode{ my ($int, $alpha, $k) = @_; my $r = 0; my $e = $int; my $rq = 0; my $rr = 0; my $p = 0; my @aa; my $j = $k-1; my $i = 0; for($i=0; $i < $k-1 ;$i++, $j--){ $p = $alpha**$j; $rr = $e%$p; $rq = $e/$p; $aa[$i] = int($rq); $e = $rr; } $aa[$i] = $e; return \@aa; } =pod ## for starting with 1 instead of 0 sub decode{ my ($int, $alpha, $k) = @_; my $r = 0; my $e = $int; my $rq = 0; my $rr = 0; my $p = 0; my @aa; my $j = $k-1; my $i = 0; for($i=0; $i < $k-1 ;$i++, $j--){ $p = $alpha**$j; $rr = $e%$p; $rq = $e/$p; if($rr == 0){ $rr = ($e-1)%$p; $rq = ($e-1)/$p; $r = 1; } $aa[$i] = int($rq); $e = $rr; } $aa[$i] = $e+$r-1; return \@aa; } =cut
Yes ofcourse this can be optimized and done in more perlish way. but this is just a conceptual problem requiring conceptual solution :) thnx

Replies are listed 'Best First'.
Re: Combinatorial problem (Algorithm::Loops)
by tye (Sage) on Apr 23, 2015 at 20:06 UTC

    When wanting to nest for loops arbitrary levels deep, I use Algorithm::Loops::NestedLoops().

    - tye        

Re: Combinatorial problem
by Laurent_R (Canon) on Apr 23, 2015 at 17:43 UTC
    I do not know what a k-mer is, so I may get your explanations wrong, but it seems to me that you're only using the diagonal of your matrix. Why don't you extract that diagonal into an array and the search will be easier and faster. Or maybe I did not understand what you are actually doing.

    As a side note, this syntax:

    for (my $i =0; $i<6; $i++){
    could be replaced by something more perlish:
    for my $i (0..5) {

    Je suis Charlie.
      Hi,

      thnx for the reply. A k-mer or k-plet, or k-tuple ...(there are meny deiierent ways people call those). Basically (generaly speaking) it is a k-size constrict of individual elements. these elements are often chosen to be charracters (or letters, or symbols) from an alphabet ... depending upon the literature and the field of interest. Linguists and biologists will usually use the term k-mer (a k character string) (as far as my literature knowlage goes )

      As far as your post goes. You perfectly understood the first part of the problem. I want to find all k-mers with score values above 17. And yes I could extract the diagonal and do everything faster. But I need the matrix because then in the second 3 for-loops I want to find all k-mers that , when transhormed into the initial high scoring k-mer the transformation score will not be below 17 (again). that way I am certain that all k-mers with score above 17 are found

      the problem is if the k and the alfhabet (6) gets really big, itterating through all possible solutions (like I'm doing) is not feasible. so I'm trying to locate some type of heuristic solution to reduce the number of iterations to minimum .

      thnx once more for a suggestiog

Re: Combinatorial problem
by Laurent_R (Canon) on Apr 23, 2015 at 21:41 UTC
    Hm, not sure to understand everything that you are doing, but this is my proposed solution:
    use strict; use warnings; use Data::Dumper; my $MTX = [ [4,-1,-2,-2, 0,-1], [-1,5, 0, 2,-3, 1], [-2,0, 6, 1,-3, 5], [-2,2, 1, 7,-3, 0], [0,-3,-3,-3, 8,-3], [-1,1, 5, 0,-3, 9], ]; my (@diagonal, @result); my $k = 3; for my $i (0..5) { push @diagonal, $MTX->[$i][$i]; } for my $i (0..5) { for my $j (0..5) { for my $k (0..5) { next if $i == $j or $i == $k or $j == $k; my $sc; $sc += $diagonal[$_] for ($i, $j, $k); next unless defined $sc; next if $sc < 17; push @{$result[$sc]}, [$i, $j, $k]; } } } print Dumper \@result;
    The result is as follows: I leave it up to you to check the results.

    Je suis Charlie.

      You are creating a lot of duplicates. May I offer some humble adjustments:

      use strict; use warnings; my $MTX = [ [4,-1,-2,-2, 0,-1], [-1,5, 0, 2,-3, 1], [-2,0, 6, 1,-3, 5], [-2,2, 1, 7,-3, 0], [0,-3,-3,-3, 8,-3], [-1,1, 5, 0,-3, 9], ]; my @result; my @diagonal = map { $MTX->[$_][$_] } 0..5; for my $i (0..5) { for my $j ($i+1..5) { for my $k ($j+1..5) { my $sc = 0; $sc += $diagonal[$_] for ($i, $j, $k); next unless $sc > 17; push @result, [$i, $j, $k, $sc]; } } } print "@$_\n" for @result;
        You are creating a lot of duplicates.
        Yes, I know, but I had the feeling the OP wanted to keep the duplicates. Well, I still don't know what the OP really wanted, I just tried to offer something simpler.

        Je suis Charlie.
Re: Combinatorial problem
by hdb (Monsignor) on Apr 23, 2015 at 21:32 UTC

    From your description and from your code it is not clear to me whether or not duplicates are allowed but if not check this thread how to chose k out of n elements in turn. Then all you need is to calculate the sum of all combinations.

Re: Combinatorial problem
by hdb (Monsignor) on Apr 24, 2015 at 11:31 UTC

    Here is a recursive solution. It uses the diagonal sorted in descending order but keeps track of the original order. It stops if the remaining elements become too small. I have to admit that I only looked at your one test case and that I did not do any benchmarking.

    use strict; use warnings; use List::Util qw/ sum min /; sub find { my( $array, $n, $k, $threshold, $start, $solution, $result ) = @_; return if sum( @{$array}[$start..min($start+$k-1,$n-1)] ) <= $thre +shold; # not enough meat anymore if( 1==$k ) { push @$result, [ @$solution, $array->[$_][1] ] for grep { $arr +ay->[$_][0] > $threshold } $start..$n-1; } else { find( $array, $n, $k-1, $threshold-$array->[$_][0], $_+1, [ @$ +solution, $array->[$_][1] ], $result ) for $start..$n-$k; } } my $MTX = [ [4,-1,-2,-2, 0,-1], [-1,5, 0, 2,-3, 1], [-2,0, 6, 1,-3, 5], [-2,2, 1, 7,-3, 0], [0,-3,-3,-3, 8,-3], [-1,1, 5, 0,-3, 9], ]; my $k = 3; my $threshold = 17; my $n = @$MTX; my @diag = sort {$b->[0]<=>$a->[0]} map { [$MTX->[$_][$_],$_] } 0..$n- +1; my @result; find( \@diag, $n, $k, $threshold, 0, [], \@result ); print "@$_: ".sum(map{$MTX->[$_][$_]}@$_)."\n" for @result;

    Update: Minor improvement. The line

    push @$result, [ @$solution, $array->[$_][1] ] for grep { $array->[$_][0] > $threshold } $start..$n-1;

    can be replaced with

    push @$result, [ @$solution, $array->[$start++][1] ] while $start<$n and $array->[$start][0] > $threshold;

    to avoid a few tests.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1124429]
Approved by herveus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (4)
As of 2024-03-29 08:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found