The data I am working with are DNA sequences. (I know of the BioPerl project but have found nothing there that could help.) I have distilled the data to the DNA sequences alone. I have a second file with the sequence IDs, which is why I also print out the indices of the matching strings. The data look like this.
ATGGAGAACATCACATCAGGACTCCTAGGACCCCTTCTCGTGTTACAGGC
ATGGAGAACATCACATCACGACTCCTAGGACCCCTTCACGTGAAACAGGC
ATGCTCAACGTCACATCAGGACTCCTAGGACCACGTCTCGTGTTACAGGG
ATGGTGTACATCACGACAGGATTCCTCGGAATCGCGCTGGTGACACAGGC
With the sequence IDs the data would look like this.
>seq1
ATGGAGAACATCACATCAGGACTCCTAGGACCCCTTCTCGTGTTACAGGC
>seq2
ATGGAGAACATCACATCACGACTCCTAGGACCCCTTCACGTGAAACAGGC
>seq3
ATGCTCAACGTCACATCAGGACTCCTAGGACCACGTCTCGTGTTACAGGG
>seq4
ATGGTGTACATCACGACAGGATTCCTCGGAATCGCGCTGGTGACACAGGC
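In case it helps to keep the IDs and sequences together, here is a minimal sketch (in Python, purely as an illustration) of reading the layout above into a list, so that a positional index such as 0 or 2 can be mapped back to its ID. The function name `parse_fasta` and the variable `sample` are made up for this example, and it assumes each record is a `>` header line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Return a list of (sequence_id, sequence) tuples in file order."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            seq_id, chunks = line[1:].strip(), []
        elif seq_id is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


sample = """>seq1
ATGGAGAACATCACATCAGGACTCCTAGGACCCCTTCTCGTGTTACAGGC
>seq2
ATGGAGAACATCACATCACGACTCCTAGGACCCCTTCACGTGAAACAGGC
"""

records = parse_fasta(sample)
ids = [seq_id for seq_id, _ in records]   # ids[0] is the ID of sequence 0
seqs = [seq for _, seq in records]        # seqs[0] is the sequence itself
```

With the records in a list, index `i` in a pairwise report corresponds to `ids[i]`.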
### update ###
I have placed real data in my public scratchpad.
So, just to see if I understand the task... given those four (truncated?) lines of input data, would the following be the "right" answer?
LCS for 0 :: 1 = |ATGGAGAACATCACATCA|
LCS for 0 :: 2 = |TCACATCAGGACTCCTAGGACC|
LCS for 0 :: 3 = |CATCAC|
LCS for 1 :: 2 = |ACTCCTAGGACC|
LCS for 1 :: 3 = |CATCAC|
LCS for 2 :: 3 = |CAGGA|
This doesn't keep track of the index offsets at which the longest match starts in each string for each pairwise comparison, but that would be easy to add.
That's the output from the code posted in my later reply in this thread, given those four lines of sample data as input.
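For anyone following along, here is a minimal sketch of how the start offsets could be tracked. It is written in Python rather than Perl and is not the code referred to above. The names `longest_common_substrings` and `seqs` are made up for illustration. It uses the textbook dynamic-programming approach to the longest common substring, and it keeps every tie rather than only the first match found.

```python
from itertools import combinations


def longest_common_substrings(a, b):
    """Return (length, [(start_a, start_b), ...]) covering every longest
    common substring of a and b. Time is O(len(a) * len(b))."""
    best_len = 0
    starts = []
    # prev[j + 1] holds the length of the common run ending at a[i - 1], b[j].
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a):
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b):
            if ca == cb:
                run = prev[j] + 1          # extend the run on this diagonal
                curr[j + 1] = run
                if run > best_len:
                    best_len = run
                    starts = [(i - run + 1, j - run + 1)]
                elif run == best_len:
                    starts.append((i - run + 1, j - run + 1))
        prev = curr
    return best_len, starts


# The four sample lines from the original post.
seqs = [
    "ATGGAGAACATCACATCAGGACTCCTAGGACCCCTTCTCGTGTTACAGGC",
    "ATGGAGAACATCACATCACGACTCCTAGGACCCCTTCACGTGAAACAGGC",
    "ATGCTCAACGTCACATCAGGACTCCTAGGACCACGTCTCGTGTTACAGGG",
    "ATGGTGTACATCACGACAGGATTCCTCGGAATCGCGCTGGTGACACAGGC",
]

for i, j in combinations(range(len(seqs)), 2):
    length, starts = longest_common_substrings(seqs[i], seqs[j])
    hits = ", ".join(
        f"{seqs[i][sa:sa + length]} @ {i}:{sa} & {j}:{sb}" for sa, sb in starts
    )
    print(f"{i} :: {j} len {length}: {hits}")
```

On the sample data, the 0 :: 2 comparison should report the 22-character match starting at offset 10 in both strings. The quadratic table is fine for 50-character lines. For long sequences or many pairs, a suffix-tree or suffix-array method would scale better, but that is beyond this sketch.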
0 :: 1 ATGGAGAACATCACATCA and GACTCCTAGGACCCCTTC
0 :: 2 TCACATCAGGACTCCTAGGACC
0 :: 3 ACATCAC
1 :: 2 GACTCCTAGGACC
1 :: 3 ACATCAC
2 :: 3 CAGGA and ACAGG
484593-3
Best match len:22 betwixt 0:10 & 2:10
ATGGAGAACATCACATCAGGACTCCTAGGACCCCTTCTCGTGTTACAGGC
TCACATCAGGACTCCTAGGACC
ATGCTCAACGTCACATCAGGACTCCTAGGACCACGTCTCGTGTTACAGGG
TCACATCAGGACTCCTAGGACC
6 trials of N:4 L:50 MIN:2 ( 4.098ms total), 682us/trial