Problems? Is your data what you think it is? | |
PerlMonks |
Re: Seeking algorithm for finding common continous sub-patternsby davidj (Priest) |
on Dec 03, 2004 at 08:47 UTC ( [id://412066]=note: print w/replies, xml ) | Need Help?? |
Johnnywang, If I understand the problem correctly, then from an algorithmic standpoint, what you are looking for is unrealistic, at least to be solved in any reasonable amount of time. The problem is this: the only way to guarantee finding every common continuous subsequence of any length is to do it by brute force. You have to check every subsequence from size m up to the size of the smallest sequence. In the example you give, you have to check every subsequence from size 2 to size 5. I did a quick math check (so the exact number might be wrong), but the number of check would be in the neighborhood of 600. And this is with only 3 sequences of sizes 11, 11, and 5. Just adding 1 number to each of the 3 sequences would add hundreds of more required checks. The growth of the problem with every additional sequence is polynomial. For you to want to check 1000's of sequences that may be 1000's of numbers long would require in the billions of required checks which would take much longer than I'm sure you are willing to accept. This is one of those NP problems that CS majors learn about in algorithm classes (traveling salesman, coloring problem, hamiltonian circuit problem, etc). If someone can come up with an algorithm that is not brute force, thus making the problem not NP, I would love to see it. davidj
In Section
Seekers of Perl Wisdom
|
|