Your skill will accomplishwhat the force of many cannot PerlMonks

### comment on

 Need Help??

The longest common substring algorithm on that page is (m * n) time but requires (m * n) space as well. Contrast to the naive solution, which is (m * m * n) time but (m + n) memory.

That said, since the OP has strings of length 3000 characters, we're looking at only 3000 * 3000 * sizeof(uint16_t) = 18 megabytes of space. If the strings were, say, 100k each, we'd have problems.

So, the name of this site notwithstanding, here's some C code, poorly tested

```/* Longest Common Substring
*
* C Implementation of the algorithm found at:
* http://www.ics.uci.edu/~dan/class/161/notes/6/Dynamic.html
*/

/* This section is reproduced from the URL above.
*
* California law forbids anyone from making use of class
* lecture notes for commercial purposes, and such activity
* is expressly forbidden without the instructor's written
* permission.
*

Longest common substring problem

Given two sequences of letters, such as A = HELLO and
B = ALOHA, find the longest contiguous sequence appearing
in both.

One solution:   (assume strings have lengths m and n)
For each of the m starting points of A, check for the
longest common string starting at each of the n starting
points of B.

The checks could average (m) time  a total of (m2n) time.

Dynamic programming solution:
Let Li, j = maximum length of common strings that end at
A[i] & B[j].
Then,

A[i] = B[j]  Li, j = 1 + Li-1, j-1
A[i]  B[j]  Li, j = 0

LONGEST COMMON SUBSTRING(A,m,B,n)
for i := 0 to m do Li,0 := 0
for j := 0 to n do L0,j := 0
len := 0
for i := 1 to m do
for j := 1 to n do
if Ai  Bj then
Li,j := 0
else
Li,j := 1 + Li-1,j-1
if Li,j > len then
len := Li,j

Example:

A L O H A

H  0 0 0 1 0
E  0 0 0 0 0
L  0 1 0 0 0
L  0 1 0 0 0
O  0 0 2 0 0
*
*/

/* A type big enough to hold the string length. */
typedef long LCS_t;

/* A type that can hold any element of the string. */
typedef char LCS_c;

/* A should be a pointer to the beginning of the string.
* m should be the index of the last character in the
* string, plus one.
* Likewise for B and n.
* scratch must be an array of at least size m * n.
*/

LCS_t longest_common_substring
( LCS_c *A, LCS_t m,
LCS_c *B, LCS_t n,
LCS_t *scratch,
)
{
LCS_t i, j;

LCS_t len = 0;

#define L(x, y) scratch[(x) * n + (y)]

for (i = 0; i < m; i++) L(i, 0) = 0;
for (j = 0; j < n; j++) L(0, j) = 0;

for (i = 0; i < m; i++)
{
for (j = 0; j < n; j++)
{
if (A[i] != B[j])
{
L(i, j) = 0;
}
else
{
L(i, j) = 1 + L(i - 1, j - 1);
if (L(i, j) > len)
{
len = L(i, j);
}
}
}
}

#undef L

return len;
}

#include <malloc.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
if (argc != 3) return 2;

char *A = argv[1];
char *B = argv[2];

long m = strlen(A);
long n = strlen(B);

long *scratch = (long *) malloc(m * n * sizeof(long));
if (!scratch) return 1;

long len;

len = longest_common_substring(A, m, B, n, scratch,
free(scratch);

printf("LCS is %ld chars ending at <%ld,%ld>.\n",
return 0;
}

Maybe this would be a good time for me to learn Inline::C.

In reply to Re^2: Search for identical substrings by TilRMan
in thread Search for identical substrings by bioMan

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

• Are you posting in the right place? Check out Where do I post X? to know for sure.
• Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
• Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
• Want more info? How to link or How to display code and escape characters are good places to start.

Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2024-04-15 05:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found