Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

Re: diff of two strings

by sfink (Deacon)
on Jan 09, 2008 at 08:29 UTC ( #661285=note: print w/replies, xml ) Need Help??

in reply to diff of two strings

As described, I don't think there is a unique solution. Do you just want any valid solution, or one with some sort of minimum cost, or what?

Consider (I'll use single letters to stand in for words): ABA vs BAA: <A>BA,BA<A>? Or [A][B]A,[B][A]A?

I'm guessing you don't care too much, as long as there isn't an easy way to remove markup from the result (i.e., it's ok to return a local minimum even if it isn't the global minimum).

For your algorithm above, I think what you really want is a suffix tree (of words) instead of the dynamic programming algorithm, but beyond that, my brain is too sleepy to be of much use. You'd find the LCSS, remove it from the suffix tree in some sense, and then I think you'd be able to immediately iterate to find the remaining LCSSes until you run out. I think the removal and all other operations should be pretty quick.

Without a suffix tree... how slow would it be to run your existing LCSS algorithm, then replace the found LCSS with a different dummy token in each string and repeat? Or is that what you're already doing?

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://661285]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2022-08-18 01:55 GMT
Find Nodes?
    Voting Booth?

    No recent polls found