Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re: diff of two strings

by sfink (Deacon)
on Jan 09, 2008 at 08:29 UTC ( [id://661285]=note: print w/replies, xml ) Need Help??


in reply to diff of two strings

As described, I don't think there is a unique solution. Do you just want any valid solution, or one with some sort of minimum cost, or what?

Consider (I'll use single letters to stand in for words): ABA vs BAA: <A>BA,BA<A>? Or [A][B]A,[B][A]A?

I'm guessing you don't care too much, as long as there isn't an easy way to remove markup from the result (i.e., it's ok to return a local minimum even if it isn't the global minimum).

For your algorithm above, I think what you really want is a suffix tree (of words) instead of the dynamic programming algorithm, but beyond that, my brain is too sleepy to be of much use. You'd find the LCSS, remove it from the suffix tree in some sense, and then I think you'd be able to immediately iterate to find the remaining LCSSes until you run out. I think the removal and all other operations should be pretty quick.

Without a suffix tree... how slow would it be to run your existing LCSS algorithm, then replace the found LCSS with a different dummy token in each string and repeat? Or is that what you're already doing?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://661285]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (3)
As of 2024-04-19 20:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found