Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Re: finding longest common substring (ALL common substrings)

by revdiablo (Prior)
on Nov 20, 2003 at 17:55 UTC ( [id://308630]=note: print w/replies, xml ) Need Help??


in reply to Re: finding longest common substring (ALL common substrings)
in thread finding longest common substring

This is indeed interesting. With my test data, it's actually a bit faster than my original version (though we've seen how much different data will affect the various algorithms). Pretty impressive, considering what it does. There appears to be a problem, however. It returns undef if you feed it qw(foo bor boz bzo), but works fine with qw(foo boor booz bzoo) and qw(fo bor boz bzo). So if there are any mismatching number of o's, it returns undef. I don't see why offhand; maybe you have some ideas?

Replies are listed 'Best First'.
Re: Re: Re: finding longest common substring (ALL common substrings)
by BrowserUk (Patriarch) on Nov 20, 2003 at 22:29 UTC

    Sorry. The code is flawed. It does produce all the common substrings, but it will often select the wrong "longest".

    The problem occurs because if a substring occurs twice in one of the input strings, and not at all in one of the others, it's count will be the same as if it had appeared once in both, The selection mechanism, the longest key who's count is equal to the number of input strings is bogus, but suffuciently convincing that it worked for all 5 sets of test data I tried it on!

    I'm trying to think of an efficient way of counting how many of the original strings each substring is found in, but the only one I've come up with so far would limit the number of input strings to 32. A couple of other ideas I tried worked, but carry enough overhead to make the method less interesting.

    I'll keep looking at it, but maybe my "surprise at the simplicity and efficiency" was the red flag that should have told me that I was missing something! Still, nothing ventured, nothing gained.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!
    Wanted!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://308630]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (5)
As of 2024-04-19 23:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found