
Re: Match similar text

by Limbic~Region (Chancellor)
on Sep 06, 2003 at 23:53 UTC (#289520)

in reply to Match similar text

Take a look at this node posted yesterday, though Text::Levenshtein is usually the standard answer.

I would do something like the following:

  • Set a maximum distance threshold; if even the closest match exceeds it, set the record aside for human review
  • Iterate over each state name, calculate its edit distance from the input, and select the state with the shortest distance
  • Also set aside for human review any input whose two best candidate states are nearly tied, perhaps differing by a distance of only 1
  • Log every change until you feel confident it is doing the right thing
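
The steps above could be sketched roughly as follows. This is a Python illustration rather than anything from the original post: the hand-rolled levenshtein() stands in for whatever distance routine you use (in Perl, Text::Levenshtein would fill that role), and classify(), MAX_DISTANCE and MIN_MARGIN are hypothetical names and values chosen only for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Assumed values for illustration; tune them against your own data.
MAX_DISTANCE = 3  # a best match farther away than this goes to a human
MIN_MARGIN = 1    # best and runner-up this close together go to a human


def classify(raw, states):
    """Return (state, reason); state is None when a human should decide."""
    word = raw.strip().lower()
    ranked = sorted((levenshtein(word, s.lower()), s) for s in states)
    best_dist, best = ranked[0]
    if best_dist > MAX_DISTANCE:
        return None, f"no candidate within {MAX_DISTANCE} edits"
    if len(ranked) > 1 and ranked[1][0] - best_dist <= MIN_MARGIN:
        return None, f"ambiguous: {best!r} vs {ranked[1][1]!r}"
    return best, f"matched at distance {best_dist}"
```

The reason string returned alongside each result is meant for the change log, so every automatic correction and every record set aside can be audited later.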

    Cheers - L~R

    Re: Re: Match similar text
    by exussum0 (Vicar) on Sep 07, 2003 at 00:24 UTC
      In conjunction with that, the person might want to collect all the distinct misspelled states and update each one with a single statement. If his DB is big, say 6 million rows, working row by row means 6 million selects plus 1 update for every misspelled row, versus 1 select over the 6-million-row table plus 1 update for each distinct misspelling.

      Maybe the person already does this, but I might as well state the obvious :)
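
A minimal sketch of that batching idea, using Python's sqlite3 module purely for illustration. The table name addresses, its state column, and both function names are assumptions; the original thread does not describe the schema or the database in use.

```python
import sqlite3


def distinct_states(conn):
    """One pass over the table: every distinct spelling currently stored."""
    rows = conn.execute("SELECT DISTINCT state FROM addresses")
    return [row[0] for row in rows]


def apply_corrections(conn, corrections):
    """Run one UPDATE per distinct misspelling instead of one per row.

    corrections maps a misspelling to its canonical state name.
    Returns the total number of rows changed.
    """
    changed = 0
    for wrong, right in corrections.items():
        cur = conn.execute(
            "UPDATE addresses SET state = ? WHERE state = ?",
            (right, wrong),
        )
        changed += cur.rowcount
    conn.commit()
    return changed
```

The corrections mapping would be built by running each value from distinct_states() through the matching step and keeping only the confident matches, so the expensive distance calculation runs once per distinct spelling rather than once per row.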
      Play that funky music white boy..
