PerlMonks  

Re: Match similar text

by Limbic~Region (Chancellor)
on Sep 06, 2003 at 23:53 UTC (#289520=note)


in reply to Match similar text

shadox,
Take a look at this node posted yesterday, though Text::Levenshtein (edit distance) is usually the standard answer for this kind of fuzzy matching.
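For reference, the distance Text::Levenshtein computes is the classic edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. In Perl you would normally just `use Text::Levenshtein qw(distance);`. The Python snippet here is only an illustration of the usual dynamic-programming algorithm, not the module's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Pensylvania", "Pennsylvania"))  # 1 (one missing "n")
print(levenshtein("kitten", "sitting"))            # 3
```

Two rows are enough because each cell only looks at the row above and the cell to its left. That keeps memory proportional to the shorter string.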

I would do something like the following:

  • Set a maximum threshold: if the closest match exceeds it, set the record aside for human interaction
  • Iterate over each state, calculating the similarity (edit) distance, and select the shortest distance
  • Also set aside for human interaction any value whose two closest states are nearly tied, perhaps differing by a distance of only 1
  • Write a log of changes until you feel confident/comfortable it is doing the right thing
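The four steps above might fit together like the following Python sketch. It is an illustration under assumptions, not a definitive implementation. The function name `match_state`, the threshold of 3, the tie margin of 1 and the short `STATES` list are all hypothetical. The local `levenshtein` function stands in for Text::Levenshtein's `distance`.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("fix_states")  # hypothetical logger name


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def match_state(raw, states, max_distance=3, ambiguity_margin=1):
    """Return the corrected state name, or None to set the value aside
    for a human. max_distance and ambiguity_margin are assumed values."""
    key = raw.strip().lower()
    # Distance from the raw value to every valid state, closest first.
    ranked = sorted((levenshtein(key, s.lower()), s) for s in states)
    best_dist, best = ranked[0]
    if best_dist == 0:
        return best  # matches apart from case/whitespace; nothing to log
    if best_dist > max_distance:
        log.warning("REVIEW %r: closest is %r at distance %d", raw, best, best_dist)
        return None
    if len(ranked) > 1 and ranked[1][0] - best_dist <= ambiguity_margin:
        log.warning("REVIEW %r: %r and %r are nearly tied", raw, best, ranked[1][1])
        return None
    log.info("FIX %r -> %r (distance %d)", raw, best, best_dist)
    return best


STATES = ["Arkansas", "Kansas", "Ohio", "Pennsylvania", "Texas"]  # sample subset
for value in ["Pensylvania", "Texas", "Akansas", "Mississippi"]:
    print(value, "->", match_state(value, STATES))
```

The sample values exercise each branch. "Pensylvania" is fixed and logged. "Texas" is already valid. "Akansas" is one edit from both "Arkansas" and "Kansas", so it goes to a human. "Mississippi" is too far from anything in the sample list.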

    Cheers - L~R

    Re: Re: Match similar text
    by exussum0 (Vicar) on Sep 07, 2003 at 00:24 UTC
      In conjunction with that, the person might want to get all distinct misspelled states and update them at once. If his DB is big, say 6 million rows, going row by row would be 6 million selects plus 1 update for every misspelled row, vs 1 SELECT DISTINCT on the 6-million-row table plus 1 update for each distinct misspelling.

      Maybe the person already does this, but might as well be obvious :)
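      Here is a sketch of that batch approach, using Python's built-in sqlite3 module and an in-memory table. The `addresses` table, its `state` column, the tiny `VALID` set and the hard-coded `CORRECTIONS` mapping are all hypothetical stand-ins for the real database and for the distance-based matcher.

```python
import sqlite3

# Hypothetical schema: a table "addresses" with a free-text "state" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany("INSERT INTO addresses (state) VALUES (?)",
                 [("Texas",), ("Texsa",), ("Texsa",), ("Ohio",), ("Ohoi",)])

VALID = {"Texas", "Ohio"}
# Hypothetical corrections; in practice these would come from the
# edit-distance matcher, with ambiguous values left for a human.
CORRECTIONS = {"Texsa": "Texas", "Ohoi": "Ohio"}

# One query over the table: fetch each distinct spelling once.
distinct = [row[0] for row in conn.execute("SELECT DISTINCT state FROM addresses")]
misspelled = [s for s in distinct if s not in VALID]

# One UPDATE per distinct misspelling, however many rows share it.
with conn:  # commits on success, rolls back on an exception
    for bad in misspelled:
        good = CORRECTIONS.get(bad)
        if good is not None:
            cur = conn.execute("UPDATE addresses SET state = ? WHERE state = ?",
                               (good, bad))
            print(f"{bad!r} -> {good!r}: {cur.rowcount} row(s) updated")
```

      The fuzzy matching now runs once per distinct spelling rather than once per row. On a large table that is usually the expensive part. An index on `state` would also help the UPDATEs.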
      --
      Play that funky music white boy..
