Re: Match similar text

shadox,
Take a look at this node posted yesterday, though Text::Levenshtein is usually the standard answer.

I would do something like the following:

Set a maximum distance threshold, so that if even the closest match exceeds it, the record is set aside for human review

Iterate over each valid state, calculating the edit distance to the input, and select the state with the shortest distance

Also set aside for human review any input whose two best candidate states are too close to call, perhaps differing in distance by only 1

Write a log for changes until you feel confident/comfortable it is doing the right thing
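
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the Perl you would actually run: the hand-rolled `levenshtein` function stands in for whatever distance routine you use (such as Text::Levenshtein), and the state list, `MAX_DISTANCE`, `AMBIGUITY_MARGIN`, and `classify` are all made-up names and values you would tune yourself.

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Assumptions for illustration only: a tiny subset of states and guessed cutoffs.
STATES = ["Alabama", "Alaska", "Arizona", "Arkansas", "Kansas", "Texas"]
MAX_DISTANCE = 3      # closest match farther than this goes to a human
AMBIGUITY_MARGIN = 1  # best and runner-up this close also go to a human


def classify(raw, states=STATES):
    """Return (decision, best_candidate, distance); decision is 'accept' or 'review'."""
    ranked = sorted((levenshtein(raw.lower(), s.lower()), s) for s in states)
    best_dist, best = ranked[0]
    if best_dist > MAX_DISTANCE:
        return ("review", best, best_dist)
    if len(ranked) > 1 and ranked[1][0] - best_dist <= AMBIGUITY_MARGIN:
        return ("review", best, best_dist)
    return ("accept", best, best_dist)


if __name__ == "__main__":
    # Print every decision so the output can serve as a change log to eyeball.
    for raw in ["Texsa", "rkansas", "Wyoming"]:
        print(raw, classify(raw))
```

With these settings, "Texsa" is accepted as Texas, "rkansas" is sent for review because Arkansas and Kansas tie, and "Wyoming" is sent for review because nothing in the subset is within the threshold.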

Cheers - L~R

Re: Re: Match similar text by exussum0 (Vicar) on Sep 07, 2003 at 00:24 UTC
In conjunction w/ that, the person might want to select all the distinct misspelled states and update each misspelling in one go. If his DB is big, say 6 million rows, going row by row would be 6 million selects plus 1 update for every misspelled row, vs 1 select on the 6-million-row table and 1 update per distinct misspelling. Maybe the person already does this, but might as well state the obvious :) -- Play that funky music white boy..
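
A minimal sketch of that idea, in Python with an in-memory SQLite database purely for illustration. The `addresses` table, its `state` column, and the `fix_states` helper are hypothetical names, and `correct` is whatever matching routine you trust (it returns `None` for values that should be left for a human).

```python
import sqlite3


def fix_states(conn, valid_states, correct):
    """One SELECT DISTINCT, then one UPDATE per distinct misspelling.

    Returns the number of UPDATE statements issued.
    """
    rows = conn.execute("SELECT DISTINCT state FROM addresses").fetchall()
    updates = 0
    for (raw,) in rows:
        if raw is None or raw in valid_states:
            continue                      # nothing to fix
        fixed = correct(raw)
        if fixed is None:
            continue                      # leave for human review
        # A single statement rewrites every row carrying this misspelling.
        conn.execute("UPDATE addresses SET state = ? WHERE state = ?",
                     (fixed, raw))
        updates += 1
    conn.commit()
    return updates


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE addresses (state TEXT)")
    conn.executemany("INSERT INTO addresses VALUES (?)",
                     [("Texas",), ("Texsa",), ("Texsa",), ("Knsas",)])
    known_fixes = {"Texsa": "Texas", "Knsas": "Kansas"}  # stand-in for real matching
    print(fix_states(conn, {"Texas", "Kansas"}, known_fixes.get))
```

The number of updates then scales with the number of distinct misspellings rather than with the number of rows.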