Problems? Is your data what you think it is? PerlMonks

measuring the similarity between strings

by mayaTheCat (Scribe)
 on Sep 05, 2003 at 10:10 UTC Need Help??
 Description: hi monks, here comes another implementation of the Levenshtein Distance; a distance metric which measures the similarities of strings. Info and some other implementations can be found at Levenshtein distance: calculating similarity of strings. At the moment I need to compare two sets of strings where each set consists of approximately 2000 strings, which means approximataley 3-4 million measurements. Thus I wrote this code, in which I tried to minimize the memory usage, to make it more efficient.
```sub ldist {
my @s = split //, shift;
my @t = split //, shift;

return scalar @t if scalar @s == 0;
return scalar @s if scalar @t == 0;

my (@prevColumn, @currColumn);

@prevColumn = 0..scalar(@t);

for my \$s (0..\$#s) {
@currColumn = ( \$s + 1 );
for my \$t (0..\$#t) {
push @currColumn, min(
\$currColumn[\$t] + 1
, \$prevColumn[\$t+1] + 1
, \$prevColumn[\$t] + (\$s[\$s] eq \$t[\$t] ? 0 : 1)
);
}
@prevColumn = @currColumn;
}

pop @currColumn
}

sub min {
my \$min = shift;

for (@_) {
\$min = \$_ if \$_ < \$min;
}

\$min;
}
```
Replies are listed 'Best First'.
Re: measuring the similarity between strings
by dree (Monsignor) on Sep 05, 2003 at 12:12 UTC
To make more efficient you can also try Text::LevenshteinXS that is 1000% faster than plain Text::Levenshtein.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: snippet [id://289152]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (5)
As of 2021-12-01 15:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
R or B?

Results (13 votes). Check out past polls.

Notices?