I wanted to cut cycles on one of my programs that needs string approximation and the Text::LevenshteinXS module wouldn't install (no 5.8 on the server) so I wrote my own copy of the levenshtein algorithm. benchmarking against Text::Levenshtein showed 15-35% speed gain. I was thinking of uploading this to cpan, since i've seen many something/Fastsomething modules up there. The biggest difference in the algorithm itself (not the short circuiting) is the min function...why people want to loop over an array when they KNOW there are only going to be 3 paramaters really confounds me.
Think this fast module is worthy of cpan?
Think this fast module is worthy of cpan?
package Text::FastLevenshtein; use strict; use Exporter; use vars qw ($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS); $VERSION = '0.02'; @ISA = qw(Exporter); @EXPORT = (); @EXPORT_OK = qw(&distance); %EXPORT_TAGS = (); sub _min { my $min = $_[0]; $min = $_[1] if $_[1] < $min; $min = $_[2] if $_[2] < $min; return $min; } sub distance($$) { my $word1 = shift; my $word2 = shift; return 0 if $word1 eq $word2; my @d; my $len1 = length $word1; my $len2 = length $word2; $d[0][0] = 0; for (1 .. $len1) { $d[$_][0] = $_; return $_ if $_!=$len1 && substr($word1,$_) eq substr( +$word2,$_); } for (1 .. $len2) { $d[0][$_] = $_; return $_ if $_!=$len2 && substr($word1,$_) eq substr( +$word2,$_); } for my $i (1 .. $len1) { my $w1 = substr($word1,$i-1,1); for (1 .. $len2) { $d[$i][$_] = _min($d[$i-1][$_]+1, $d[$i][$_-1] ++1, $d[$i-1][$_-1]+($w1 eq substr($word2,$_-1,1) ? 0 : 1)); } } return $d[$len1][$len2]; } 1; __END__
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: improved levenshtein
by kvale (Monsignor) on Mar 03, 2004 at 17:59 UTC | |
by bageler (Hermit) on Mar 03, 2004 at 18:36 UTC | |
by valdez (Monsignor) on Mar 03, 2004 at 18:54 UTC | |
by waswas-fng (Curate) on Mar 03, 2004 at 22:24 UTC | |
by stonecolddevin (Parson) on Nov 18, 2013 at 23:52 UTC | |
Re: improved levenshtein
by QM (Parson) on Mar 04, 2004 at 00:01 UTC | |
Re: improved levenshtein
by diotalevi (Canon) on Mar 03, 2004 at 23:29 UTC | |
Re: improved levenshtein
by hossman (Prior) on Mar 03, 2004 at 21:28 UTC | |
Re: improved levenshtein
by tachyon (Chancellor) on Mar 04, 2004 at 14:45 UTC | |
Re: improved levenshtein
by Bill.Costa (Acolyte) on Oct 16, 2013 at 01:21 UTC | |
by boftx (Deacon) on Oct 16, 2013 at 01:46 UTC |
Back to
Cool Uses for Perl