off-by-one string comparison

argv has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: off-by-one string comparison by Zaxo (Archbishop) on Jun 05, 2005 at 17:42 UTC
Text::Levenshtein implements a well-known metric for difference between strings. `$ perl -MText::Levenshtein=distance -e'print distance( "foo bar baz"," +fooo bar baz"),$/' 1 $` [download] So your off-by-one test could be written, `use Text::Levenshtein qw/distance/; sub off_by_one { 1 == distance("foo bar baz", shift); }` [download] For extra credit, leave out the "1 ==" part ;-) After Compline, Zaxo	[reply] [d/l] [select]
Re: off-by-one string comparison by bobf (Monsignor) on Jun 05, 2005 at 17:46 UTC
Text::Levenshtein (which also has an XS version) will give the "edit distance" between two strings. There are several other modules that you could consider, including Text::PhraseDistance (which takes into consideration word order within the string) and String::Approx. Search CPAN - I'm sure there are others. HTH	[reply]
Re: off-by-one string comparison by crashtest (Curate) on Jun 05, 2005 at 19:15 UTC
This is definetely a problem to go to CPAN for, but in the interest of wheel-reinvention, here is an implementation of my interpretation of your question. The function below matches "sticky-finger" typos, i.e. it matches strings where any number of characters were duplicated (up to a specifiable number of times). That seemed to be the gist of your question... maybe you could provide strings that shouldn't match, too? use strict; use warnings; my $input = 'foo bar baz'; my @test_strings = ( 'fooo bar baz', 'foo bar baaz', 'foo bbar baz', 'ffoo baar baz', # Matches: Should it? 'fxo bar baz', # No match: Different character, not dup. 'foo baaar baz' # No match: Too many 'a's! ); my $fuzziness = 1; # Let's try out the function for (@test_strings){ if (fuzzy_match($input, $_, $fuzziness)){ print "MATCHED: $_\n"; } else{ print "NO MATCH: $_\n"; } } sub fuzzy_match{ my ($input, $test_string, $fuzziness) = @_; # Build a regex from the input string my $regex = ''; for (split //, $input){ $regex .= quotemeta($_) . "\{1," . ($fuzziness + 1) . "}?"; } $test_string =~ m/$regex/; } [download] As for the number of characters different, I think that would be `length($test_string) - length($input_string)`.	[reply] [d/l] [select]
Re: off-by-one string comparison by TedPride (Priest) on Jun 05, 2005 at 22:12 UTC
use strict; use warnings; my $str = 'foo bar baz'; while (<DATA>) { chomp; print "$_\n" if ob1($str, $_); } sub ob1 { my ($w1, $w2) = @_; return 0 if (abs(length($w1) - length($w2)) > 1); return 1 if $w1 eq $w2; if (length($w1) == length($w2)) { my ($i, $c); for ($i = 0; $i < length($w1); $i++) { $c++ if substr($w1, $i, 1) ne substr($w2, $i, 1); return 0 if $c && $c > 1; } return 1; } if (length($w1) > length($w2)) { my $t = $w1; $w1 = $w2; $w2 = $t; } $w1 = join '.?', '', split(//, $w1), ''; return 1 if $w2 =~ /$w1/; return 0; } __DATA__ fooo bar baz foo bar baaz foo bbar baz foo bar baz foo bar bz fob bar biz fooo bar bazz [download] This could also be written without regex, since you only really care about one character difference. Just match the characters both ways if there is one character difference in length, and if the matched characters total is equal to the length of the smaller string, then there's only one character difference. `adbc abc a eq a -> 1 d ne b -> c eq c <- 2 b eq b <- 3 d ne a <-` [download] Total matches = length of abc.	[reply] [d/l] [select]


Come for the quick hacks, stay for the epiphanies.
	PerlMonks