Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Some of the above suggestions, benchmarked (Re: Similarity of strings)

by cmilfo (Hermit)
on May 15, 2002 at 19:08 UTC ( [id://166821]=note: print w/replies, xml ) Need Help??


in reply to Similarity of strings

I tried to gather some of the ideas above and merge them into one benchmarked thing. Also, I benchmarked replacing the split // and chop $var while $var with unpack. Anyway, hope you enjoy.

#!/usr/bin/perl use Benchmark qw(timethese cmpthese); my $string1 = 'This is a test of the emergency system!'; my $string2 = 'This is a test of the emergency broadcast system!'; my $template1 = "a1" x (length $string1); my $template2 = "a1" x (length $string2); print &xor_o, "\n", &split_o, "\n", &unpack_o, "\n", &chop_o, "\n", &chop_o2, "\n"; my $results = timethese( 100000, { XOR => \&xor_o, SPLIT => \&split_o, UNPACK => \&unpack_o, CHOP => \&chop_o, CHOP2 => \&chop_o2 } ); cmpthese($results); sub xor_o { my $xor = $string1^$string2; return ($xor =~ tr/\0//)/length($string1); } sub split_o { my @string1 = split //, $string1; my @string2 = split //, $string2; my $score = 0; my $length = scalar @string1; for (my $i = 0; $i < $length; $i++) { $score += ($string1[$i] eq $string2[$i]); } return $score/$length; } sub unpack_o { my @string1 = (unpack $template1, $string1); my @string2 = (unpack $template2, $string2); my $score = 0; my $length = scalar @string1; for (my $i = 0; $i < $length; $i++) { $score += ($string1[$i] eq $string2[$i]); } return $score/$length; } sub chop_o { my @string1 = (); my @string2 = (); my $score = 0; my $rstring1 = scalar reverse $string1; my $rstring2 = scalar reverse $string2; push @string1, (chop $rstring1) while $rstring1; push @string2, (chop $rstring2) while $rstring2; my $length = scalar @string1; for (my $i = 0; $i < $length; $i++) { $score += ($string1[$i] eq $string2[$i]); } return $score/$length; } sub chop_o2 { my $str1 = $string1; my $str2 = $string2; my $length = length $string1; my $score; $score += (chop $str1 eq chop $str2) while $str1; return $score/$length; } Benchmark: timing 100000 iterations of CHOP, CHOP2, SPLIT, UNPACK, XOR +... CHOP: 43 wallclock secs (37.29 usr + 0.05 sys = 37.34 CPU) @ 26 +78.09/s (n=100000) CHOP2: 9 wallclock secs ( 7.34 usr + 0.03 sys = 7.37 CPU) @ 13 +568.52/s (n=100000) SPLIT: 47 wallclock secs (38.76 usr + 0.07 sys = 38.83 CPU) @ 25 +75.33/s (n=100000) UNPACK: 34 wallclock secs (29.51 usr + 0.03 sys = 29.54 CPU) @ 33 +85.24/s (n=100000) XOR: 2 wallclock secs ( 0.92 usr + 0.00 sys = 0.92 CPU) @ 10 +8695.65/s (n=100000) Rate SPLIT CHOP UNPACK CHOP2 XOR SPLIT 2575/s -- -4% -24% -81% -98% CHOP 2678/s 4% -- -21% -80% -98% UNPACK 3385/s 31% 26% -- -75% -97% CHOP2 13569/s 427% 407% 301% -- -88% XOR 108696/s 4121% 3959% 3111% 701% --
Update: My apologies for the length @array lines above. That's what I get for yanking and putting. :) Thanks to those who caught it. New benchmarks are now shown.

Update2: I've also added the chop2 implemented in the comment below.

Replies are listed 'Best First'.
Re: Some of the above suggestions, benchmarked (Re: Similarity of strings)
by jmcnamara (Monsignor) on May 15, 2002 at 23:38 UTC

    I'm glad that someone benchmarked this.

    However, you were a little bit unfair to the chop method. :-) The scalar reverse and array assignments aren't necessary. The following is 5 times faster (although still 5 times slower than the xor method):

    sub chop2 { my $str1 = $string1; my $str2 = $string2; my $length = length $string1; my $score; $score += (chop $str1 eq chop $str2) while $str1; return $score/$length; }

    Update: Albannach points out that because the strings in this test are not of equal length, the reverse is required. My code was based on the original sample data.

    Also, it is worth adding that the speed of the xor method is less dependent on the string length than the other methods.

    --
    John.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://166821]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (5)
As of 2024-04-19 07:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found