( Edit: Updated based on comments below, and cleaned up. I've broken out the tests from the benchmark, done the tests using Test::More, and factored out the list of test cases.
I've also added a few more tests from tassilo, although some of them failed the tests and are commendted out for now.
)
Another Edit: repellent wins!! unpack is the way to go...
_Another_ Another Edit: tye wins! Not building a list is even better... See tye's post. I have updated this post w/ the tye results.
_Another_^3 Edit: BrowserUK wins! Well, after the fix so it isn't so destructive of the testArray.
A variant of BrowserUK's approach that uses substr comes in second, but a long ways behind chop.
_Another_^4 Edit: BrowserUK wins again!
buk2 is by far the fastest, but fails a "0", "0" test, so buk2_len which is a bit slower is the real winner, still 10 times faster than the competition for some test sets.
For REALLY fast routines being benchmarked, copying args is clearly a very bad idea. However, other routines like tye0 to tye2 which were the fastest for a while don't benefit as much from removing the argument copies (see the _opt variants in the results).
_Another_^5 Edit: BrowserUK wins again (again)!
buk3 is by far the fastest and passes all the tests
In check if all certain chars are found in a sentence, tallulah asked for improvements on a routine trying to check whether a given sentence contains all of a set of specified characters.
Many monks suggested solutions, with way more than one way to do it.
Here are the results of some benchmarks on all of the suggested approaches on my Macbook Pro, using the provided 5.8.8 build.
I was quite surprised by the results, actually... I'd have bet good money that the slice approach was going to beat anything else, unless Tanktalus's List::MoreUtils approaches won. I promise, I was going to post this no wonder which implementation won :)
Where a solution was regex based, I also added a variant using "study", to see how that impacted performance.
If anyone wants to add a few test cases then run the tests, invoke the script with a -t argument.
Detailed results behind readmore, to fix Meditations page formatting:
Short
Rate tallulah_OriginalPost
tallulah_OriginalPost 105/s --
Tanktalus_AllRegex_Study 127/s 22%
Tanktalus_AllRegex 130/s 24%
varian_hash 146/s 40%
RMGir_slice 207/s 98%
moritz_BuildRegex_WithStudy 243/s 133%
moritz_BuildRegex 261/s 149%
ysth_loookahead 350/s 234%
Tanktalus_AllIndex 411/s 293%
unpack_allindex 444/s 324%
unpack_allrindex 448/s 328%
tassilo_listutils_r 581/s 455%
tye2 655/s 525%
tye1 661/s 531%
RMGir_index 667/s 537%
repellent_unpack 770/s 635%
repellent_unpack_opt 792/s 657%
tye2_opt 807/s 671%
tye1_opt 815/s 678%
tye0 829/s 692%
tye0_opt 830/s 693%
buk_substr 1152/s 1000%
buk 1829/s 1648%
buk2_len 4443/s 4144%
buk2 4610/s 4305%
buk3 5144/s 4814%
LongShort
Rate varian_hash
varian_hash 4.90/s --
RMGir_slice 5.88/s 20%
tallulah_OriginalPost 168/s 3319%
Tanktalus_AllRegex_Study 173/s 3422%
Tanktalus_AllRegex 209/s 4171%
moritz_BuildRegex_WithStudy 243/s 4867%
moritz_BuildRegex 396/s 7983%
ysth_loookahead 575/s 11640%
Tanktalus_AllIndex 731/s 14822%
unpack_allrindex 799/s 16201%
unpack_allindex 821/s 16650%
tassilo_listutils_r 914/s 18549%
RMGir_index 1017/s 20652%
tye2 1018/s 20671%
tye1 1046/s 21234%
repellent_unpack 1158/s 23518%
repellent_unpack_opt 1207/s 24515%
tye2_opt 1386/s 28180%
tye0_opt 1399/s 28442%
tye0 1414/s 28746%
tye1_opt 1440/s 29269%
buk_substr 1761/s 35815%
buk 2849/s 58012%
buk2_len 9489/s 193482%
buk2 9686/s 197486%
buk3 11005/s 224395%
ShortLong
Rate tallulah_OriginalPost
tallulah_OriginalPost 3.20/s --
Tanktalus_AllRegex 4.42/s 38%
Tanktalus_AllRegex_Study 4.42/s 38%
moritz_BuildRegex_WithStudy 5.36/s 67%
moritz_BuildRegex 5.41/s 69%
varian_hash 8.18/s 156%
ysth_loookahead 9.26/s 189%
RMGir_index 14.4/s 351%
tassilo_listutils_r 15.9/s 396%
RMGir_slice 15.9/s 396%
Tanktalus_AllIndex 16.0/s 401%
repellent_unpack_opt 16.8/s 425%
repellent_unpack 17.1/s 435%
unpack_allrindex 20.0/s 525%
unpack_allindex 20.2/s 531%
tye2_opt 25.5/s 696%
tye0_opt 25.7/s 703%
tye0 25.7/s 704%
tye1_opt 26.2/s 719%
tye2 26.9/s 739%
tye1 27.4/s 755%
buk_substr 64.0/s 1900%
buk 67.9/s 2021%
buk2_len 888/s 27649%
buk2 939/s 29247%
buk3 1109/s 34553%
LongLong
Rate tallulah_OriginalPost
tallulah_OriginalPost 3.20/s --
Tanktalus_AllRegex_Study 4.46/s 40%
Tanktalus_AllRegex 4.50/s 41%
varian_hash 4.59/s 43%
moritz_BuildRegex 5.41/s 69%
moritz_BuildRegex_WithStudy 5.41/s 69%
RMGir_slice 6.80/s 112%
ysth_loookahead 9.40/s 194%
RMGir_index 12.9/s 302%
repellent_unpack_opt 15.1/s 372%
repellent_unpack 15.6/s 387%
tassilo_listutils_r 16.0/s 399%
Tanktalus_AllIndex 16.0/s 401%
unpack_allindex 20.4/s 537%
unpack_allrindex 20.6/s 543%
tye1_opt 25.9/s 708%
tye0 26.0/s 711%
tye2_opt 26.0/s 711%
tye0_opt 26.2/s 719%
tye2 27.3/s 752%
tye1 27.3/s 752%
buk_substr 63.6/s 1889%
buk 67.3/s 2002%
buk2_len 852/s 26537%
buk2 896/s 27908%
buk3 1017/s 31690%
VeryLong
Rate varian_hash
varian_hash 8.41/s --
RMGir_slice 10.2/s 21%
tallulah_OriginalPost 189/s 2152%
Tanktalus_AllRegex_Study 205/s 2332%
Tanktalus_AllRegex 245/s 2810%
moritz_BuildRegex_WithStudy 346/s 4019%
moritz_BuildRegex 508/s 5942%
ysth_loookahead 800/s 9411%
Tanktalus_AllIndex 1090/s 12861%
unpack_allrindex 1184/s 13980%
tassilo_listutils_r 1207/s 14245%
unpack_allindex 1230/s 14521%
tye2_opt 1267/s 14963%
tye1_opt 1268/s 14974%
tye0 1279/s 15106%
tye2 1280/s 15118%
RMGir_index 1280/s 15118%
tye0_opt 1292/s 15264%
tye1 1297/s 15323%
repellent_unpack 1506/s 17801%
repellent_unpack_opt 1527/s 18058%
buk_substr 2634/s 31219%
buk 2830/s 33543%
buk2_len 24093/s 286343%
buk2 25048/s 297687%
buk3 28768/s 341917%
I'd be interested in seeing how these numbers change for 5.10, if anyone has an installation handy. The benchmark code is behind the second readmore tag.
#!/usr/bin/perl -w
use strict;
use List::MoreUtils qw(all);
use Benchmark qw(cmpthese);
#use Test::More qw(no_plan);
require Test::More;
my @shortTestCases=(
# sentence wantedChars result
[ "abxcd zwe rrv", "0", 0 ],
[ "0", "0", 1 ],
[ "abxcd zwe rrv", "xxv", 1 ],
[ "abxcd zwe rrv", "xxvq", 0 ],
[ "abxcd zwe rrv", "", 1 ],
[ "The quick brown fox jumps over the lazy dog",
"abcdefghijklmnopqrstuvwxyz",
1 ],
[ "The quick brown fox jumps over the lazy dog",
"abcdefghijklmnopqrstuvwxyzT",
1 ],
[ "The quick brown fox jumps over the lazy dog",
"abcdefghijklmnopqrstuvwxyzTU",
0 ],
[ "The quick brown fox jumps over the lazy dog",
"a", 1 ],
[ "The quick brown fox jumps over the lazy dog",
"", 1 ],
);
# Long sentence, short wantedChars
my @longShortTestCases = (
[ "The quick brown fox jumps over the lazy dog" x 100,
"", 1 ],
[ "The quick brown fox jumps over the lazy dog" x 100,
"a", 1 ],
[ "The quick brown fox jumps over the lazy dog" x 100,
"abcdefghijklmnopqrstuvwxyzT",
1 ],
[ "The quick brown fox jumps over the lazy dog" x 100,
"abcdefghijklmnopqrstuvwxyzTU",
0 ],
);
# Short sentence, long wantedChars
my @shortLongTestCases = (
[ "The quick brown fox jumps over the lazy dog",
"abcdefghijklmnopqrstuvwxyzT"x100,
1 ],
[ "The quick brown fox jumps over the lazy dog",
"abcdefghijklmnopqrstuvwxyzTU"x100,
0 ],
);
# Long sentence, long wantedChars
my @longLongTestCases = (
[ "The quick brown fox jumps over the lazy dog" x 100,
"abcdefghijklmnopqrstuvwxyzT"x100,
1 ],
[ "The quick brown fox jumps over the lazy dog" x 100,
"abcdefghijklmnopqrstuvwxyzTU"x100,
0 ],
);
# VERY long test case
my $alphabet=(join '' => 'a' .. 'z', 'A' .. 'Z');
my @veryLongTestCase = (
[
random(10000, $alphabet),
$alphabet, 1 ],
);
my @testCases = (
@shortTestCases,
@shortLongTestCases,
@longShortTestCases,
@longLongTestCases,
@veryLongTestCase
);
sub random {
my $num = shift;
my $wantedChars = shift;
my @chars = ('a' .. 'z', 'A' .. 'Z');
my $result;
# make sure result will work
do {
$result = join '' => map $chars[rand @chars], 1 .. $num;
} until Tanktalus_AllIndex($result, $wantedChars);
return $result;
}
sub test_routine
{
my ($testFn, $testName)=@_;
foreach(@testCases) {
my ($sentence, $wantedLetters, $expectedResult) = @$_;
ok (!!($testFn->($sentence, $wantedLetters)) == !!$expectedResu
+lt);
}
}
sub benchmark_routine
{
my ($testFn, $testName, $testCases)=@_;
foreach(@$testCases) {
my ($sentence, $wantedLetters, $expectedResult) = @$_;
$testFn->($sentence, $wantedLetters) for 1..20;
}
}
# [id://707122]
sub tallulah_OriginalPost
{
my ($sentence, $wantedLetters)=@_;
my $flag=0;
my @a = split '',$wantedLetters;
for( my $i=0; $i<$#a+1; $i++ ) {
if($sentence !~ /$a[$i]/) {
$flag=1;last;
}
}
return !$flag;
}
# [id://707123]
sub moritz_BuildRegex
{
my ($sentence, $wantedLetters)=@_;
my $re = '^' . join '', map "(?=.*?$_)", map quotemeta, split m//,
$wantedLetters;
if ($sentence =~ m/$re/) {
return 1;
}
return 0;
}
# [id://707123]
sub moritz_BuildRegex_WithStudy
{
my ($sentence, $wantedLetters)=@_;
my $re = '^' . join '', map "(?=.*?$_)", map quotemeta, split m//,
$wantedLetters;
study $sentence;
if ($sentence =~ m/$re/) {
return 1;
}
return 0;
}
# [id://707124]
sub RMGir_index {
my ($sentence, $wantedLetters)=@_;
# don't need this variable (or any of them, in
# fact -- they're just here for clarity.
# we could work straight out of @_ if we wanted
# this terser
# Also, the $[ check is just pedantic - if someone
# changes $[, shoot them.
my $foundLetters=scalar (grep index($sentence,$_)>=$[,
split //,$wantedLetters);
return length($wantedLetters)==$foundLetters;
}
# [id://707222]
sub Tanktalus_AllRegex {
my ($sentence, $letters) = @_;
return 1 unless length($letters);
# all we're doing is checking for each letter.
all { $sentence =~ $_ } split //, $letters;
}
# [id://707222]
sub Tanktalus_AllRegex_Study {
my ($sentence, $letters) = @_;
return 1 unless length($letters);
study $sentence;
# all we're doing is checking for each letter.
all { $sentence =~ $_ } split //, $letters;
# same as above, but with index which I think is less readable.
#all { index($sentence, $_) >= $[ } split //, $letters;
}
# [id://707222]
sub Tanktalus_AllIndex {
my ($sentence, $letters) = @_;
return 1 unless length($letters);
# same as above, but with index which I think is less readable.
all { index($sentence, $_) >= $[ } split //, $letters;
}
# JavaFan's looks about equivalent to OP approach
# Doesn't have same repeated letter semantics specified in
# OP post.
# [id://707176]
sub oshalla_scan {
my ($sentence, $wanted) = @_ ;
while (length($wanted)) {
return 0 if ($sentence !~ m/([$wanted])/g) ;
$wanted =~ s/$1// ;
} ;
return 1;
}
# [id://707231]
sub varian_hash {
my ($sentence, $wantedLetters)=@_;
my %required = map {$_ => 1} split //,$wantedLetters;
map delete $required{$_}, split //, $sentence;
if (keys %required) {
return 0;
}
else {
return 1;
}
}
# [id://707314]
sub RMGir_slice {
my ($sentence, $wantedLetters)=@_;
my %required;
@required{split //,$wantedLetters}=();
delete @required{split //, $sentence};
if (keys %required) {
return 0;
}
else {
return 1;
}
}
# [tassilo]'s test cases
sub makeSubRef {
return @_[1,0];
}
my %testSubroutines=(
makeSubRef(\&tallulah_OriginalPost,
"tallulah_OriginalPost"),
makeSubRef(\&moritz_BuildRegex,
"moritz_BuildRegex"),
makeSubRef(\&moritz_BuildRegex_WithStudy,
"moritz_BuildRegex_WithStudy"),
makeSubRef(\&RMGir_index,
"RMGir_index"),
makeSubRef(\&Tanktalus_AllRegex,
"Tanktalus_AllRegex"),
makeSubRef(\&Tanktalus_AllRegex_Study,
"Tanktalus_AllRegex_Study"),
makeSubRef(\&Tanktalus_AllIndex,
"Tanktalus_AllIndex"),
makeSubRef(\&varian_hash,
"varian_hash"),
makeSubRef(\&RMGir_slice,
"RMGir_slice"),
tassilo_listutils_r => sub {
return 1 unless length($_[1]);
all { rindex($_[0], $_) >= 0 } split //, $_[1];
},
repellent_unpack => sub {
my ($sentence, $wantedLetters)=@_;
my $foundLetters=scalar (grep index($sentence,$_)>=$[,
unpack "(a)*", $wantedLetters);
return length($wantedLetters)==$foundLetters;
},
repellent_unpack_opt => sub {
length($_[1])==scalar (grep index($_[0],$_)>=$[,
unpack "(a)*", $_[1]);
},
unpack_allindex => sub {
return 1 unless length($_[1]);
my ($sentence, $wantedLetters)=@_;
all { index($sentence, $_) >= $[ }
unpack "(a)*", $wantedLetters;
},
unpack_allrindex => sub {
return 1 unless length($_[1]);
my ($sentence, $wantedLetters)=@_;
all { rindex($sentence, $_) >= $[ }
unpack "(a)*", $wantedLetters;
},
tye2_opt => sub {
while( $_[1] =~ /(.)/gs ) {
return 0 if -1 == index($_[0],$1);
}
return 1;
},
tye1_opt => sub {
-1 == index($_[0],$1) && return 0
while( $_[1] =~ /(.)/gs );
return 1;
},
tye0_opt => sub {
-1 == index( $_[0], $1 ) && return 0
while( $_[1] =~ /(.)/gs );
return 1;
},
tye2 => sub {
my( $sentence, $wantedLetters )= @_;
while( $wantedLetters =~ /(.)/gs ) {
return 0 if -1 == index($sentence,$1);
}
return 1;
},
tye1 => sub {
my( $sentence, $wantedLetters )= @_;
-1 == index($sentence,$1) && return 0
while( $wantedLetters =~ /(.)/gs );
return 1;
},
tye0 => sub {
-1 == index( $_[0], $1 ) && return 0
while( $_[1] =~ /(.)/gs );
return 1;
},
# FAILS the "0" "0" test....
buk => sub {
my( $s, $w ) = @_;
my $c;
1+index $s, $c or return 0 while $c = chop $w;
1;
},
buk_substr => sub {
my( $s, $w ) = @_;
1+index $s, substr($w,$_,1) or return 0 foreach 0..length($w);
1;
},
# FAILS the "0" "0" test....
buk2 => sub {
local $_;
1+index $_[0], $_ or return while $_ = chop $_[1];
1;
},
buk2_len => sub {
local $_;
1+index $_[0], $_ or return while length($_ = chop $_[1]);
1;
},
buk3 => sub {
1+index $_[0], chop $_[ 1 ] or return for 1 .. length $_[ 1 ];
1;
},
buk4 => sub {
( -1 != index $_[0], chop $_[ 1 ] ) || return for 1 .. length
+$_[ 1 ];
1;
},
ysth_loookahead => sub {
my ($sentence, $wantedLetters) = @_;
$wantedLetters =~ s/(.)/(?=.*?\Q$1\E)/sg;
$sentence =~ /^$wantedLetters/s;
}
# # These 2 routines need more debugging - they fail the tests
# tassilo_xor => sub {
# return 1 unless length($_[1]);
# my $copy = $_[0];
# for (split //, $_[1]) {
# my $mask = ($_) x length $copy;
# $copy ^= $mask;
# $copy =~ tr/\000//d;
# $copy ^= ( ($_) x length $copy );
# };
# length($copy) == 0;
# },
# tassilo_tr => sub {
# return 1 unless length($_[1]);
# my $copy = $_[0];
# eval "\$copy =~ tr/$_[1]//d";
# length($copy) == 0;
# }
);
if(@ARGV && $ARGV[0] eq "-t") {
use Test::More;
plan tests => ((scalar keys %testSubroutines) * scalar @testCases)
+;
print "Testing routines...\n";
foreach my $name(sort keys %testSubroutines) {
print "Testing $name\n";
test_routine($testSubroutines{$name}, $name);
}
exit(0);
}
print "Running benchmarks...\n";
my $testsRef = \@shortTestCases;
my $benchmark_routines={
map { ($_,
eval qq[sub { benchmark_routine(\$testSubroutines{"$_"},
"$_", \$testsRef); }
])
} keys %testSubroutines
};
print "Short \n";
cmpthese(-1, $benchmark_routines);
$testsRef = \@longShortTestCases;
print "\n\n";
print "LongShort \n";
cmpthese(-1, $benchmark_routines);
$testsRef = \@shortLongTestCases;
print "\n\n";
print "ShortLong \n";
cmpthese(-1, $benchmark_routines);
$testsRef = \@longLongTestCases;
print "\n\n";
print "LongLong \n";
cmpthese(-1, $benchmark_routines);
$testsRef = \@veryLongTestCase;
print "\n\n";
print "VeryLong \n";
cmpthese(-1, $benchmark_routines);
Re: Benchmarking "Are all these characters in this sentence?"
by moritz (Cardinal) on Aug 28, 2008 at 23:05 UTC
|
I see two interesting variables that you could change - the length of the string (and not just up to about 20, but perhaps up to 20k characters), and the number of strings that each set of characters is tested on.
Currently the comparison is just a random point of data in the big, two-dimensional benchmark space ;-) | [reply] |
|
Short sentence and search set cases
tallulah_OriginalPost 2054/s
Tanktalus_AllRegex 2511/s
Tanktalus_AllRegex_Study 2522/s
moritz_BuildRegex_WithStudy 2595/s
moritz_BuildRegex 2715/s
varian_hash 2983/s
RMGir_slice 4035/s
Tanktalus_AllIndex 8219/s
RMGir_index 12107/s
Long sentence and Short search set cases
varian_hash 97.2/s
RMGir_slice 115/s
moritz_BuildRegex_WithStudy 3172/s
tallulah_OriginalPost 3230/s
Tanktalus_AllRegex_Study 3319/s
Tanktalus_AllRegex 4054/s
moritz_BuildRegex 4250/s
Tanktalus_AllIndex 13032/s
RMGir_index 17612/s
Short sentence and Long search set cases
moritz_BuildRegex_WithStudy 54.1/s
moritz_BuildRegex 54.6/s
tallulah_OriginalPost 63.6/s
Tanktalus_AllRegex 86.9/s
Tanktalus_AllRegex_Study 87.1/s
varian_hash 161/s
RMGir_index 285/s
RMGir_slice 319/s
Tanktalus_AllIndex 320/s
Long sentence and Long search set cases
moritz_BuildRegex_WithStudy 54.1/s
moritz_BuildRegex 54.6/s
tallulah_OriginalPost 63.6/s
varian_hash 86.9/s
Tanktalus_AllRegex_Study 87.7/s
Tanktalus_AllRegex 87.7/s
RMGir_slice 135/s
RMGir_index 250/s
Tanktalus_AllIndex 319/s
Here's the benchmark code with the added data points:
| [reply] [d/l] |
Re: Benchmarking "Are all these characters in this sentence?"
by Jenda (Abbot) on Aug 29, 2008 at 08:57 UTC
|
I tried a little modified version of the RMGir_index one and the results are ... random. Especially since you do a lot more work in the benchmark than just the test you want to compare, the results depend on too many things and the actual different code seems to play only small part of it. I think you should at least change the benchmark_routine() to something like
sub benchmark_routine
{
my ($testFn, $testName)=@_;
foreach(@testCases) {
my ($sentence, $wantedLetters, $expectedResult) = @$_;
$testFn->($sentence, $wantedLetters) for (1..20);
die "$testName test failed ($sentence, $wantedLetters)"
unless (($testFn->($sentence, $wantedLetters))==$expectedRes
+ult);
}
}
to give a little more weight to the tested subroutine. Not that it helped too much in this case.
sub RMGir_indexNot {
my ($sentence, $wantedLetters)=@_;
# don't need this variable (or any of them, in
# fact -- they're just here for clarity.
# we could work straight out of @_ if we wanted
# this terser
# Also, the $[ check is just pedantic - if someone
# changes $[, shoot them.
return !(grep index($sentence,$_)<$[,
split //,$wantedLetters);
}
| [reply] [d/l] [select] |
|
Good point. My assumption when I wrote the code was that the testFn routines were going to be fairly expensive, so benchmark_routine wasn't unacceptable overhead.
I should have written separate test and benchmark routines, rather than let the benchmark run and check my test cases.
Re-running the tests with the 1..20 fix you suggested does alter the percentages between tests, but doesn't change the relative rankings much.
Short
tallulah_OriginalPost 107/s
Tanktalus_AllRegex_Study 124/s
Tanktalus_AllRegex 130/s
varian_hash 152/s
RMGir_slice 207/s
moritz_BuildRegex_WithStudy 243/s
moritz_BuildRegex 261/s
Tanktalus_AllIndex 452/s
RMGir_index 678/s
LongShort
varian_hash 4.76/s
RMGir_slice 5.77/s
tallulah_OriginalPost 166/s
Tanktalus_AllRegex_Study 169/s
Tanktalus_AllRegex 207/s
moritz_BuildRegex_WithStudy 241/s
moritz_BuildRegex 395/s
Tanktalus_AllIndex 731/s
RMGir_index 1000/s
ShortLong
tallulah_OriginalPost 3.15/s
Tanktalus_AllRegex_Study 4.27/s
Tanktalus_AllRegex 4.31/s
moritz_BuildRegex_WithStudy 5.22/s
moritz_BuildRegex 5.26/s
varian_hash 7.84/s
RMGir_index 13.9/s
RMGir_slice 15.3/s
Tanktalus_AllIndex 15.8/s
LongLong
tallulah_OriginalPost 3.17/s
Tanktalus_AllRegex_Study 4.24/s
varian_hash 4.35/s
Tanktalus_AllRegex 4.35/s
moritz_BuildRegex 5.17/s
moritz_BuildRegex_WithStudy 5.22/s
RMGir_slice 6.60/s
RMGir_index 12.5/s
Tanktalus_AllIndex 15.5/s
| [reply] |
Re: Benchmarking "Are all these characters in this sentence?" (quickest yet?)
by BrowserUk (Patriarch) on Aug 30, 2008 at 17:32 UTC
|
buk => sub {
my( $s, $w ) = @_;
1+index $s, $_ or return while $_ = chop $w;
1;
},
In a cursory test it seems to be about twice as fast as the best of tye's in all cases.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
This IS fast, but if I run this in my test harness, for some reason it destroys the @testArray contents.
Aha, found it. $_ isn't localized, so using it that way trashes the array. @#$%@#$.
buk => sub {
my( $s, $w ) = @_;
my $c;
1+index $s, $c or return 0 while $c = chop $w;
1;
},
works just fine.
| [reply] |
|
buk => sub {
local $_;
my( $s, $w ) = @_;
1+index $s, $_ or return while $_ = chop $w;
1;
},
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
|
Re: Benchmarking "Are all these characters in this sentence?"
by rhesa (Vicar) on Aug 30, 2008 at 19:01 UTC
|
Here are the results using perl, v5.10.0 built for x86_64-linux. Interesting changes in ranking.
I'm surprised tye's /./gs solution took such a hit. The unpack and index solutions did get a bit faster, but not much. Weren't regexes supposed to have gotten a bit faster in 5.10?
Short
Rate tallulah_OriginalPost
tallulah_OriginalPost 114/s --
Tanktalus_AllRegex 160/s 40%
Tanktalus_AllRegex_Study 161/s 41%
varian_hash 190/s 66%
moritz_BuildRegex_WithStudy 293/s 157%
moritz_BuildRegex 313/s 174%
RMGir_slice 345/s 203%
tye2 426/s 273%
tye1 426/s 273%
tye0 570/s 399%
unpack_allindex 598/s 424%
unpack_allrindex 599/s 425%
repellent_unpack 710/s 522%
Tanktalus_AllIndex 710/s 522%
RMGir_index 881/s 671%
tassilo_listutils_r 1037/s 808%
buk 1280/s 1021%
| [reply] [d/l] [select] |
|
| [reply] |
Re: Benchmarking "Are all these characters in this sentence?"
by repellent (Priest) on Aug 29, 2008 at 22:18 UTC
|
unpack()ing the letters, instead of split()ting them, may boost the performance slightly:
sub RMGir_index {
my ($sentence, $wantedLetters)=@_;
## comment removed ##
my $foundLetters=scalar (grep index($sentence,$_)>=$[,
unpack "(a)*", $wantedLetters);
return length($wantedLetters)==$foundLetters;
}
| [reply] [d/l] [select] |
|
sub tye2 {
my( $sentence, $wantedLetters )= @_;
while( $wantedLetters =~ /(.)/gs ) {
return 0 if -1 == index($sentence,$1);
}
return 1;
}
If you want to go for ugly code for the sake of micro-optimizations, then
sub tye1 {
my( $sentence, $wantedLetters )= @_;
-1 == index($sentence,$1) && return 0
while( $wantedLetters =~ /(.)/gs );
return 1;
}
Or get even uglier to the point of risking improper behavior in some cases:
sub tye0 {
-1 == index( $_[0], $1 ) && return 0
while( $_[1] =~ /(.)/gs );
return 1;
}
| [reply] [d/l] [select] |
|
Very nice!
Except for the "VeryLong" test case, one of those wins all the other benchmarks. For the short charset cases, tye0 wins, while for the others, tye1 and tye2 are tied :) and slightly better than tye0.
And on the VeryLong case, repellent's unpack win, but the 2nd place results look like a wash between several of the approaches, and yours are quite competitive.
I've updated the parent node with your subs, thanks!
| [reply] |
|
repellent++ !!
I never thought of that.
Happily, JUST before I saw that you posted that, I posted a new version of the code that makes adding variants easy, so I added your variant, as well as variants using unpack + List::MoreUtils, including a new variant from tassilo.
Your unpack approach wins!! Thanks for pointing that option out...
The parent post is updated with the new code and the new results.
| [reply] |
Re: Benchmarking "Are all these characters in this sentence?"
by ysth (Canon) on Aug 31, 2008 at 17:22 UTC
|
Re: chop, the "buk" solution fails on this test:
[ "abxcd zwe rrv", "0", 0 ],
because 0 is false.
On a more conceptual level, it fails in that the return value of chop on
an empty string is not documented. But in practice, this would be the fix:
1+index $s, $c or return 0 while length($c = chop $w);
| [reply] [d/l] [select] |
|
[ "0", "0", 1],
to test for those pesky perl falsehood values.
buk and buk2 both pass this new test and fail yours. (Edit: Corrected, thanks ysth)
Adding length makes buk2 correct, and doesn't penalize the speed much:
buk2_len 4610/s 4007%
buk2 4791/s 4168%
| [reply] [d/l] |
|
buk3 => sub {
1+index $_[0], chop $_[ 1 ] or return for 1 .. length $_[ 1 ];
1;
},
Who says optimisation can't also result in correct code. Such fun :)
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
| [reply] |
|
|
|
Re: Benchmarking "Are all these characters in this sentence?"
by ysth (Canon) on Aug 31, 2008 at 17:23 UTC
|
While nowhere close to a winner, this beats all the other regex solutions:
sub lookahead
{
my ($sentence, $wantedLetters) = @_;
$wantedLetters =~ s/(.)/(?=.*?\Q$1\E)/sg;
$sentence =~ /^$wantedLetters/s;
}
| [reply] [d/l] |
Re: Benchmarking "Are all these characters in this sentence?"
by martin (Friar) on Sep 05, 2008 at 14:33 UTC
|
Just for completeness, try this one, too:
martin => sub {
# args: sentence, wantedchars
'' eq $_[1] ||
'' ne $_[0] &&
$_[1] =~ /^[\Q$_[0]\E]*\z/;
},
Putting the whole sentence into a character class will kind of suck with veeery long sentences but medium-sized cases should benefit from the one-pass approach.
Update: fixed the order of subroutine parameters | [reply] [d/l] |
|
| [reply] |
|
| [reply] |
|
|