This is a very neat trick. I turned it into code for my application, and then benchmarked it against my original suggestion, as well as following the suggestion of bit vectors via Bit::Vector::Overload.
This unpack trick is significantly faster than my previous transliteration based counting. I couldn't figure out the correct bitwise construct to count my '00', but I can derive that from the total expected. The Bit::Vector approach is particularly slow, as I didn't see a way to preload my binary version of the data into the vectors. I'm not sure where the overhead is coming from, but I also doubt that I should get anything better than the unpack approach.
Rate bitvec string unpack
bitvec 102/s -- -97% -99%
string 3276/s 3126% -- -63%
unpack 8801/s 8565% 169% --
Code used in testing:
use Bit::Vector::Overload;
use Benchmark qw(cmpthese);
use String::Random;
#Generate random strings of 01
my $foo = new String::Random;
my $length = 600;
my @strings;
my @bstrings;
for ( my $i = 0 ; $i < 10 ; $i++ ) {
my $string = $foo->randregex("[01]{$length}");
push @strings, $string;
push @bstrings, pack qq{b$length}, $string;
}
cmpthese(
-3,
{
'string' => sub {
for ( my $i = 0 ; $i < @strings ; $i++ ) {
for ( my $j = $i + 1 ; $j < @strings ; $j++ ) {
my $string1 = $strings[$i];
my $string2 = $strings[$j];
my ( $c01, $c10, $c11 ) = (
# ( $string1 | $string2 ) =~ tr[0][0], # count 00: COUNT by
+ math below
( ~$string1 & $string2 ) =~ tr[\1][\1], # c
+ount 01
( $string1 & ~$string2 ) =~ tr[\1][\1], # c
+ount 10
( $string1 & $string2 ) =~ tr[1][1], # c
+ount 11
);
my $c00 = $length - $c01 - $c10 - $c11;
}
}
},
'bitvec' => sub {
for ( my $i = 0 ; $i < @strings ; $i++ ) {
for ( my $j = $i + 1 ; $j < @strings ; $j++ ) {
my $string1 = $strings[$i];
my $string2 = $strings[$j];
my $vec1 = Bit::Vector->new_Bin( $length, $string1
+ );
my $vec2 = Bit::Vector->new_Bin( $length, $string2
+ );
my ( $v01, $v10, $v11 ) = (
#abs( ~$vec1 & ~$vec2 ), # count 00: COUNT by
+math below
abs( ~$vec1 & $vec2 ), # count 01
abs( $vec1 & ~$vec2 ), # count 10
abs( $vec1 & $vec2 ), # count 11
);
my $v0 = $length - $v01 - $v10 - $v11;
}
}
},
'unpack' => sub {
for ( my $i = 0 ; $i < @bstrings ; $i++ ) {
for ( my $j = $i + 1 ; $j < @bstrings ; $j++ ) {
my $bstring1 = $bstrings[$i];
my $bstring2 = $bstrings[$j];
my $p01 = unpack q/%32b*/, ~$bstring1 & $bstr
+ing2;
my $p10 = unpack q/%32b*/, $bstring1 & ~$bstr
+ing2;
my $p11 = unpack q/%32b*/, $bstring1 & $bstri
+ng2;
my $p00 = $length - $p01 - $p10 - $p11;
}
}
}
}
);
|