Hello Perl Monks,
I'm working on a TSV file in which one column is a comma-separated list of keywords (28 unique values overall). I'd like to compute the Jaccard index (size of intersection / size of union) between pairs of these keyword lists.
To do so efficiently I'd like to use a bit array to represent the list of keywords.
I read a few articles on PerlMonks and Stack Overflow, but so far I feel I'm missing something completely obvious.
Here is what I wrote:
use common::sense;
my $a = '';
my $b = '';
$a += 1 << 0;
$a += 1 << 1;
$b += 1 << 1;
$b += 1 << 2;
my $i = $a & $b;
my $u = $a | $b;
my $i_cnt = unpack '%32b*', $i;
my $u_cnt = unpack '%32b*', $u;
printf "a is %#032b %d\n", $a, $a;
printf "b is %#032b %d\n", $b, $b;
printf "intersection is %#032b %d\n", $i, $i;
printf "union is %#032b %d\n", $u, $u;
say "set bit count in intersection: $i_cnt";
say "set bit count in union: $u_cnt";
Actual result:
a is 0b000000000000000000000000000011 3
b is 0b000000000000000000000000000110 6
intersection is 0b000000000000000000000000000010 2
union is 0b000000000000000000000000000111 7
set bit count in intersection: 3
set bit count in union: 5
Expected result:
a is 0b000000000000000000000000000011 3
b is 0b000000000000000000000000000110 6
intersection is 0b000000000000000000000000000010 2
union is 0b000000000000000000000000000111 7
set bit count in intersection: 1
set bit count in union: 3
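A possible explanation, offered as a sketch rather than a definitive answer: unpack works on a string of bytes, so the numbers 2 and 7 are stringified to "2" and "7" before the bits are counted. The character "2" is 0x32 (three set bits) and "7" is 0x37 (five set bits), which matches the actual counts above. Packing the integer into raw bytes first should give the expected counts. In the sketch below, the vocabulary and the helper names mask_for and popcount are hypothetical, and it assumes all keywords fit in a 32-bit mask (28 keywords do).

```perl
use strict;
use warnings;

# Hypothetical keyword vocabulary; each keyword gets one bit position.
my @vocab  = qw(red green blue);
my %bit_of = map { $vocab[$_] => $_ } 0 .. $#vocab;

# Build an integer mask from a comma-separated keyword list.
# Keywords not in the vocabulary are silently skipped.
sub mask_for {
    my ($csv) = @_;
    my $mask = 0;
    $mask |= 1 << $bit_of{$_}
        for grep { exists $bit_of{$_} } split /,/, $csv;
    return $mask;
}

# Count set bits in a non-negative integer below 2**32.
# pack 'N' turns the number into 4 raw bytes, so unpack counts the
# bits of the value itself, not of its decimal string.
sub popcount {
    my ($mask) = @_;
    return unpack '%32b*', pack 'N', $mask;
}

my $x = mask_for('red,green');     # bits 0 and 1
my $y = mask_for('green,blue');    # bits 1 and 2

my $i_cnt   = popcount($x & $y);
my $u_cnt   = popcount($x | $y);
my $jaccard = $u_cnt ? $i_cnt / $u_cnt : 0;

printf "intersection: %d, union: %d, Jaccard: %.3f\n",
    $i_cnt, $u_cnt, $jaccard;
```

The sketch uses $x and $y rather than $a and $b, since the latter are the special variables used by sort. Starting the masks at 0 instead of '' also keeps them numeric, so & and | stay integer operations rather than string operations.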