Maybe I just don't understand the question, but assuming unpack '%32b*', ... is sufficiently efficient, wouldn't a count on the 'b' template specifier serve to count the set bits from the beginning of the string up to but not including a given bit offset?
>perl -wMstrict -le
"my $b = qq{\x01\x03\x07\xff\xfe\xfc};
;;
print ' 1111111111222222222233333333334444444444';
print '0123456789' x 5;
print unpack 'b*', $b;
print ''
;;
for my $offset (0, 1, 7 .. 10, 15 .. 20, 24 .. 28, 30 .. 32) {
my $c = unpack qq{%32b$offset}, $b;
my $vc = vec_sum($b, $offset);
die qq{pack sum $c != vec sum $vc} if $c != $vc;
printf qq{before bit at offset %2d: %2d bits set \n}, $offset, $c;
}
;;
sub vec_sum {
my ($v, $p) = @_;
;;
my $c = 0;
vec($v, $_, 1) and ++$c for 0 .. $p - 1;
return $c;
}
"
1111111111222222222233333333334444444444
01234567890123456789012345678901234567890123456789
100000001100000011100000111111110111111100111111
before bit at offset 0: 0 bits set
before bit at offset 1: 1 bits set
before bit at offset 7: 1 bits set
before bit at offset 8: 1 bits set
before bit at offset 9: 2 bits set
before bit at offset 10: 3 bits set
before bit at offset 15: 3 bits set
before bit at offset 16: 3 bits set
before bit at offset 17: 4 bits set
before bit at offset 18: 5 bits set
before bit at offset 19: 6 bits set
before bit at offset 20: 6 bits set
before bit at offset 24: 6 bits set
before bit at offset 25: 7 bits set
before bit at offset 26: 8 bits set
before bit at offset 27: 9 bits set
before bit at offset 28: 10 bits set
before bit at offset 30: 12 bits set
before bit at offset 31: 13 bits set
before bit at offset 32: 14 bits set
Update: Example code changed to include check against vec sum algorithm given in OP; also made things a bit prettier.