How to do popcount (aka Hamming weight) in Perl (popcount References)

eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

In More Betterer Game of Life, I needed a popcount (aka Hamming weight) function to sum the one bits in a 64-bit value.

I started with the basic popcount1 below, scraped with little thought from the Hamming_weight wikipedia page.

I'd like to improve that, hence this node.

use strict;
use warnings;
use Benchmark qw(timethese);

sub popcount1 {
   my $x = shift;
   my $count;
   for ($count = 0; $x; ++$count) { $x &= $x - 1 }
   return $count;
}

sub popcount2 {
   return sprintf('%b', shift) =~ tr/1//;
}

# Update: unpack('%b*', pack('J', shift)) is better (see Rosetta ref. 
+below)
# This works but is slower: unpack('b*', pack('J', shift)) =~ tr/1//;
sub popcount3 {
   return unpack('%32b*', pack('Q', shift));
}

my $start = 2**32 - 42;
my $end   = $start + 1000000;

print "sanity test for correctness\n";
for my $i (0 .. 256, $start .. $end) {
   my $one = popcount1($i);
   my $two = popcount2($i);
   my $three = popcount3($i);
   # print "$i: $one $two $three\n";
   $one == $two or die;
   $one == $three or die;
}

timethese 50, {
   One   => sub { popcount1($_) for $start .. $end },
   Two   => sub { popcount2($_) for $start .. $end },
   Three => sub { popcount3($_) for $start .. $end },
};
[download]

Running the above program on my machine produced:

sanity test for correctness
Benchmark: timing 50 iterations of One, Three, Two...
   One: 29 wallclock secs (28.41 usr +  0.00 sys = 28.41 CPU) @  1.76/
+s (n=50)
 Three: 10 wallclock secs (10.00 usr +  0.00 sys = 10.00 CPU) @  5.00/
+s (n=50)
   Two: 10 wallclock secs (10.03 usr +  0.00 sys = 10.03 CPU) @  4.98/
+s (n=50)
[download]

Improvements welcome.

References

More Betterer Game of Life
Bitwise operators on binary objects
Hamming weight (wikipedia, aka popcount)
Rosetta Code (Perl)
chess programming - Population Count (was on chessprogramming.wikispaces.com until wikispaces closed)
Programming pages of Jasper Neumann
Hacker's Delight book by Henry S. Warren Jr
Math::GMPz
GMP: The GNU Multiple Precision Arithmetic Library
perlnumber
Integer Arithmetic (use integer)

marioroy's MCE Sandbox (Sandboxing with Perl MCE/Inline::C, C/OpenMP, and Codon) - popcount function in bits.h and Math::Prime::Util

Update: Added some C/C++ information from Hamming weight (wikipedia).

Some C compilers provide intrinsic functions that provide bit counting facilities. For example, GCC (since version 3.4 in April 2004) includes a builtin function __builtin_popcount that will use a processor instruction if available or an efficient library implementation otherwise. LLVM-GCC has included this function since version 1.5 in June 2005.

In C++ STL, the bit-array data structure bitset has a count() method that counts the number of bits that are set. In C++20, a new header <bit> was added, containing functions std::popcount and std::has_single_bit, taking arguments of unsigned integer types.

I've used this for code that builds with both GCC and MSVC:

// Make GCC __builtin_popcountll available with MSVC
// (used to calculate the Hamming weight efficiently)
#ifdef _MSC_VER
#include <intrin.h>
#define __builtin_popcountll __popcnt64
#define __builtin_popcount   __popcnt
#endif
[download]

Updated: Added more references (thanks oiskuu)

Comment on How to do popcount (aka Hamming weight) in Perl (popcount References) Select or Download Code

Replies are listed 'Best First'.

Re: How to do popcount (aka Hamming weight) in Perl
by BrowserUk (Patriarch) on Sep 24, 2017 at 14:51 UTC

This (labelled "six" ) is the fastest C/Perl method I found when I was looking a (long) while ago) that works on chips that don't have a popcnt instruction:

#! perl -slw
use strict; package SparseBitVector; use Config;
use Inline C => Config => BUILD_NOISY => 1;
use Inline C => <<'END_C',  NAME => 'Junk', CLEAN_AFTER_BUILD =>0;

typedef unsigned __int64 U64;
typedef          __int64 I64;

unsigned popcnt( SV *sv ) {
    U64 x = (U64) SvUVX( sv );
    x -=( x >> 1 ) & 0x5555555555555555ULL;
    x = ( x & 0x3333333333333333ULL ) + ( ( x >> 2 ) & 0x3333333333333
+333ULL );
    x = ( x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (unsigned)( ( x * 0x0101010101010101ull ) >> 56 );
}

END_C

use Benchmark qw( cmpthese );

sub popcount1 {
   my $x = shift;
   my $count;
   for ($count = 0; $x; ++$count) { $x &= $x - 1 }
   return $count;
}

sub popcount2 {
   return sprintf('%b', shift) =~ tr/1//;
}

sub popcount3 {
   return unpack('%32b*', pack('Q', shift));
}

sub popcount4 {
   return unpack( '%32b*', pack('Q', $_[0] ) );
}

my $start = 2**32 - 42;
my $end   = $start + 1000000;

print "sanity test for correctness\n";
for my $i (0 .. 256, $start .. $end) {
   my $two = popcount2($i);
   my $three = popcount3($i);
   my $four = popcount4( $i );
   my $six = popcnt( $i );
   $three == $two or print "$i: $two $three\n" and die;
   $four == $six and $three == $six or print "$i: 2:$two 3:$three 6:$s
+ix\n" and die;
}

cmpthese -3, {
#   One   => sub { popcount1($_) for $start .. $end },
   Two   => sub { popcount2($_) for $start .. $end },
   Three => sub { popcount3($_) for $start .. $end },
   Four  => sub { popcount4($_) for $start .. $end },
   Six   => sub { popcnt(   $_) for $start .. $end },
};
[download]

Outputs:

C:\test>1199987.pl
sanity test for correctness

        Rate   Two Three  Four   Six
Two   1.02/s    --   -4%   -5%  -58%
Three 1.06/s    4%    --   -1%  -57%
Four  1.08/s    5%    1%    --  -56%
Six   2.45/s  140%  131%  128%    --
[download]

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.

In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit

[reply]
[d/l]
[select]

Re^2: How to do popcount (aka Hamming weight) in Perl

by marioroy (Prior) on Sep 27, 2017 at 07:13 UTC

If needed, Inline C code supporting 64 and 32 bits.

use strict;
use warnings;

use Inline 'C' => Config => BUILD_NOISY => 1, CLEAN_AFTER_BUILD => 0;
use Inline 'C' => <<'END_C';

#include <stdint.h>

unsigned popcnt2( SV *sv ) {
    unsigned count = 0;

    #ifdef __LP64__
        static const uint64_t m1  = UINT64_C(0x5555555555555555);
        static const uint64_t m2  = UINT64_C(0x3333333333333333);
        static const uint64_t m4  = UINT64_C(0x0f0f0f0f0f0f0f0f);
        static const uint64_t h01 = UINT64_C(0x0101010101010101);

        uint64_t x = (uint64_t) SvUVX( sv );

        x =  x       - ((x >> 1)  & m1);
        x = (x & m2) + ((x >> 2)  & m2);
        x = (x       +  (x >> 4)) & m4;

        count += (unsigned) ((x * h01) >> 56);
    #else
        static const uint32_t m1  = UINT32_C(0x55555555);
        static const uint32_t m2  = UINT32_C(0x33333333);
        static const uint32_t m4  = UINT32_C(0x0f0f0f0f);
        static const uint32_t h01 = UINT32_C(0x01010101);

        uint32_t x = (uint32_t) SvUVX( sv );

        x =  x       - ((x >> 1)  & m1);
        x = (x & m2) + ((x >> 2)  & m2);
        x = (x       +  (x >> 4)) & m4;

        count += (unsigned) ((x * h01) >> 24);
    #endif

    return count;
}

END_C

use Benchmark qw( cmpthese );

my $start = 2 ** 32 - 42;
my $end   = $start + 1000000;

cmpthese -3, {
    Six2 => sub { popcnt2($_) for $start .. $end },
};
[download]

Regards, Mario

[reply]
[d/l]

Re: How to do popcount (aka Hamming weight) in Perl
by Athanasius (Archbishop) on Sep 24, 2017 at 12:54 UTC

Hello eyepopslikeamosquito,

The CPAN module Bit::Fast has a function popcountl, implemented in C, which operates on 64-bit integers and is claimed to be fast.

Disclaimer: I haven’t been able to test it because — although I have a 64-bit OS and a 64-bit Perl with USE_64_BIT_INT defined — my Perl was apparently built with LONGSIZE defined as 4, not 8, so (with use Config) $Config{longsize} == 4 and popcountl doesn’t get built. I confess I don’t understand the issues involved here — maybe just a configuration problem?

Anyway, hope that’s of interest,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re: How to do popcount (aka Hamming weight) in Perl
by marioroy (Prior) on Sep 24, 2017 at 16:12 UTC

Update 2: Thanks Dana, the popcnt function moved to util.h in Math::Prime::Util v0.62.

Update 1: Added links to popcount.cpp and popcnt. Dana replaced popcnt with mpu_popcount_string in Math::Prime::Util v0.62.

Found in mce-sandbox/src/bits.h and used here, I received help by reading popcount.cpp from primesieve.org and util.c (popcnt) from Math::Prime::Util <= v0.61. Although the following is tailored for counting set bits inside a string, I'm sharing the code to show-case 64bits and 32bits support inside a function, determined by the __LP64__ pragma.

#ifndef BITS_H
#define BITS_H

#include <stdint.h>

typedef unsigned char byte_t;

static const int popcnt_byte[256] = {
   0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
   1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
   1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
   2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
   1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
   2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
   2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
   3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8
};

static uint64_t popcount(const byte_t *bytearray, uint64_t size)
{
   uint64_t asize, i, count = 0;

   if (bytearray == 0 || size == 0)
      return count;

   if (size > 8) {
      #ifdef __LP64__
         static const uint64_t m1  = UINT64_C(0x5555555555555555);
         static const uint64_t m2  = UINT64_C(0x3333333333333333);
         static const uint64_t m4  = UINT64_C(0x0f0f0f0f0f0f0f0f);
         static const uint64_t h01 = UINT64_C(0x0101010101010101);

         const uint64_t *a = (uint64_t *) bytearray;

         asize = (size + 7) / 8 - 1;

         for (i = 0; i < asize; i++) {
            uint64_t b = a[i];
            b =  b       - ((b >> 1)  & m1);
            b = (b & m2) + ((b >> 2)  & m2);
            b = (b       +  (b >> 4)) & m4;
            count += (b * h01) >> 56;
         }

         i = asize * 8;

      #else
         static const uint32_t m1  = UINT32_C(0x55555555);
         static const uint32_t m2  = UINT32_C(0x33333333);
         static const uint32_t m4  = UINT32_C(0x0f0f0f0f);
         static const uint32_t h01 = UINT32_C(0x01010101);

         const uint32_t *a = (uint32_t *) bytearray;

         asize = (size + 3) / 4 - 1;

         for (i = 0; i < asize; i++) {
            uint32_t b = a[i];
            b =  b       - ((b >> 1)  & m1);
            b = (b & m2) + ((b >> 2)  & m2);
            b = (b       +  (b >> 4)) & m4;
            count += (b * h01) >> 24;
         }

         i = asize * 4;

      #endif
   }
   else
      i = 0;

   for (; i < size; i++)
      count += popcnt_byte[bytearray[i]];

   return count;
}

#endif
[download]

Regards, Mario

[reply]
[d/l]
[select]

Re^2: How to do popcount (aka Hamming weight) in Perl

by danaj (Friar) on Sep 26, 2017 at 17:04 UTC

Dana replaced popcnt with mpu_popcount_string in Math::Prime::Util v0.62.

I moved the native popcnt to util.h. mpu_popcount_string is a new function to handle bigints or sufficiently magic input without using a bigint library. It's faster than getting Math::BigInt involved until the numbers get over 500 digits. It should be optimized, but it's not exactly the common case.

It's also useful for making sure we don't get really slow for 64-bit numbers on a 32-bit Perl. Not as fast as forcing 64-bit code, but that's further narrowing down the space -- people on 64-bit machines who install a 32-bit Perl.

Using Math::BigInt to handle a long digit string would be something like:

use Math::BigInt;
my $n = "16" x 1000;
say 0 + (Math::BigInt->new("$n")->as_bin() =~ tr/1//);
[download]

Rmpz_popcount

[reply]
[d/l]
[select]

Re: How to do popcount (aka Hamming weight) in Perl
by LanX (Saint) on Sep 24, 2017 at 10:37 UTC

> Improvements welcome.

Performance wise?

I think a pre computed lookup table is the way I'd go.

An array with 2**16 entries should be a good compromise between time and memory complexity.

(Since Perl doesn't fit into a CPU's line cache)

Sorry too busy to code it for you :)

Cheers Rolf
_{(addicted to the Perl Programming Language and ☆☆☆☆ :)

Je suis Charlie!}

[reply]

Re: How to do popcount (aka Hamming weight) in Perl
by LanX (Saint) on Sep 24, 2017 at 12:58 UTC

> unpack('%32b*'

so Perl has already a specialized command for this! (cool)

but why do you need

> pack('Q'

then?

If it's some kind of normalization, why do you need to benchmark it too instead of keeping the data in that format?

Cheers Rolf
_{(addicted to the Perl Programming Language and ☆☆☆☆ :)

Je suis Charlie!}

[reply]

Re: How to do popcount (aka Hamming weight) in Perl
by danaj (Friar) on Sep 26, 2017 at 19:12 UTC

There are a fair number of modules that do this. For instance, RosettaCode shows:

use ntheory qw/hammingweight/;
say hammingweight(1234567);
 
use Math::GMPz qw/Rmpz_popcount/;
say Rmpz_popcount(Math::GMPz->new(1234567));
 
use Math::BigInt;
say 0 + (Math::BigInt->new(1234567)->as_bin() =~ tr/1//);
 
use Bit::Vector;
say Bit::Vector->new_Dec(64,1234567)->Norm;
[download]

ntheory

Bit::Fast

Once we start calling from Perl, the differences in overhead become more apparent. Note that BrowserUK's benchmark as posted greatly favors his Inline::C function, as all the others get an added Perl subroutine call added to them, and Perl sub calls are not at all cheap compared to this function. Correcting that, and adding some modules, I get:

                Rate bigint  GMPz bitvector Three Four  Two ntheory Inline raw_nt bitfast
bigint    4.56e-02/s     --  -93%      -94%  -99% -99% -99%   -100%  -100%  -100%   -100%
GMPz         0.667/s  1361%    --      -18%  -85% -85% -86%    -93%   -95%   -96%    -96%
bitvector    0.811/s  1676%   22%        --  -81% -82% -83%    -92%   -93%   -95%    -95%
Three         4.32/s  9363%  548%      433%    --  -4% -10%    -57%   -65%   -71%    -71%
Four          4.50/s  9763%  575%      455%    4%   --  -6%    -56%   -64%   -70%    -70%
Two           4.81/s 10434%  621%      493%   11%   7%   --    -53%   -61%   -68%    -68%
ntheory       10.1/s 22087% 1419%     1149%  134% 125% 111%      --   -18%   -32%    -33%
Inline        12.4/s 27108% 1763%     1432%  188% 176% 158%     23%     --   -17%    -18%
raw_nt        15.0/s 32729% 2148%     1748%  247% 233% 212%     48%    21%     --     -1%
bitfast       15.1/s 33012% 2167%     1764%  250% 236% 214%     49%    22%     1%      --

Some of these functions have more features than others, e.g. Math::GMPz, bigint, and ntheory will handle bigints correctly. raw_nt is ntheory with a new trivial XS function added that just takes a UV and returns the result of popcnt -- no validation or input size checks, just like Bit::Fast or the Inline C function. Not surprisingly, it comes out the same speed on this machine (both use gcc with asm popcnt support). We are talking about an instruction with 1 cycle latency on newer processors.

[reply]
[d/l]
[select]

Back to Seekers of Perl Wisdom