good chemistry is complicated,and a little bit messy -LW PerlMonks

### Re^14: Module for 128-bit integer math? (updated)

by syphilis (Bishop)
 on Feb 12, 2011 at 14:30 UTC ( #887760=note: print w/replies, xml ) Need Help??

there are reasons other than performance for wanting fix-width integer semantics

Definitely correct ... though I must confess that the reasons you've implicitly put forward in your demos were not exactly what I was thinking of when I acknowledged this (or tried to) in my earlier post.

For a Math::GMPz rendition of your demo I would put forward the following:
```#! perl -slw
use strict;
use Math::GMPz qw(:mpz);

my \$ZERO = Math::GMPz->new( 0 );
my \$ONE  = Math::GMPz->new( 1 );
#my \$ALLONES = ~ \$ZERO;
my \$ALLONES = (Math::GMPz->new(1) << 128) - 1;
my \$MASK1 = \$ALLONES / 3;
my \$MASK2 = \$ALLONES / 15 * 3;
my \$MASK3 = \$ALLONES / 255;

my \$bits = Math::GMPz->new( 0 );

## set every 3rd bit
\$bits |= \$ONE << \$_ for map \$_*3+1, 0 .. 42;
print \$bits;
## count the bits (43)
#print bcount( \$bits );
print Rmpz_popcount(\$bits);

## now invert them and count again. (85)
#print bcount( ~\$bits );
print Rmpz_popcount(\$ALLONES ^ \$bits);
I hope it doesn't miss the point ... I think the only changes are that:
1) It changes the way that \$ALLONES is created;
2) It removes the need for sub bcount()

Update: And, there's also now a qw(:mpz) in the third line.

Cheers,
Rob

Replies are listed 'Best First'.
Re^15: Module for 128-bit integer math? (updated)
by BrowserUk (Pope) on Feb 12, 2011 at 14:41 UTC
2) It removes the need for sub bcount()

That does miss the point, which was not what the bcount() sub did, but the way it did it.

The result of using bitwise math using arbitrary precision doesn't follow the fix-width integer semantics. Ie. the math isn't automatically module 2^128.

That means you get quite different results:

```#! perl -slw
use strict;
use Math::GMPz;

my \$ZERO = Math::GMPz->new( 0 );
my \$ONE  = Math::GMPz->new( 1 );
#my \$ALLONES = ~ \$ZERO;
my \$ALLONES = (Math::GMPz->new(1) << 128) - 1;
my \$MASK1 = \$ALLONES / 3;
my \$MASK2 = \$ALLONES / 15 * 3;
my \$MASK3 = \$ALLONES / 255;

sub bcount {
my \$v = shift;
\$v = \$v - ( ( \$v >> 1 ) & \$MASK1 );
\$v = ( \$v & \$MASK2 ) + ( ( \$v >> 2 ) & \$MASK2 );
\$v = ( \$v + ( \$v >> 4 ) ) &  \$MASK4;
my \$c = ( \$v * ( \$MASK3 ) ) >> 120;
return \$c;
}

my \$bits = Math::GMPz->new( 0 );

## set every 3rd bit
\$bits |= \$ONE << \$_ for map \$_*3+1, 0 .. 42;
print \$bits;

## count the bits (43)
print bcount( \$bits );

## now invert them and count again. (85)
print bcount( ~\$bits );

__END__
C:\test>gmpz-t.pl
194447066811964836264785489961010406546
4019000903644796537894933644280473643
6698389137940470254461716813547458644

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
the math isn't automatically module 2^128

Quite so ... I don't think the gmp library provides a way of enforcing fixed precision. Therefore "the overflow" doesn't get automatically discarded, and you would have to remove it yourself by doing a mod(2 ** 128):
```#! perl -slw
use strict;
use Math::GMPz qw(:mpz);

my \$ZERO = Math::GMPz->new( 0 );
my \$ONE  = Math::GMPz->new( 1 );
my \$MOD128  = Math::GMPz->new(1) << 128;
my \$ALLONES = \$MOD128 - 1;
my \$MASK1 = \$ALLONES / 3;
my \$MASK2 = \$ALLONES / 15 * 3;
my \$MASK3 = \$ALLONES / 255;

my \$bits = Math::GMPz->new( 0 );

## set every 3rd bit
\$bits |= \$ONE << \$_ for map \$_*3+1, 0 .. 42;
print \$bits;

## count the bits (43)
print bcount( \$bits );

## now invert them and count again. (85)
#print bcount( ~\$bits );
print bcount(\$ALLONES ^ \$bits);

sub bcount {
my \$v = shift;
\$v = \$v - ( ( \$v >> 1 ) & \$MASK1 );
\$v = ( \$v & \$MASK2 ) + ( ( \$v >> 2 ) & \$MASK2 );
\$v = ( \$v + ( \$v >> 4 ) ) &  \$MASK4;
my \$c = ( ( \$v * ( \$MASK3 ) ) % \$MOD128) >> 120;
return \$c;
}

__END__
C:\_64\pscrpt>perl buk1.pl
194447066811964836264785489961010406546
43
85
Not sure what to do with the ~ operator in Math::GMPz. Currently it just does a one's complement - which is not the right thing for your purposes.
I notice that Math::GMP doesn't overload the ~ operator ... perhaps Math::GMPz should do the same ?

Cheers,
Rob
you would have to remove it yourself by doing a mod(2 ** 128):

That's exactly what I wish to avoid. It's not even so much to do with having to perform the modulo, (pain in the ass though that it is). It's more knowing when it needs to be done.

For some calculations, doing them at arbitrary precision and doing a single modulo operation at the end will produce the right result, but at the cost of all the intermediate calculations being done at far higher precision, and therefore much more slowly, than is wanted for the final result.

As (just) an example of this, the following compares performing 128-bit FNV1 and FNV1a hashing, both fixed precision; and infinite precision with the mod'ing done per step, and once at the end:

```#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];
use Math::Int128 qw[ uint128 uint128_to_hex ];
use Math::GMPz;

sub FNV_1_128 {
my \$s = shift;
my \$h = uint128( '144066263297769815596495629667062367629' );
my \$p = uint128( '309485009821345068724781371' );
\$h *= \$p, \$h ^= \$_ for unpack 'C*', \$s;
return \$h;
}

sub FNV_1a_128 {
my \$s = shift;
my \$h = uint128( '144066263297769815596495629667062367629' );
my \$p = uint128( '309485009821345068724781371' );
\$h ^= \$_, \$h *= \$p for unpack 'C*', \$s;
return \$h;
}

sub FNV_1_128_gmpz {
my \$s = shift;
my \$h = Math::GMPz->new( '144066263297769815596495629667062367629'
+ );
my \$p = Math::GMPz->new( '309485009821345068724781371' );
my \$m = Math::GMPz->new( 1 ) << 128;
\$h *= \$p, \$h ^= \$_ for unpack 'C*', \$s;
return \$h % \$m;
}

sub FNV_1a_128_gmpz {
my \$s = shift;
my \$h = Math::GMPz->new( '144066263297769815596495629667062367629'
+ );
my \$p = Math::GMPz->new( '309485009821345068724781371' );
my \$m = Math::GMPz->new( 1 ) << 128;
\$h ^= \$_, \$h *= \$p for unpack 'C*', \$s;
return \$h % \$m;
}

sub FNV_1_128_gmpz2 {
my \$s = shift;
my \$h = Math::GMPz->new( '144066263297769815596495629667062367629'
+ );
my \$p = Math::GMPz->new( '309485009821345068724781371' );
my \$m = Math::GMPz->new( 1 ) << 128;
\$h *= \$p, \$h ^= \$_, \$h %= \$m for unpack 'C*', \$s;
return \$h;
}

sub FNV_1a_128_gmpz2 {
my \$s = shift;
my \$h = Math::GMPz->new( '144066263297769815596495629667062367629'
+ );
my \$p = Math::GMPz->new( '309485009821345068724781371' );
my \$m = Math::GMPz->new( 1 ) << 128;
\$h ^= \$_, \$h *= \$p, \$h %= \$m for unpack 'C*', \$s;
return \$h;
}

our \$text = do{ local( @ARGV, \$/ ) = \$0; <> };
print length \$text;

cmpthese -1, {
int128 => q[
my \$fnv1  = uint128_to_hex( FNV_1_128( \$text ) );
my \$fnv1a = uint128_to_hex( FNV_1a_128( \$text ) );
],
GMPz => q[
my \$fnv1  = Math::GMPz::Rmpz_get_str( FNV_1_128_gmpz( \$text ),
+ 16 );
my \$fnv1a = Math::GMPz::Rmpz_get_str( FNV_1a_128_gmpz( \$text )
+, 16 );
],
GMPz2 => q[
my \$fnv1  = Math::GMPz::Rmpz_get_str( FNV_1_128_gmpz2( \$text )
+, 16 );
my \$fnv1a = Math::GMPz::Rmpz_get_str( FNV_1a_128_gmpz2( \$text
+), 16 );
],
};

__END__
C:\test>FNV128.pl
2455
Rate   GMPz  GMPz2 int128
GMPz   14.1/s     --   -77%   -95%
GMPz2  61.4/s   334%     --   -78%
int128  274/s  1840%   347%     --

Notice how the _gmpz2() functions are 3 1/2 times faster than the _gmpz() functions, despite that they perform 2,455 modulo 128 operations on intermediate values, versus the latter's single modulo 128 at the end. The cost of carrying that extra precision adds up fast.

But far more pernicious are calculations where not performing the modulo operation at the appropriate times actually changes the outcome of the calculation. I went through the same thing trying to use Math::BigInt to perform 64-bit math before I had a 64-integer capable Perl. You start sticking in modulo ops all over the place with the inevitable disastrous affects on both the readability of the calculation and performance. And then you start trying to back them out again one at a time and verifying the results to ensure you haven't screwed the calculation up in the process.

The performance affects are bad enough. But the affect on readability and maintainability of complex calculations is simply too high to pay. Same thing goes for using the function call interface rather than the overloaded interface. It renders simple calculations all but unreadable; and complex ones nigh impossible to construct.

That's why I asked up front about a 128-bit math package, not an infinite precision math package. And why I tried really hard to avoid this debate. (Note. Not with you, given that you've provided me with the to both packages :)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Create A New User
Node Status?
node history
Node Type: note [id://887760]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2020-11-24 03:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?