http://qs321.pair.com?node_id=887726


in reply to Re^11: Module for 128-bit integer math?
in thread Module for 128-bit integer math?

And especially thanks to salva

I'll second that !!

There are of course, more things to consider than just *performance* when it comes to choosing which Math module to use - but I was curious to see how Math::Int128 compared for speed with Math::GMPz (which uses the gmp C library).

I found that, if using overloaded operators with Math::GMPz, then Math::GMPz was roughly twice as slow as Math::Int128 - but if using the arithmetic functions that Math::GMPz also provides, then Math::GMPz was roughly twice as fast as Math::Int128.

Without yet having taken the time to look closely at Math::Int128, I believe there would be a way of having it out-perform Math::GMPz. (And I'm about to embark upon a little excercise to satisfy myself that this is so.)
For any Windows users who want to check the comparisons for themselves, there are ppm packages for 64-bit perl (5.10 and 5.12) available:
ppm install http://www.sisyphusion.tk/ppm/Math-GMPz.ppd ppm install http://www.sisyphusion.tk/ppm/Math-Int128.ppd
The benchmarks I ran are as follows:
use warnings; use Math::Int128 qw(int128); use Math::GMPz qw(:mpz); use Benchmark qw(:all); $count = 40000; $mpz1 = Math::GMPz->new('676469752303423489'); $mpz2 = Math::GMPz->new('776469752999423489'); $i_1 = int128("$mpz1"); $i_2 = int128("$mpz2"); $mpz_sub = Math::GMPz->new('976469752313423489'); $i_sub = int128("$mpz_sub"); $mpz_div = Math::GMPz->new('76469752313423489'); $i_div = int128("$mpz_div"); $mpz_ret = Rmpz_init2(128); print " ****************** **MULTIPLICATION** ******************\n\n"; cmpthese($count * 9, { 'mul_M::I' => '$ri = Math::Int128::_mul($i_1, $i_2, 0)', 'mul_M::G1' =>'$mpz_ret = $mpz1 * $mpz2', 'mul_M::G2'=> 'Rmpz_mul($mpz_ret, $mpz1, $mpz2)', }); die "Error 1:\n$ri\n$mpz_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('525258301482620425304858018020933121'); $i_1 *= $i_1; $i_2 *= $i_2; $mpz1 *= $mpz1; $mpz2 *= $mpz2; print " ****************** *****DIVISION***** ******************\n\n"; cmpthese($count * 9, { 'div_M::I' => '$ri = Math::Int128::_div($i_1, $i_div, 0)', 'div_M::G1' =>'$mpz_ret = $mpz1 / $mpz_div', 'div_M::G2'=> 'Rmpz_tdiv_q($mpz_ret, $mpz1, $mpz_div)', }); die "Error 2:\n$ri\n$mpz_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('5984213521522366751'); print" ****************** *****ADDITION***** ******************\n\n"; cmpthese($count * 10, { 'add_M::I' => '$ri = Math::Int128::_add($i_1, $i_2, 0)', 'add_M::G1' => '$mpz_ret = $mpz1 + $mpz2', 'add_M::G2' => 'Rmpz_add($mpz_ret, $mpz1, $mpz2)', }); die "Error 3:\n$ri\n$mpz_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('1060516603104440851094132036041866242'); print " ****************** ****SUBTRACTION*** ******************\n\n"; cmpthese($count * 10, { 'add_M::I' => '$ri = Math::Int128::_sub($i_1, $i_sub, 0)', 'add_M::G1' => '$mpz_ret = $mpz1 - $mpz_sub', 'add_M::G2' => 'Rmpz_sub($mpz_ret, $mpz1, $mpz_sub)', }); die "Error 4:\n$ri\n$mpz_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('457611325781455127825205517363509632');
The output I got was:
C:\_64\pscrpt>perl bench.pl Name "main::i_div" used only once: possible typo at bench.pl line 17. Name "main::i_sub" used only once: possible typo at bench.pl line 14. ****************** **MULTIPLICATION** ****************** Rate mul_M::G1 mul_M::I mul_M::G2 mul_M::G1 156726/s -- -58% -82% mul_M::I 371517/s 137% -- -58% mul_M::G2 886700/s 466% 139% -- ****************** *****DIVISION***** ****************** Rate div_M::G1 div_M::I div_M::G2 div_M::G1 148637/s -- -57% -81% div_M::I 344168/s 132% -- -57% div_M::G2 794702/s 435% 131% -- ****************** *****ADDITION***** ****************** Rate add_M::G1 add_M::I add_M::G2 add_M::G1 148810/s -- -60% -83% add_M::I 376648/s 153% -- -56% add_M::G2 852878/s 473% 126% -- ****************** ****SUBTRACTION*** ****************** Rate add_M::G1 add_M::I add_M::G2 add_M::G1 147113/s -- -60% -83% add_M::I 365631/s 149% -- -59% add_M::G2 883002/s 500% 142% -- C:\_64\pscrpt>
Update: I've since written an Inline::C script where the __int128 multiplications beat Math::GMPz ... but not by much:
Rate gmpz mul_128 gmpz 900000/s -- -3% mul_128 927835/s 3% --
The code I used is still a bit rough, but I don't see much scope for improvement at the moment.

Cheers,
Rob

Replies are listed 'Best First'.
Re^13: Module for 128-bit integer math? (updated)
by BrowserUk (Patriarch) on Feb 12, 2011 at 10:08 UTC

    That's interesting, but there are reasons other than performance for wanting fix-width integer semantics.

    Try coding this with GMP (or any arbitrary precision math):

    #! perl -slw use strict; use Math::Int128 qw[int128 uint128]; my $ZERO = uint128( 0 ); my $ONE = uint128( 1 ); my $ALLONES = ~ $ZERO; my $MASK1 = $ALLONES / 3; my $MASK2 = $ALLONES / 15 * 3; my $MASK3 = $ALLONES / 255; my $MASK4 = $MASK3 * 15; sub bcount { my $v = shift; $v = $v - ( ( $v >> 1 ) & $MASK1 ); $v = ( $v & $MASK2 ) + ( ( $v >> 2 ) & $MASK2 ); $v = ( $v + ( $v >> 4 ) ) & $MASK4; my $c = ( $v * ( $MASK3 ) ) >> 120; return $c; } my $bits = uint128( 0 ); ## set every 3rd bit $bits |= $ONE << $_ for map $_*3+1, 0 .. 42; print $bits; ## count the bits (43) print bcount( $bits ); ## now invert them and count again. (85) print bcount( ~$bits ); __END__ c:\test>m128.pl 194447066811964836264785489961010406546 43 85

    Note: I would have supplied a GMPz version, but I couldn't work out how to do it from the docs. There seem to be a million ways to initialise a number; a million ways to divide two numbers etc, But not so much when it comes to doing bitwise math.

    Update: Here's my attempt at a GMP version of the above. Note that it doesn't just produce the wrong results, but it does so silently:

    #! perl -slw use strict; use Math::GMPz; my $ZERO = Math::GMPz->new( 0 ); my $ONE = Math::GMPz->new( 1 ); my $ALLONES = ~ $ZERO; my $MASK1 = $ALLONES / 3; my $MASK2 = $ALLONES / 15 * 3; my $MASK3 = $ALLONES / 255; my $MASK4 = $MASK3 * 15; sub bcount { my $v = shift; $v = $v - ( ( $v >> 1 ) & $MASK1 ); $v = ( $v & $MASK2 ) + ( ( $v >> 2 ) & $MASK2 ); $v = ( $v + ( $v >> 4 ) ) & $MASK4; my $c = ( $v * ( $MASK3 ) ) >> 120; return $c; } my $bits = Math::GMPz->new( 0 ); ## set every 3rd bit $bits |= $ONE << $_ for map $_*3+1, 0 .. 42; print $bits; ## count the bits (43) print bcount( $bits ); ## now invert them and count again. (85) print bcount( ~$bits ); __END__ c:\test>gmpz-t.pl 194447066811964836264785489961010406546 0 0

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      there are reasons other than performance for wanting fix-width integer semantics

      Definitely correct ... though I must confess that the reasons you've implicitly put forward in your demos were not exactly what I was thinking of when I acknowledged this (or tried to) in my earlier post.

      For a Math::GMPz rendition of your demo I would put forward the following:
      #! perl -slw use strict; use Math::GMPz qw(:mpz); my $ZERO = Math::GMPz->new( 0 ); my $ONE = Math::GMPz->new( 1 ); #my $ALLONES = ~ $ZERO; my $ALLONES = (Math::GMPz->new(1) << 128) - 1; my $MASK1 = $ALLONES / 3; my $MASK2 = $ALLONES / 15 * 3; my $MASK3 = $ALLONES / 255; my $MASK4 = $MASK3 * 15; my $bits = Math::GMPz->new( 0 ); ## set every 3rd bit $bits |= $ONE << $_ for map $_*3+1, 0 .. 42; print $bits; ## count the bits (43) #print bcount( $bits ); print Rmpz_popcount($bits); ## now invert them and count again. (85) #print bcount( ~$bits ); print Rmpz_popcount($ALLONES ^ $bits);
      I hope it doesn't miss the point ... I think the only changes are that:
      1) It changes the way that $ALLONES is created;
      2) It removes the need for sub bcount()

      Update: And, there's also now a qw(:mpz) in the third line.

      Cheers,
      Rob
        2) It removes the need for sub bcount()

        That does miss the point, which was not what the bcount() sub did, but the way it did it.

        The result of using bitwise math using arbitrary precision doesn't follow the fix-width integer semantics. Ie. the math isn't automatically module 2^128.

        That means you get quite different results:

        #! perl -slw use strict; use Math::GMPz; my $ZERO = Math::GMPz->new( 0 ); my $ONE = Math::GMPz->new( 1 ); #my $ALLONES = ~ $ZERO; my $ALLONES = (Math::GMPz->new(1) << 128) - 1; my $MASK1 = $ALLONES / 3; my $MASK2 = $ALLONES / 15 * 3; my $MASK3 = $ALLONES / 255; my $MASK4 = $MASK3 * 15; sub bcount { my $v = shift; $v = $v - ( ( $v >> 1 ) & $MASK1 ); $v = ( $v & $MASK2 ) + ( ( $v >> 2 ) & $MASK2 ); $v = ( $v + ( $v >> 4 ) ) & $MASK4; my $c = ( $v * ( $MASK3 ) ) >> 120; return $c; } my $bits = Math::GMPz->new( 0 ); ## set every 3rd bit $bits |= $ONE << $_ for map $_*3+1, 0 .. 42; print $bits; ## count the bits (43) print bcount( $bits ); ## now invert them and count again. (85) print bcount( ~$bits ); __END__ C:\test>gmpz-t.pl 194447066811964836264785489961010406546 4019000903644796537894933644280473643 6698389137940470254461716813547458644

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^13: Module for 128-bit integer math?
by salva (Canon) on Feb 14, 2011 at 12:01 UTC
    but I don't see much scope for improvement at the moment

    Well, there is...

    The G2 version is faster than G1 and I because it is not allocating and deallocating the result object every time but reusing $mpz_ret.

    After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster.

    The modified benchmark script:

    And the results I get on my 64bits-linux-but-with-a-not-very-optimized-for-64bits-old-processor:

    ****************** **MULTIPLICATION** ****************** Rate mul_M::G1 mul_M::I mul_M::G2 mul_M::I2 mul_M::G1 321555/s -- -72% -87% -91% mul_M::I 1147836/s 257% -- -52% -69% mul_M::G2 2406041/s 648% 110% -- -35% mul_M::I2 3709585/s 1054% 223% 54% -- ****************** *****DIVISION***** ****************** Rate div_M::G1 div_M::I div_M::G2 div_M::I2 div_M::G1 314139/s -- -71% -83% -88% div_M::I 1092266/s 248% -- -39% -59% div_M::G2 1799026/s 473% 65% -- -32% div_M::I2 2633983/s 738% 141% 46% -- ****************** *****ADDITION***** ****************** Rate add_M::G1 add_M::I add_M::G2 add_M::I2 add_M::G1 317021/s -- -71% -86% -91% add_M::I 1092266/s 245% -- -51% -70% add_M::G2 2248783/s 609% 106% -- -38% add_M::I2 3598054/s 1035% 229% 60% -- ****************** ****SUBTRACTION*** ****************** Rate sub_M::G1 sub_M::I sub_M::G2 sub_M::I2 sub_M::G1 309967/s -- -72% -84% -91% sub_M::I 1113475/s 259% -- -44% -68% sub_M::G2 1997468/s 544% 79% -- -43% sub_M::I2 3495253/s 1028% 214% 75% --
    The new version of the module can be obtained from GitHub.
      Math::Int128 becomes faster than Math::GMPz, around 60%

      Using the latest version of Math::Int128, and your modified script, I find an improvement (on Windows Vista) of around 40%. Given that we're using different operating systems and probably different processors, I think we can agree that "I find the same as you".

      Cheers,
      Rob

      I should add that even 40% is better than I could get with my approach to Math::Int128 modifications. I might learn something if I ever find the time and energy to discover why that was so. (Best I could get was to have the int128 arithemtic about 5-10% faster than Math::GMPz.)

        Well, I have been able to improve Math::Int128 significantly since yesterday. Now, on my computer, it is twice as fast as Math::GMPz running your benchmarks.

        There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.

        I have also found that the code generated by GCC-current is quite unpredictable. Sometimes things that should be faster are slower. So the tunning I have carried out may be ineffective for your particular version of GCC.

      After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster

      That's more like the figures I anticipated (at a guess).
      My "I don't see much scope for improvement" comment was in relation to my own enhancements to Math::Int128 which removed the allocating/deallocating you mentioned - and which made the int128 multiplication a little faster than using Math::GMPz's functions (and, of course, significantly faster than using Math::GMPz's overloaded operators).

      When I get back to my Vista64 box, I'll grab the latest github version and see for myself ... and, of course, post again with my results for that machine.

      Cheers,
      Rob