http://qs321.pair.com?node_id=887971


in reply to Re^12: Module for 128-bit integer math?
in thread Module for 128-bit integer math?

but I don't see much scope for improvement at the moment

Well, there is...

The G2 version is faster than G1 and I because it is not allocating and deallocating the result object every time but reusing $mpz_ret.

After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster.

The modified benchmark script:

use Math::Int128 qw(int128 :op); use Math::GMPz qw(:mpz); use Benchmark qw(:all); $count = 40000; $mpz1 = Math::GMPz->new('676469752303423489'); $mpz2 = Math::GMPz->new('776469752999423489'); $i_1 = int128("$mpz1"); $i_2 = int128("$mpz2"); $mpz_sub = Math::GMPz->new('976469752313423489'); $i_sub = int128("$mpz_sub"); $mpz_div = Math::GMPz->new('76469752313423489'); $i_div = int128("$mpz_div"); $mpz_ret = Rmpz_init2(128); $i_ret = int128(); use warnings; print " ****************** **MULTIPLICATION** ******************\n\n"; cmpthese(-1, { 'mul_M::I' => '$ri = Math::Int128::_mul($i_1, $i_2, 0)', 'mul_M::I2'=> 'int128_mul($i_ret, $i_1, $i_2)', 'mul_M::G1'=> '$mpz_ret = $mpz1 * $mpz2', 'mul_M::G2'=> 'Rmpz_mul($mpz_ret, $mpz1, $mpz2)', }); die "Error 1:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('525258301482620425304858018020933121') || $ri != $i +_ret; $i_1 *= $i_1; $i_2 *= $i_2; $mpz1 *= $mpz1; $mpz2 *= $mpz2; # print "i_1: $i_1, i_2: $i_2\n"; print " ****************** *****DIVISION***** ******************\n\n"; cmpthese(-1, { 'div_M::I' => '$ri = Math::Int128::_div($i_1, $i_div, 0)', 'div_M::I2'=> 'int128_div($i_ret, $i_1, $i_div)', 'div_M::G1'=>'$mpz_ret = $mpz1 / $mpz_div', 'div_M::G2'=> 'Rmpz_tdiv_q($mpz_ret, $mpz1, $mpz_div)', }); die "Error 2:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('5984213521522366751') || $ri != $i_ret; print" ****************** *****ADDITION***** ******************\n\n"; cmpthese(-1, { 'add_M::I' => '$ri = Math::Int128::_add($i_1, $i_2, 0)', 'add_M::I2' => 'int128_add($i_ret, $i_1, $i_2)', 'add_M::G1' => '$mpz_ret = $mpz1 + $mpz2', 'add_M::G2' => 'Rmpz_add($mpz_ret, $mpz1, $mpz2)', }); die "Error 3:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('1060516603104440851094132036041866242') || $ri != $ +i_ret; print " ****************** ****SUBTRACTION*** ******************\n\n"; cmpthese(-1, { 'sub_M::I' => '$ri = Math::Int128::_sub($i_1, $i_sub, 0)', 'sub_M::I2' => 'int128_sub($i_ret, $i_1, $i_sub)', 'sub_M::G1' => '$mpz_ret = $mpz1 - $mpz_sub', 'sub_M::G2' => 'Rmpz_sub($mpz_ret, $mpz1, $mpz_sub)', }); die "Error 4:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('457611325781455127825205517363509632') || $ri != $i +_ret;

And the results I get on my 64bits-linux-but-with-a-not-very-optimized-for-64bits-old-processor:

****************** **MULTIPLICATION** ****************** Rate mul_M::G1 mul_M::I mul_M::G2 mul_M::I2 mul_M::G1 321555/s -- -72% -87% -91% mul_M::I 1147836/s 257% -- -52% -69% mul_M::G2 2406041/s 648% 110% -- -35% mul_M::I2 3709585/s 1054% 223% 54% -- ****************** *****DIVISION***** ****************** Rate div_M::G1 div_M::I div_M::G2 div_M::I2 div_M::G1 314139/s -- -71% -83% -88% div_M::I 1092266/s 248% -- -39% -59% div_M::G2 1799026/s 473% 65% -- -32% div_M::I2 2633983/s 738% 141% 46% -- ****************** *****ADDITION***** ****************** Rate add_M::G1 add_M::I add_M::G2 add_M::I2 add_M::G1 317021/s -- -71% -86% -91% add_M::I 1092266/s 245% -- -51% -70% add_M::G2 2248783/s 609% 106% -- -38% add_M::I2 3598054/s 1035% 229% 60% -- ****************** ****SUBTRACTION*** ****************** Rate sub_M::G1 sub_M::I sub_M::G2 sub_M::I2 sub_M::G1 309967/s -- -72% -84% -91% sub_M::I 1113475/s 259% -- -44% -68% sub_M::G2 1997468/s 544% 79% -- -43% sub_M::I2 3495253/s 1028% 214% 75% --
The new version of the module can be obtained from GitHub.

Replies are listed 'Best First'.
Re^14: Module for 128-bit integer math?
by syphilis (Archbishop) on Feb 15, 2011 at 11:14 UTC
    Math::Int128 becomes faster than Math::GMPz, around 60%

    Using the latest version of Math::Int128, and your modified script, I find an improvement (on Windows Vista) of around 40%. Given that we're using different operating systems and probably different processors, I think we can agree that "I find the same as you".

    Cheers,
    Rob

    I should add that even 40% is better than I could get with my approach to Math::Int128 modifications. I might learn something if I ever find the time and energy to discover why that was so. (Best I could get was to have the int128 arithemtic about 5-10% faster than Math::GMPz.)

      Well, I have been able to improve Math::Int128 significantly since yesterday. Now, on my computer, it is twice as fast as Math::GMPz running your benchmarks.

      There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.

      I have also found that the code generated by GCC-current is quite unpredictable. Sometimes things that should be faster are slower. So the tunning I have carried out may be ineffective for your particular version of GCC.

        There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.

        I think I got the first part handled ok - but I haven't given consideration to the second.
        For the record, my "rough estimation script" (for multiplication only) is as follows - in accordance with the usual mantra that I follow when creating objects:
        package Sis::UInt128; use warnings; use strict; use Math::GMPz qw(:mpz); use Benchmark qw(:all); use Inline C => Config => BUILD_NOISY => 1, TYPEMAPS => ['./typemap_128'], USING => 'ParseRegExp'; use Inline C => <<'EOC'; SV * create() { __uint128 *ps; SV * obj_ref, * obj; New(42, ps, 1, __uint128); if(ps == NULL) croak("Failed to allocate memory in create functio +n"); obj_ref = newSV(0); obj = newSVrv(obj_ref, "Sis::UInt128"); sv_setiv(obj, INT2PTR(IV,ps)); SvREADONLY_on(obj); return obj_ref; } void _assign(__uint128 * rop, SV * h, SV * l) { __uint128 __div = 9223372036854775808ULL; unsigned __int64 high, low; high = (unsigned __int64)SvUV(h); low = (unsigned __int64)SvUV(l); *rop = ((__uint128)__div * high) + (__uint128)low; } void _deref_obj(__uint128 * obj) { dXSARGS; __uint128 __div = 9223372036854775808ULL; ST(0) = sv_2mortal(newSVuv(*obj / __div)); ST(1) = sv_2mortal(newSVuv(*obj % __div)); XSRETURN(2); } void DESTROY(__uint128 * obj) { printf("Cleaning up\n"); Safefree(obj); } void mul_128(__uint128 * rop, __uint128 * op1, __uint128 * op2) { *rop = *op1 * *op2; } EOC our $mpz1 = Math::GMPz->new('676469752303423489'); our $mpz2 = Math::GMPz->new('776469752999423489'); our $mpz_ret = Math::GMPz::Rmpz_init2(128); our $i_ret = create(); our $i_1 = create(); our $i_2 = create(); assign($i_1, "$mpz1"); assign($i_2, "$mpz2"); our $count = 50000; timethese($count * 19, { 'mul_128' => 'mul_128($i_ret, $i_1, $i_2)', 'gmpz' => 'Math::GMPz::Rmpz_mul($mpz_ret, $mpz1, $mpz2)', }); print retrieve($i_1),"\n", retrieve($i_2),"\n", retrieve($i_ret), "\n" +; print "$mpz1\n$mpz2\n$mpz_ret\n"; sub assign { my $obj = shift; my $num = shift; my @args; my $a0 = Math::GMPz->new($num) / 9223372036854775808; my $a1 = Math::GMPz->new($num) % 9223372036854775808; $args[0] = "$a0" + 0; $args[1] = "$a1" + 0; _assign($obj, $args[0], $args[1]); } sub retrieve { my @in = _deref_obj($_[0]); my $num = Math::GMPz->new($in[0]); $num *= 9223372036854775808; $num += $in[1]; return "$num"; }
        with a "typemap_128" that looks like this:
        __uint128 * UI128 INPUT UI128 $var = INT2PTR($type, SvIV(SvRV($arg)))
        Cheers,
        Rob
Re^14: Module for 128-bit integer math?
by syphilis (Archbishop) on Feb 14, 2011 at 20:53 UTC
    After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster

    That's more like the figures I anticipated (at a guess).
    My "I don't see much scope for improvement" comment was in relation to my own enhancements to Math::Int128 which removed the allocating/deallocating you mentioned - and which made the int128 multiplication a little faster than using Math::GMPz's functions (and, of course, significantly faster than using Math::GMPz's overloaded operators).

    When I get back to my Vista64 box, I'll grab the latest github version and see for myself ... and, of course, post again with my results for that machine.

    Cheers,
    Rob