Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re^13: Module for 128-bit integer math?

by salva (Canon)
on Feb 14, 2011 at 12:01 UTC ( [id://887971]=note: print w/replies, xml ) Need Help??


in reply to Re^12: Module for 128-bit integer math?
in thread Module for 128-bit integer math?

but I don't see much scope for improvement at the moment

Well, there is...

The G2 version is faster than G1 and I because it is not allocating and deallocating the result object every time but reusing $mpz_ret.

After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster.

The modified benchmark script:

use Math::Int128 qw(int128 :op); use Math::GMPz qw(:mpz); use Benchmark qw(:all); $count = 40000; $mpz1 = Math::GMPz->new('676469752303423489'); $mpz2 = Math::GMPz->new('776469752999423489'); $i_1 = int128("$mpz1"); $i_2 = int128("$mpz2"); $mpz_sub = Math::GMPz->new('976469752313423489'); $i_sub = int128("$mpz_sub"); $mpz_div = Math::GMPz->new('76469752313423489'); $i_div = int128("$mpz_div"); $mpz_ret = Rmpz_init2(128); $i_ret = int128(); use warnings; print " ****************** **MULTIPLICATION** ******************\n\n"; cmpthese(-1, { 'mul_M::I' => '$ri = Math::Int128::_mul($i_1, $i_2, 0)', 'mul_M::I2'=> 'int128_mul($i_ret, $i_1, $i_2)', 'mul_M::G1'=> '$mpz_ret = $mpz1 * $mpz2', 'mul_M::G2'=> 'Rmpz_mul($mpz_ret, $mpz1, $mpz2)', }); die "Error 1:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('525258301482620425304858018020933121') || $ri != $i +_ret; $i_1 *= $i_1; $i_2 *= $i_2; $mpz1 *= $mpz1; $mpz2 *= $mpz2; # print "i_1: $i_1, i_2: $i_2\n"; print " ****************** *****DIVISION***** ******************\n\n"; cmpthese(-1, { 'div_M::I' => '$ri = Math::Int128::_div($i_1, $i_div, 0)', 'div_M::I2'=> 'int128_div($i_ret, $i_1, $i_div)', 'div_M::G1'=>'$mpz_ret = $mpz1 / $mpz_div', 'div_M::G2'=> 'Rmpz_tdiv_q($mpz_ret, $mpz1, $mpz_div)', }); die "Error 2:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('5984213521522366751') || $ri != $i_ret; print" ****************** *****ADDITION***** ******************\n\n"; cmpthese(-1, { 'add_M::I' => '$ri = Math::Int128::_add($i_1, $i_2, 0)', 'add_M::I2' => 'int128_add($i_ret, $i_1, $i_2)', 'add_M::G1' => '$mpz_ret = $mpz1 + $mpz2', 'add_M::G2' => 'Rmpz_add($mpz_ret, $mpz1, $mpz2)', }); die "Error 3:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('1060516603104440851094132036041866242') || $ri != $ +i_ret; print " ****************** ****SUBTRACTION*** ******************\n\n"; cmpthese(-1, { 'sub_M::I' => '$ri = Math::Int128::_sub($i_1, $i_sub, 0)', 'sub_M::I2' => 'int128_sub($i_ret, $i_1, $i_sub)', 'sub_M::G1' => '$mpz_ret = $mpz1 - $mpz_sub', 'sub_M::G2' => 'Rmpz_sub($mpz_ret, $mpz1, $mpz_sub)', }); die "Error 4:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret") || $ri != int128('457611325781455127825205517363509632') || $ri != $i +_ret;

And the results I get on my 64bits-linux-but-with-a-not-very-optimized-for-64bits-old-processor:

****************** **MULTIPLICATION** ****************** Rate mul_M::G1 mul_M::I mul_M::G2 mul_M::I2 mul_M::G1 321555/s -- -72% -87% -91% mul_M::I 1147836/s 257% -- -52% -69% mul_M::G2 2406041/s 648% 110% -- -35% mul_M::I2 3709585/s 1054% 223% 54% -- ****************** *****DIVISION***** ****************** Rate div_M::G1 div_M::I div_M::G2 div_M::I2 div_M::G1 314139/s -- -71% -83% -88% div_M::I 1092266/s 248% -- -39% -59% div_M::G2 1799026/s 473% 65% -- -32% div_M::I2 2633983/s 738% 141% 46% -- ****************** *****ADDITION***** ****************** Rate add_M::G1 add_M::I add_M::G2 add_M::I2 add_M::G1 317021/s -- -71% -86% -91% add_M::I 1092266/s 245% -- -51% -70% add_M::G2 2248783/s 609% 106% -- -38% add_M::I2 3598054/s 1035% 229% 60% -- ****************** ****SUBTRACTION*** ****************** Rate sub_M::G1 sub_M::I sub_M::G2 sub_M::I2 sub_M::G1 309967/s -- -72% -84% -91% sub_M::I 1113475/s 259% -- -44% -68% sub_M::G2 1997468/s 544% 79% -- -43% sub_M::I2 3495253/s 1028% 214% 75% --
The new version of the module can be obtained from GitHub.

Replies are listed 'Best First'.
Re^14: Module for 128-bit integer math?
by syphilis (Archbishop) on Feb 15, 2011 at 11:14 UTC
    Math::Int128 becomes faster than Math::GMPz, around 60%

    Using the latest version of Math::Int128, and your modified script, I find an improvement (on Windows Vista) of around 40%. Given that we're using different operating systems and probably different processors, I think we can agree that "I find the same as you".

    Cheers,
    Rob

    I should add that even 40% is better than I could get with my approach to Math::Int128 modifications. I might learn something if I ever find the time and energy to discover why that was so. (Best I could get was to have the int128 arithemtic about 5-10% faster than Math::GMPz.)

      Well, I have been able to improve Math::Int128 significantly since yesterday. Now, on my computer, it is twice as fast as Math::GMPz running your benchmarks.

      There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.

      I have also found that the code generated by GCC-current is quite unpredictable. Sometimes things that should be faster are slower. So the tunning I have carried out may be ineffective for your particular version of GCC.

        There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.

        I think I got the first part handled ok - but I haven't given consideration to the second.
        For the record, my "rough estimation script" (for multiplication only) is as follows - in accordance with the usual mantra that I follow when creating objects:
        package Sis::UInt128; use warnings; use strict; use Math::GMPz qw(:mpz); use Benchmark qw(:all); use Inline C => Config => BUILD_NOISY => 1, TYPEMAPS => ['./typemap_128'], USING => 'ParseRegExp'; use Inline C => <<'EOC'; SV * create() { __uint128 *ps; SV * obj_ref, * obj; New(42, ps, 1, __uint128); if(ps == NULL) croak("Failed to allocate memory in create functio +n"); obj_ref = newSV(0); obj = newSVrv(obj_ref, "Sis::UInt128"); sv_setiv(obj, INT2PTR(IV,ps)); SvREADONLY_on(obj); return obj_ref; } void _assign(__uint128 * rop, SV * h, SV * l) { __uint128 __div = 9223372036854775808ULL; unsigned __int64 high, low; high = (unsigned __int64)SvUV(h); low = (unsigned __int64)SvUV(l); *rop = ((__uint128)__div * high) + (__uint128)low; } void _deref_obj(__uint128 * obj) { dXSARGS; __uint128 __div = 9223372036854775808ULL; ST(0) = sv_2mortal(newSVuv(*obj / __div)); ST(1) = sv_2mortal(newSVuv(*obj % __div)); XSRETURN(2); } void DESTROY(__uint128 * obj) { printf("Cleaning up\n"); Safefree(obj); } void mul_128(__uint128 * rop, __uint128 * op1, __uint128 * op2) { *rop = *op1 * *op2; } EOC our $mpz1 = Math::GMPz->new('676469752303423489'); our $mpz2 = Math::GMPz->new('776469752999423489'); our $mpz_ret = Math::GMPz::Rmpz_init2(128); our $i_ret = create(); our $i_1 = create(); our $i_2 = create(); assign($i_1, "$mpz1"); assign($i_2, "$mpz2"); our $count = 50000; timethese($count * 19, { 'mul_128' => 'mul_128($i_ret, $i_1, $i_2)', 'gmpz' => 'Math::GMPz::Rmpz_mul($mpz_ret, $mpz1, $mpz2)', }); print retrieve($i_1),"\n", retrieve($i_2),"\n", retrieve($i_ret), "\n" +; print "$mpz1\n$mpz2\n$mpz_ret\n"; sub assign { my $obj = shift; my $num = shift; my @args; my $a0 = Math::GMPz->new($num) / 9223372036854775808; my $a1 = Math::GMPz->new($num) % 9223372036854775808; $args[0] = "$a0" + 0; $args[1] = "$a1" + 0; _assign($obj, $args[0], $args[1]); } sub retrieve { my @in = _deref_obj($_[0]); my $num = Math::GMPz->new($in[0]); $num *= 9223372036854775808; $num += $in[1]; return "$num"; }
        with a "typemap_128" that looks like this:
        __uint128 * UI128 INPUT UI128 $var = INT2PTR($type, SvIV(SvRV($arg)))
        Cheers,
        Rob
Re^14: Module for 128-bit integer math?
by syphilis (Archbishop) on Feb 14, 2011 at 20:53 UTC
    After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster

    That's more like the figures I anticipated (at a guess).
    My "I don't see much scope for improvement" comment was in relation to my own enhancements to Math::Int128 which removed the allocating/deallocating you mentioned - and which made the int128 multiplication a little faster than using Math::GMPz's functions (and, of course, significantly faster than using Math::GMPz's overloaded operators).

    When I get back to my Vista64 box, I'll grab the latest github version and see for myself ... and, of course, post again with my results for that machine.

    Cheers,
    Rob

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://887971]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2024-04-26 05:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found