Re^13: Module for 128-bit integer math?

in reply to Re^12: Module for 128-bit integer math?
in thread Module for 128-bit integer math?

but I don't see much scope for improvement at the moment

Well, there is...

The G2 version is faster than G1 and I because it reuses $mpz_ret instead of allocating and deallocating a result object on every call.

After adding to Math::Int128 a new set of operators that take a preallocated output argument, Math::Int128 becomes faster than Math::GMPz: around 60% faster in the results below (between 46% and 75%, depending on the operation).
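To make the allocation difference concrete, here is a minimal pure-Perl sketch. `Toy::Num`, `mul_new` and `mul_into` are hypothetical names invented for illustration; this is not the Math::Int128 or Math::GMPz code, only the same calling pattern on plain Perl numbers.

```perl
use strict;
use warnings;

# Hypothetical toy class (NOT Math::Int128): a blessed scalar reference
# holding a plain Perl number, so that every result is a separate object.
package Toy::Num;

use overload
    '*'  => \&mul_new,
    '""' => sub { ${ $_[0] } };

sub new {
    my ($class, $value) = @_;
    return bless \$value, $class;
}

# Operator style (the pattern of the I and G1 cases):
# a new result object is allocated on every call.
sub mul_new {
    my ($x, $y) = @_;
    return Toy::Num->new($$x * $$y);
}

# Three-argument style (the pattern of the I2 and G2 cases):
# the result is stored into an object that already exists.
sub mul_into {
    my ($ret, $x, $y) = @_;
    $$ret = $$x * $$y;
    return;
}

package main;

my $x   = Toy::Num->new(6);
my $y   = Toy::Num->new(7);
my $ret = Toy::Num->new(0);

my $fresh = $x * $y;                  # allocates a new Toy::Num
Toy::Num::mul_into($ret, $x, $y);     # reuses $ret, no new object

print "$fresh $ret\n";
```

In a tight loop the first form creates and destroys one object per iteration, while the second only overwrites the contents of `$ret`, which is the effect being measured in the benchmark.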

The modified benchmark script:


use Math::Int128 qw(int128 :op);
use Math::GMPz qw(:mpz);
use Benchmark qw(:all);

$count = 40000;

$mpz1  = Math::GMPz->new('676469752303423489');
$mpz2  = Math::GMPz->new('776469752999423489');
$i_1   = int128("$mpz1");
$i_2   = int128("$mpz2");

$mpz_sub = Math::GMPz->new('976469752313423489');
$i_sub   = int128("$mpz_sub");

$mpz_div = Math::GMPz->new('76469752313423489');
$i_div   = int128("$mpz_div");

$mpz_ret = Rmpz_init2(128);
$i_ret = int128();

use warnings;

print "
******************
**MULTIPLICATION**
******************\n\n";

cmpthese(-1, {
    'mul_M::I' => '$ri = Math::Int128::_mul($i_1, $i_2, 0)',
    'mul_M::I2'=> 'int128_mul($i_ret, $i_1, $i_2)',
    'mul_M::G1'=> '$mpz_ret = $mpz1 * $mpz2',
    'mul_M::G2'=> 'Rmpz_mul($mpz_ret, $mpz1, $mpz2)',
});

die "Error 1:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
 || $ri != int128('525258301482620425304858018020933121') || $ri != $i_ret;


$i_1 *= $i_1;
$i_2 *= $i_2;
$mpz1 *= $mpz1;
$mpz2 *= $mpz2;

# print "i_1: $i_1, i_2: $i_2\n";


print "
******************
*****DIVISION*****
******************\n\n";

cmpthese(-1, {
    'div_M::I' => '$ri = Math::Int128::_div($i_1, $i_div, 0)',
    'div_M::I2'=> 'int128_div($i_ret, $i_1, $i_div)',
    'div_M::G1'=>'$mpz_ret = $mpz1 / $mpz_div',
    'div_M::G2'=> 'Rmpz_tdiv_q($mpz_ret, $mpz1, $mpz_div)',
});

die "Error 2:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
 || $ri != int128('5984213521522366751') || $ri != $i_ret;

print"
******************
*****ADDITION*****
******************\n\n";

cmpthese(-1, {
    'add_M::I'  => '$ri = Math::Int128::_add($i_1, $i_2, 0)',
    'add_M::I2' => 'int128_add($i_ret, $i_1, $i_2)',
    'add_M::G1' => '$mpz_ret = $mpz1  + $mpz2',
    'add_M::G2' => 'Rmpz_add($mpz_ret, $mpz1, $mpz2)',
});

die "Error 3:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
 || $ri != int128('1060516603104440851094132036041866242') || $ri != $i_ret;

print "
******************
****SUBTRACTION***
******************\n\n";

cmpthese(-1, {
    'sub_M::I'  => '$ri = Math::Int128::_sub($i_1, $i_sub, 0)',
    'sub_M::I2' => 'int128_sub($i_ret, $i_1, $i_sub)',
    'sub_M::G1' => '$mpz_ret = $mpz1 - $mpz_sub',
    'sub_M::G2' => 'Rmpz_sub($mpz_ret, $mpz1, $mpz_sub)',
});

die "Error 4:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
 || $ri != int128('457611325781455127825205517363509632') || $ri != $i_ret;

And the results I get on my 64-bit Linux box, which has an old processor that is not very well optimized for 64-bit code:

******************
**MULTIPLICATION**
******************

               Rate mul_M::G1  mul_M::I mul_M::G2 mul_M::I2
mul_M::G1  321555/s        --      -72%      -87%      -91%
mul_M::I  1147836/s      257%        --      -52%      -69%
mul_M::G2 2406041/s      648%      110%        --      -35%
mul_M::I2 3709585/s     1054%      223%       54%        --

******************
*****DIVISION*****
******************

               Rate div_M::G1  div_M::I div_M::G2 div_M::I2
div_M::G1  314139/s        --      -71%      -83%      -88%
div_M::I  1092266/s      248%        --      -39%      -59%
div_M::G2 1799026/s      473%       65%        --      -32%
div_M::I2 2633983/s      738%      141%       46%        --

******************
*****ADDITION*****
******************

               Rate add_M::G1  add_M::I add_M::G2 add_M::I2
add_M::G1  317021/s        --      -71%      -86%      -91%
add_M::I  1092266/s      245%        --      -51%      -70%
add_M::G2 2248783/s      609%      106%        --      -38%
add_M::I2 3598054/s     1035%      229%       60%        --

******************
****SUBTRACTION***
******************

               Rate sub_M::G1  sub_M::I sub_M::G2 sub_M::I2
sub_M::G1  309967/s        --      -72%      -84%      -91%
sub_M::I  1113475/s      259%        --      -44%      -68%
sub_M::G2 1997468/s      544%       79%        --      -43%
sub_M::I2 3495253/s     1028%      214%       75%        --
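For anyone reading the tables: each percentage compares the rate of the row against the rate of the column. A small sketch of that arithmetic, using the I2 and G2 rates copied from the tables above (`pct_faster` is a helper name made up for this example, and the rounding is assumed to match what Benchmark prints):

```perl
use strict;
use warnings;

# Percentage by which $rate exceeds $baseline, rounded to an integer.
# pct_faster is a hypothetical helper, not part of Benchmark.
sub pct_faster {
    my ($rate, $baseline) = @_;
    return sprintf('%.0f', 100 * $rate / $baseline - 100);
}

# [operation, I2 rate, G2 rate], taken from the results above.
my @rows = (
    [ multiplication => 3709585, 2406041 ],
    [ division       => 2633983, 1799026 ],
    [ addition       => 3598054, 2248783 ],
    [ subtraction    => 3495253, 1997468 ],
);

for my $row (@rows) {
    my ($op, $i2, $g2) = @$row;
    printf "%-14s I2 is %s%% faster than G2\n", $op, pct_faster($i2, $g2);
}
```

This should reproduce the 54%, 46%, 60% and 75% figures in the I2 rows, which is where the "around 60%" summary comes from.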

The new version of the module can be obtained from GitHub.


Replies are listed 'Best First'.
Re^14: Module for 128-bit integer math? by syphilis (Archbishop) on Feb 15, 2011 at 11:14 UTC
Math::Int128 becomes faster than Math::GMPz, around 60%

Using the latest version of Math::Int128 and your modified script, I find an improvement (on Windows Vista) of around 40%. Given that we're using different operating systems and probably different processors, I think we can agree that "I find the same as you".

Cheers,
Rob

I should add that even 40% is better than I could get with my approach to Math::Int128 modifications. I might learn something if I ever find the time and energy to discover why that was so. (The best I could get was to have the int128 arithmetic about 5-10% faster than Math::GMPz.)
Re^15: Module for 128-bit integer math? by salva (Canon) on Feb 15, 2011 at 11:46 UTC
Well, I have been able to improve Math::Int128 significantly since yesterday. Now, on my computer, it is twice as fast as Math::GMPz when running your benchmarks.

There were two kinds of improvements: providing the 3-argument operators (i.e. `uint128_add($to, $a, $b)`) and optimizing the `SV` to `int128_t` conversions.

I have also found that the code generated by the current GCC is quite unpredictable: sometimes things that should be faster are slower. So the tuning I have carried out may be ineffective for your particular version of GCC.
Re^16: Module for 128-bit integer math? by syphilis (Archbishop) on Feb 15, 2011 at 13:40 UTC
There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.

I think I got the first part handled ok, but I haven't given consideration to the second. For the record, my "rough estimation script" (for multiplication only) is as follows, in accordance with the usual mantra that I follow when creating objects:

package Sis::UInt128;
use warnings;
use strict;
use Math::GMPz qw(:mpz);
use Benchmark qw(:all);

use Inline C => Config =>
    BUILD_NOISY => 1,
    TYPEMAPS => ['./typemap_128'],
    USING => 'ParseRegExp';

use Inline C => <<'EOC';

SV * create() {
     __uint128 * ps;
     SV * obj_ref, * obj;

     New(42, ps, 1, __uint128);
     if(ps == NULL) croak("Failed to allocate memory in create function");

     obj_ref = newSV(0);
     obj = newSVrv(obj_ref, "Sis::UInt128");
     sv_setiv(obj, INT2PTR(IV,ps));
     SvREADONLY_on(obj);
     return obj_ref;
}

void _assign(__uint128 * rop, SV * h, SV * l) {
     __uint128 __div = 9223372036854775808ULL;
     unsigned __int64 high, low;
     high = (unsigned __int64)SvUV(h);
     low = (unsigned __int64)SvUV(l);
     *rop = ((__uint128)__div * high) + (__uint128)low;
}

void _deref_obj(__uint128 * obj) {
     dXSARGS;
     __uint128 __div = 9223372036854775808ULL;
     ST(0) = sv_2mortal(newSVuv(*obj / __div));
     ST(1) = sv_2mortal(newSVuv(*obj % __div));
     XSRETURN(2);
}

void DESTROY(__uint128 * obj) {
     printf("Cleaning up\n");
     Safefree(obj);
}

void mul_128(__uint128 * rop, __uint128 * op1, __uint128 * op2) {
     *rop = *op1 * *op2;
}

EOC

our $mpz1 = Math::GMPz->new('676469752303423489');
our $mpz2 = Math::GMPz->new('776469752999423489');
our $mpz_ret = Math::GMPz::Rmpz_init2(128);

our $i_ret = create();
our $i_1 = create();
our $i_2 = create();

assign($i_1, "$mpz1");
assign($i_2, "$mpz2");

our $count = 50000;

timethese($count * 19, {
    'mul_128' => 'mul_128($i_ret, $i_1, $i_2)',
    'gmpz' => 'Math::GMPz::Rmpz_mul($mpz_ret, $mpz1, $mpz2)',
});

print retrieve($i_1),"\n", retrieve($i_2),"\n", retrieve($i_ret), "\n";
print "$mpz1\n$mpz2\n$mpz_ret\n";

sub assign {
    my $obj = shift;
    my $num = shift;
    my @args;
    my $a0 = Math::GMPz->new($num) / 9223372036854775808;
    my $a1 = Math::GMPz->new($num) % 9223372036854775808;
    $args[0] = "$a0" + 0;
    $args[1] = "$a1" + 0;
    _assign($obj, $args[0], $args[1]);
}

sub retrieve {
    my @in = _deref_obj($_[0]);
    my $num = Math::GMPz->new($in[0]);
    $num *= 9223372036854775808;
    $num += $in[1];
    return "$num";
}

with a "typemap_128" that looks like this:

__uint128 *	UI128

INPUT
UI128
	$var = INT2PTR($type, SvIV(SvRV($arg)))

Cheers,
Rob
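The high/low split that assign and retrieve perform can be tried on its own, without Inline C. The following is only a sketch of that arithmetic using the core Math::BigInt module instead of Math::GMPz; `split_parts` and `join_parts` are names invented for this example, and the divisor is the same 9223372036854775808 (2**63) used in the script.

```perl
use strict;
use warnings;
use Math::BigInt;

# Same divisor as in the script: 9223372036854775808 == 2**63.
my $base = Math::BigInt->new('9223372036854775808');

# Split a non-negative decimal string into (high, low) parts,
# so that value == high * 2**63 + low. Hypothetical helper.
sub split_parts {
    my ($decimal) = @_;
    my $n = Math::BigInt->new($decimal);
    # In list context bdiv returns (quotient, remainder).
    my ($high, $low) = $n->copy->bdiv($base);
    return ("$high", "$low");
}

# Rebuild the decimal string from the two parts. Hypothetical helper.
sub join_parts {
    my ($high, $low) = @_;
    return Math::BigInt->new($high)->bmul($base)->badd($low)->bstr;
}

# Round trip on the product checked by the benchmark script.
my $value = '525258301482620425304858018020933121';
my ($high, $low) = split_parts($value);
print "high=$high low=$low\n";
print "round trip ok\n" if join_parts($high, $low) eq $value;
```

Both parts have to fit in an unsigned 64-bit value for the SvUV conversion in _assign to be lossless, which holds as long as the input is below 2**127.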
Re^14: Module for 128-bit integer math? by syphilis (Archbishop) on Feb 14, 2011 at 20:53 UTC
After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster

That's more like the figures I anticipated (at a guess).

My "I don't see much scope for improvement" comment was in relation to my own enhancements to Math::Int128, which removed the allocating/deallocating you mentioned and which made the int128 multiplication a little faster than using Math::GMPz's functions (and, of course, significantly faster than using Math::GMPz's overloaded operators).

When I get back to my Vista64 box, I'll grab the latest GitHub version and see for myself ... and, of course, post again with my results for that machine.

Cheers,
Rob
