in reply to Re^12: Module for 128-bit integer math? in thread Module for 128-bit integer math?
but I don't see much scope for improvement at the moment
Well, there is...
The G2 version is faster than G1 and I because it is not allocating and deallocating the result object every time but reusing $mpz_ret.
After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster.
The modified benchmark script:
use Math::Int128 qw(int128 :op);
use Math::GMPz qw(:mpz);
use Benchmark qw(:all);
$count = 40000;
$mpz1 = Math::GMPz->new('676469752303423489');
$mpz2 = Math::GMPz->new('776469752999423489');
$i_1 = int128("$mpz1");
$i_2 = int128("$mpz2");
$mpz_sub = Math::GMPz->new('976469752313423489');
$i_sub = int128("$mpz_sub");
$mpz_div = Math::GMPz->new('76469752313423489');
$i_div = int128("$mpz_div");
$mpz_ret = Rmpz_init2(128);
$i_ret = int128();
use warnings;
print "
******************
**MULTIPLICATION**
******************\n\n";
cmpthese(-1, {
'mul_M::I' => '$ri = Math::Int128::_mul($i_1, $i_2, 0)',
'mul_M::I2'=> 'int128_mul($i_ret, $i_1, $i_2)',
'mul_M::G1'=> '$mpz_ret = $mpz1 * $mpz2',
'mul_M::G2'=> 'Rmpz_mul($mpz_ret, $mpz1, $mpz2)',
});
die "Error 1:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
|| $ri != int128('525258301482620425304858018020933121') || $ri != $i
+_ret;
$i_1 *= $i_1;
$i_2 *= $i_2;
$mpz1 *= $mpz1;
$mpz2 *= $mpz2;
# print "i_1: $i_1, i_2: $i_2\n";
print "
******************
*****DIVISION*****
******************\n\n";
cmpthese(-1, {
'div_M::I' => '$ri = Math::Int128::_div($i_1, $i_div, 0)',
'div_M::I2'=> 'int128_div($i_ret, $i_1, $i_div)',
'div_M::G1'=>'$mpz_ret = $mpz1 / $mpz_div',
'div_M::G2'=> 'Rmpz_tdiv_q($mpz_ret, $mpz1, $mpz_div)',
});
die "Error 2:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
|| $ri != int128('5984213521522366751') || $ri != $i_ret;
print"
******************
*****ADDITION*****
******************\n\n";
cmpthese(-1, {
'add_M::I' => '$ri = Math::Int128::_add($i_1, $i_2, 0)',
'add_M::I2' => 'int128_add($i_ret, $i_1, $i_2)',
'add_M::G1' => '$mpz_ret = $mpz1 + $mpz2',
'add_M::G2' => 'Rmpz_add($mpz_ret, $mpz1, $mpz2)',
});
die "Error 3:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
|| $ri != int128('1060516603104440851094132036041866242') || $ri != $
+i_ret;
print "
******************
****SUBTRACTION***
******************\n\n";
cmpthese(-1, {
'sub_M::I' => '$ri = Math::Int128::_sub($i_1, $i_sub, 0)',
'sub_M::I2' => 'int128_sub($i_ret, $i_1, $i_sub)',
'sub_M::G1' => '$mpz_ret = $mpz1 - $mpz_sub',
'sub_M::G2' => 'Rmpz_sub($mpz_ret, $mpz1, $mpz_sub)',
});
die "Error 4:\n$ri\n$mpz_ret\n$i_ret\n" if $ri != int128("$mpz_ret")
|| $ri != int128('457611325781455127825205517363509632') || $ri != $i
+_ret;
And the results I get on my 64bits-linux-but-with-a-not-very-optimized-for-64bits-old-processor:
******************
**MULTIPLICATION**
******************
Rate mul_M::G1 mul_M::I mul_M::G2 mul_M::I2
mul_M::G1 321555/s -- -72% -87% -91%
mul_M::I 1147836/s 257% -- -52% -69%
mul_M::G2 2406041/s 648% 110% -- -35%
mul_M::I2 3709585/s 1054% 223% 54% --
******************
*****DIVISION*****
******************
Rate div_M::G1 div_M::I div_M::G2 div_M::I2
div_M::G1 314139/s -- -71% -83% -88%
div_M::I 1092266/s 248% -- -39% -59%
div_M::G2 1799026/s 473% 65% -- -32%
div_M::I2 2633983/s 738% 141% 46% --
******************
*****ADDITION*****
******************
Rate add_M::G1 add_M::I add_M::G2 add_M::I2
add_M::G1 317021/s -- -71% -86% -91%
add_M::I 1092266/s 245% -- -51% -70%
add_M::G2 2248783/s 609% 106% -- -38%
add_M::I2 3598054/s 1035% 229% 60% --
******************
****SUBTRACTION***
******************
Rate sub_M::G1 sub_M::I sub_M::G2 sub_M::I2
sub_M::G1 309967/s -- -72% -84% -91%
sub_M::I 1113475/s 259% -- -44% -68%
sub_M::G2 1997468/s 544% 79% -- -43%
sub_M::I2 3495253/s 1028% 214% 75% --
The new version of the module can be obtained from GitHub.
Re^14: Module for 128-bit integer math?
by syphilis (Archbishop) on Feb 15, 2011 at 11:14 UTC
|
Math::Int128 becomes faster than Math::GMPz, around 60%
Using the latest version of Math::Int128, and your modified script, I find an improvement (on Windows Vista) of around 40%. Given that we're using different operating systems and probably different processors, I think we can agree that "I find the same as you".
Cheers, Rob
I should add that even 40% is better than I could get with my approach to Math::Int128 modifications. I might learn something if I ever find the time and energy to discover why that was so. (Best I could get was to have the int128 arithemtic about 5-10% faster than Math::GMPz.) | [reply] |
|
Well, I have been able to improve Math::Int128 significantly since yesterday. Now, on my computer, it is twice as fast as Math::GMPz running your benchmarks.
There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.
I have also found that the code generated by GCC-current is quite unpredictable. Sometimes things that should be faster are slower. So the tunning I have carried out may be ineffective for your particular version of GCC.
| [reply] [d/l] [select] |
|
There were two kinds of improvements: providing the 3-argument operators (i.e. uint128_add($to, $a, $b)) and optimizing the SV to int128_t conversions.
I think I got the first part handled ok - but I haven't given consideration to the second. For the record, my "rough estimation script" (for multiplication only) is as follows - in accordance with the usual mantra that I follow when creating objects:
package Sis::UInt128;
use warnings;
use strict;
use Math::GMPz qw(:mpz);
use Benchmark qw(:all);
use Inline C => Config =>
BUILD_NOISY => 1,
TYPEMAPS => ['./typemap_128'],
USING => 'ParseRegExp';
use Inline C => <<'EOC';
SV * create() {
__uint128 *ps;
SV * obj_ref, * obj;
New(42, ps, 1, __uint128);
if(ps == NULL) croak("Failed to allocate memory in create functio
+n");
obj_ref = newSV(0);
obj = newSVrv(obj_ref, "Sis::UInt128");
sv_setiv(obj, INT2PTR(IV,ps));
SvREADONLY_on(obj);
return obj_ref;
}
void _assign(__uint128 * rop, SV * h, SV * l) {
__uint128 __div = 9223372036854775808ULL;
unsigned __int64 high, low;
high = (unsigned __int64)SvUV(h);
low = (unsigned __int64)SvUV(l);
*rop = ((__uint128)__div * high) + (__uint128)low;
}
void _deref_obj(__uint128 * obj) {
dXSARGS;
__uint128 __div = 9223372036854775808ULL;
ST(0) = sv_2mortal(newSVuv(*obj / __div));
ST(1) = sv_2mortal(newSVuv(*obj % __div));
XSRETURN(2);
}
void DESTROY(__uint128 * obj) {
printf("Cleaning up\n");
Safefree(obj);
}
void mul_128(__uint128 * rop, __uint128 * op1, __uint128 * op2) {
*rop = *op1 * *op2;
}
EOC
our $mpz1 = Math::GMPz->new('676469752303423489');
our $mpz2 = Math::GMPz->new('776469752999423489');
our $mpz_ret = Math::GMPz::Rmpz_init2(128);
our $i_ret = create();
our $i_1 = create();
our $i_2 = create();
assign($i_1, "$mpz1");
assign($i_2, "$mpz2");
our $count = 50000;
timethese($count * 19, {
'mul_128' => 'mul_128($i_ret, $i_1, $i_2)',
'gmpz' => 'Math::GMPz::Rmpz_mul($mpz_ret, $mpz1, $mpz2)',
});
print retrieve($i_1),"\n", retrieve($i_2),"\n", retrieve($i_ret), "\n"
+;
print "$mpz1\n$mpz2\n$mpz_ret\n";
sub assign {
my $obj = shift;
my $num = shift;
my @args;
my $a0 = Math::GMPz->new($num) / 9223372036854775808;
my $a1 = Math::GMPz->new($num) % 9223372036854775808;
$args[0] = "$a0" + 0;
$args[1] = "$a1" + 0;
_assign($obj, $args[0], $args[1]);
}
sub retrieve {
my @in = _deref_obj($_[0]);
my $num = Math::GMPz->new($in[0]);
$num *= 9223372036854775808;
$num += $in[1];
return "$num";
}
with a "typemap_128" that looks like this:
__uint128 * UI128
INPUT
UI128
$var = INT2PTR($type, SvIV(SvRV($arg)))
Cheers, Rob
| [reply] [d/l] [select] |
Re^14: Module for 128-bit integer math?
by syphilis (Archbishop) on Feb 14, 2011 at 20:53 UTC
|
After adding to Math::Int128 a new set of operators that use a preallocated argument for output, Math::Int128 becomes faster than Math::GMPz, around 60% faster
That's more like the figures I anticipated (at a guess). My "I don't see much scope for improvement" comment was in relation to my own enhancements to Math::Int128 which removed the allocating/deallocating you mentioned - and which made the int128 multiplication a little faster than using Math::GMPz's functions (and, of course, significantly faster than using Math::GMPz's overloaded operators).
When I get back to my Vista64 box, I'll grab the latest github version and see for myself ... and, of course, post again with my results for that machine.
Cheers, Rob | [reply] |
|
|