Well, I have been able to improve Math::Int128 significantly since yesterday. Now, on my computer, it is twice as fast as Math::GMPz running your benchmarks.
There were two kinds of improvements: providing 3-argument operators (e.g. uint128_add($to, $a, $b)) that store the result in an existing variable, and optimizing the SV to int128_t conversions.
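To illustrate the difference, here is a minimal sketch comparing the overloaded + operator with the 3-argument form. It assumes that uint128 and uint128_add can be imported by name from Math::Int128 and that + is overloaded for these objects. The variable names are only for illustration.

```perl
use strict;
use warnings;
# Assumption: both functions are importable by name.
use Math::Int128 qw(uint128 uint128_add);

my $x = uint128(2);
my $y = uint128(3);

# Overloaded operator: a new object is created to hold the result.
my $sum = $x + $y;

# 3-argument form: the result is written into the existing $to,
# so no new object has to be created for it.
my $to = uint128(0);
uint128_add($to, $x, $y);

print "$sum $to\n";
```

The saving should matter mostly in tight loops, where the destination can be created once and reused on every iteration.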
I have also found that the code generated by GCC-current is quite unpredictable: sometimes things that should be faster turn out to be slower. So the tuning I have carried out may be ineffective for your particular version of GCC.