Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Re: porting C code to Perl

by danaj (Friar)
on Oct 25, 2017 at 02:02 UTC ( [id://1201987] : note . print w/replies, xml ) Need Help??

in reply to porting C code to Perl

I should make a blog post about this, but here are some thoughts.

The C code you started with is really poor quality, seemingly because it got involved with code golfing. Global variables, sigh -- that was ugly in the 1980s and still is. This makes it tough to follow and translate.

The rest of this is really all about performance. If you're happy using it, then great, the rest of this is just blathering on about details. But ever since as a kid I bought Petr Beckmann's book I've found this interesting. It was also one of my many digressions -- I needed a few million digits of Pi for a puzzle and was disappointed that the standard Perl modules were horribly slow, so went down the rat hole.

The spigot code you're using is one of the slower methods for generating N digits of Pi. There is a modified version of the simple C spigot from Winter, Flammenkamp, et al.) that runs about 8 times faster (code in ntheory, Luschny's Java version, many other variants on the web). I added a little bit to make sure it rounds correctly at reasonable sizes (the table maker's dilemma still applies, but not for quite some time). I actually only use it for <= 100 digits or if MPFR and GMP are not available, as those methods are far faster.

You can see a timing comparison of some C methods I measured in 2014 on RosettaCode. Spigot 1 in that table is Winter/Flammenkamp version (as mentioned, about 8x faster than the one you're using). Here is a (hastily assembled) chart of 13 algorithm/implementations I made today, all but one used from Perl. The time (in seconds) on the y-axis uses a log scale which lets us see how AGM and Chudnovsky have a favorably slope compared to the others (about 2.4x time per doubling of digits vs. 4x).

Pi timing graph

  • y-cruncher is what you want if you really need a few billion digits for some project. It's pretty cool if you're interested in this stuff.
  • Pari's Chudnovsky method (they call it Ramanujan + binary splitting) is definitely faster than AGM. This is basically the same method as Xue's Chudnovsky program shown on the GMP site, though implemented differently. That's the one method that was called using GP rather than plugged into Perl. Putting a GMP version of this in my module has been on my to-do list for a couple years.
  • My AGM implementation in GMP (used in Math::Prime::Util aka ntheory when possible) is just a fraction slower than MPFR (I use the normal GMP mpf interface, they use MPFR of course). It's short code, quite fast, and asymptotically looks nice.
  • The Machin formulas implemented in GMP are inferior to AGM or Chudnovsky at all sizes and get worse. Using updated formulas helps only slightly. They still easily beat the spigot methods and all but one of the Perl methods. Math::BigFloat used to use these (but in Perl not C+GMP).
  • The spigot algorithms don't compete well for this task (producing N correctly rounded digits of Pi). There is no point in using GMP for these particular ones. Implementing it in Perl directly has a very large performance hit, as all these operations are really simple so C is going to shine.
  • Math::BigFloat changed from Machin to AGM in version 1.999705 (2015). rt://90033 has some performance numbers. With the GMP back end it isn't bad. The overhead at small sizes isn't ideal. With the Pari or Calc back ends it's terribly slow (assuming you want 1000+ digits).
  • I suspect if you used Math::GMPf or Math::AnyNum and wrote the simple AGM code it would be only a small penalty from the GMP code, so easily beat all the other Perl code shown. At that point you could just use the pi() function from Math::AnyNum which calls MPFR, or Pi() from ntheory which uses its best available choice from MPFR, GMP, C, or pure Perl.

Replies are listed 'Best First'.
Re^2: porting C code to Perl
by marioroy (Prior) on Oct 25, 2017 at 09:23 UTC

    Update: Running ./configure is all one needs. GMP builds fine on the Mac and sets MPN_PATH properly. The custom script isn't necessary.

    Update: Ditto for MPFR. Running ./configure --with-gmp=/usr/local is all one needs. Please forget the custom script.

    Update: It requires sudo to install the libraries under /usr/local.

    Hello danaj and like Discipulus am astonished too.

    This post describes how I build the GMP and MPFR libraries on my Mac (Haswell architecture), beneficial for ntheory. I have found that setting MPN_PATH is helpful. The -march=core2 -mtune=intel compiler options enable extra performance as well, particularly -mtune=intel.

    On my CentOS 7.3 VM, I just run sudo yum install gmp-devel mpfr-devel before installing Math::Prime::Util.


    Below, "" to configure the GMP library. Remove the architecture(s) not supported by your processor (MPN_64). Otherwise, make check will fail.

    PREFIX="/usr/local" OPT_FLAGS="-O3 -march=core2 -mtune=intel -fomit-frame-pointer" OS_ARCH="`uname -m`" OS_REV="`uname -r | cut -d. -f1`" TRIPLE="${OS_ARCH}-apple-darwin${OS_REV}" export CXXFLAGS="${OPT_FLAGS} -pipe -Wall -fno-common -DPIC" export CFLAGS="${OPT_FLAGS} -pipe -Wall -fno-common -DPIC" export LDFLAGS="-Wl,-no_pie" export CXX=g++ export CC=gcc # MPN_64="x86_64/coreibwl x86_64/coreihwl x86_64/coreisbr x86_64/corei +nhm x86_64/fastsse x86_64/core2 x86_64 generic" MPN_64="x86_64/coreihwl x86_64/coreisbr x86_64/coreinhm x86_64/fastsse + x86_64/core2 x86_64 generic" # Replace two options with --disable-static --enable-shared for shared + libraries MPN_PATH="${MPN_64}" \ ./configure \ --build="${TRIPLE}" --prefix="${PREFIX}" --disable-shared --enable- +static \ --enable-cxx --enable-assembly --enable-fft

    Build instructions.

    cd gmp-6.1.2 sh /path/to/ make -j 4 make check # <-- important, do not skip sudo make install-strip


    Below, "" to configure the MPFR library.

    PREFIX="/usr/local" OPT_FLAGS="-O3 -march=core2 -mtune=intel -fomit-frame-pointer" OS_ARCH="`uname -m`" OS_REV="`uname -r | cut -d. -f1`" TRIPLE="${OS_ARCH}-apple-darwin${OS_REV}" export CXXFLAGS="${OPT_FLAGS} -pipe -Wall -fno-common -DPIC" export CFLAGS="${OPT_FLAGS} -pipe -Wall -fno-common -DPIC" export LDFLAGS="-Wl,-no_pie" export CXX=g++ export CC=gcc # Replace two options with --disable-static --enable-shared for shared + libraries ./configure \ --build="${TRIPLE}" --prefix="${PREFIX}" --disable-shared --enable- +static \ --disable-dependency-tracking --disable-assert --enable-thread-safe + \ --with-gmp="${PREFIX}"

    Build instructions. Download mpfr-3.1.2 and allpatches (important).

    cd mpfr-3.1.2 patch -N -Z -p1 < /path/to/allpatches.txt sh /path/to/ make -j 4 make check # <-- important, do not skip sudo make install-strip


    Now that the two libraries are installed, (re)install Math::Prime::Util using your favorite method. It will pick dependencies such as Math::Prime::Util::GMP and Math::BigInt::GMP. Without GMP/MPFR, it takes 14 seconds to compute Pi to 50,000 digits. With GMP/MPFR, it takes 1.5 seconds to compute Pi to 1 million digits.

    $ time /opt/perl-5.26.1/bin/perl -Mntheory=:all -E 'say Pi(1e6)' | wc +-c 1000002 real 0m1.428s user 0m1.408s sys 0m0.017s

    No matter the Unix OS, ensure the GMP and MPFR development libraries are installed before installing Math::Prime::Util. Most OS vendors have packages already made.

    Regards, Mario

      GMP is installed on many systems now, partly because GCC started requiring it. MPFR is much less common. GMP is a big help for lots of things, but MPFR much less so, and I used it via Math::MPFR so you'd need that installed also. It looks like it is used in prime_count_lower and prime_count_upper for large enough values, and the floating point routines harmreal, ExponentialIntegral, LogarithmicIntegral, RiemannZeta, RiemannR, and Pi. For Pi it's a very small savings vs.the standard GMP back end. But the floating point things like RiemannZeta are just hard to do in Perl or GMP's rudimentary mpf layer (for me at least). Using Math::MPFR (thanks syphilis!) is thousands of times faster than Math::BigFloat even using the fast GMP backend. The MPFR implementors are very sharp and also better practicing numerical analysts than me.

      For AWS EC2 machines I usually just install the packages with apt-get or yum. I also do git and clone/build/install the latest versions of my modules but that's because I want the absolute latest. cpan is fine and as long as you have GMP you should just be able to cpan Math::Prime::Util Math::BigInt::GMP. But make sure it did find GMP (try cpan Math::Prime::Util::GMP to check).

      14 seconds for Pi to 50k digits... Ouch. Moar fastar! At least it's faster than the default Math::BigFloat (which doesn't have the luxury of dropping into custom C code).

      As Discipulus says I should try use integer as well to test the Perl spigot. I usually don't like that pragma because it's signed which is never what I want, but that's not relevent here. Now I'm pondering a digression into how to implement use uinteger. At first glance it doesn't look terrible hard to do in the core, but using a whole new hint bit is unlikely to fly. Maybe a test just to see if it would work...

        GMP's rudimentary mpf layer

        I think "rudimentary" is a fair description, and the GMP's mpf layer documentation even states that "new projects should consider using the GMP extension library MPFR ( instead".
        I personally think the same advice should be given wrt "old projects", too.

        One limitation of the mpf layer is that the allocated precision is always a multiple of limb size - where limb size will commonly be either 32 or 64 bits.
        Another limitation is the absence of both Inf and NaN.
        Also, the rounding rules are rather odd.
        The mpf_get_d function which converts the mpf_t to a double (which it then returns) rounds towards zero (ie truncates) - and that seems to be the rounding rule generally used throughout.
        But if one assigns a higher precision mpf_t to a lower precision mpf_t one doesn't always get what one expects.

        The following demo has two 128-bit mpf_t variables ($large_1 and $large_2) that have been assigned slightly different values.
        However, using any sane rounding rules, both of those values should round to the same 64-bit value - as both values have the first 65 bits set && there are other set bits further along.
        To test this, I assign $large_1 to the 64-bit $little_1, and I assign $large_2 to the 64-bit $little_2, expecting that $little_1 == $little_2. But Rmpf_cmp() reports that $little_1 != $little_2.
        And Rmpf_get_str() also shows that $little_1 != $little_2.
        Rmpf_out_str() outputs identical base 2 representations for $little_1 and $little_2 but displays 65 set bits ... which seems rather strange for a 64-bit precision mpf_t:
        use strict; use warnings; use Math::GMPf qw(:mpf); my $large_1 = Rmpf_init2(128); my $large_2 = Rmpf_init2(128); my $little_1 = Rmpf_init2(64); my $little_2 = Rmpf_init2(64); Rmpf_set_str($large_1, ('1' x 65) . ('0' x 2) . ('1' x 61), 2); Rmpf_set_str($large_2, ('1' x 65) . ('0' x 42) . ('1' x 21), 2); Rmpf_set($little_1, $large_1); Rmpf_set($little_2, $large_2); if(Rmpf_cmp($little_1, $little_2)) {print "WTF\n"} print $little_1, "\n", $little_2, "\n"; Rmpf_out_str($little_1, 2, 0); print "\n"; Rmpf_out_str($little_2, 2, 0); print "\n"; print Rmpf_get_prec($little_1), "\n"; print Rmpf_get_prec($little_2), "\n"; __END__ OUTPUTS: WTF 0.340282366920938463456e39 0.340282366920938463454e39 0.11111111111111111111111111111111111111111111111111111111111111111e12 +8 0.11111111111111111111111111111111111111111111111111111111111111111e12 +8 64 64
        These are not Math::GMPf bugs, because the following C script outputs the same results (though the format is not identical).
        #include <stdio.h> #include <gmp.h> int main(void) { mp_exp_t exponent; char *out_1, *out_2; mpf_t large_1, large_2, little_1, little_2; char *str_1 = "1111111111111111111111111111111111111111111111111111111111111111100 +1111111111111111111111111111111111111111111111111111111111111"; char *str_2 = "1111111111111111111111111111111111111111111111111111111111111111100 +0000000000000000000000000000000000000000111111111111111111111"; mpf_init2(large_1, 128); mpf_init2(large_2, 128); mpf_init2(little_1, 64); mpf_init2(little_2, 64); mpf_set_str(large_1, str_1, 2); mpf_set_str(large_2, str_2, 2); mpf_set(little_1, large_1); mpf_set(little_2, large_2); if(mpf_cmp(little_1, little_2)) printf("WTF\n"); out_1 = mpf_get_str(NULL, &exponent, 10, 0, little_1); printf("%s %d\n", out_1, exponent); out_2 = mpf_get_str(NULL, &exponent, 10, 0, little_2); printf("%s %d\n", out_2, exponent); mpf_out_str(stdout, 2, 0, little_1); printf("\n"); mpf_out_str(stdout, 2, 0, little_2); printf("\n"); printf("%d\n%d\n", mpf_get_prec(little_1), mpf_get_prec(little_2)); return 0; } /* OUTPUTS: WTF 340282366920938463456 39 340282366920938463454 39 0.11111111111111111111111111111111111111111111111111111111111111111e12 +8 0.11111111111111111111111111111111111111111111111111111111111111111e12 +8 64 64 */
        The mpf documentation does state:

        Internally, GMP sometimes calculates with higher precision than that of the destination variable in order to limit errors. Final results are always truncated to the destination variable’s precision.

        I suspect that this might account for the above anomalies - though I don't see any obvious connection.
        One avoids these types of anomalies with the MPFR library.
        Doing the same exercise with MPFR, one finds that $little_1 == $little_2. And that value is 3.40282366920938463463e38 (rounding to nearest), and 3.40282366920938463445e38 (rounding down).


        Out of curiosity, I looked at perllocal.pod. It looks like Math::MPFR installed automatically during the Math::Prime::Util cpan installation.

        $ vi /opt/perl-5.26.1/lib/5.26.1/darwin-thread-multi-2level/perllocal. +pod Math::Prime::Util::GMP Math::Prime::Util -- auto-installed Math::MPFR -- auto-installed Math::BigInt::GMP

        Regarding use integer (thanks haukex), the spigot Perl code ran 1.36 times faster on my Mac. Math::Prime::Util is already incredibly fast. Now wonder in amazement on what you will do next.

      Update: Added before/after results for Pi(1e9) to emphasize the performance increase.

      I did an experiment on an i7-Broadwell machine running CentOS 7.3. The system gmp-devel libraries were built to support the least common denominator. Thus, leaving out optimizations for newer CPU instructions.

      # default package build MPN_PATH=" x86_64/k8 x86_64 generic" # manually on the broadwell machine, includes extra optimizations MPN_PATH=" x86_64/coreibwl x86_64/coreihwl x86_64/coreisbr x86_64/core +inhm x86_64/core2 x86_64 generic"

      I have everything point to the /opt/perl-5.26.1 folder to not impact the system in any way. Then, re-installed GMP modules including Math::Prime::Util.

      cd gmp-6.1.2 ./configure --prefix=/opt/perl-5.26.1 make -j 4 make check make install cd Math-GMP-2.15 /opt/perl-5.26.1/bin/perl Makefile.PL INC="-I/opt/perl-5.26.1/include" + LIBS="-L/opt/perl-5.26.1/lib -lgmp" make make test make install cd Math-BigInt-GMP-1.6004 /opt/perl-5.26.1/bin/perl Makefile.PL INC="-I/opt/perl-5.26.1/include" + LIBS="-L/opt/perl-5.26.1/lib -lgmp" make make test make install cd Math-Prime-Util-GMP-0.46 /opt/perl-5.26.1/bin/perl Makefile.PL INC="-I/opt/perl-5.26.1/include" + LIBS="-L/opt/perl-5.26.1/lib -lgmp" make make test make install cd Math-Prime-Util-master /opt/perl-5.26.1/bin/perl Makefile.PL INC="-I/opt/perl-5.26.1/include" + LIBS="-L/opt/perl-5.26.1/lib -lgmp" make make test make install

      Beyond 20% gain is possible by simply building GMP manually. That will set MPN_PATH to utilize the full capabilities of the processor.

      time /opt/perl-5.26.1/bin/perl -Mntheory=:all -E 'say Pi(1e6)' | wc -c 1000002 0m 1.524s using system gmp libraries 0m 1.168s using gmp libraries from /opt/perl-5.26.1/lib time /opt/perl-5.26.1/bin/perl -Mntheory=:all -E 'say Pi(1e9)' | wc -c 1000000002 91m 7.581s using system gmp libraries 67m 40.951s using gmp libraries from /opt/perl-5.26.1/lib

      Regards, Mario

Re^2: porting C code to Perl
by Discipulus (Canon) on Oct 25, 2017 at 08:26 UTC
    Hello danaj and.. I'm astonished!!

    You provided a lot of high quality informations, each one worth to be investigated.

    I must admit I started this as a game, following my eclectic inclination even if math fu is stalled ~at primary school (you already know my math medieval level), my C understanding is zero and my Perl is.. still improving.

    > If you're happy using it, then great..

    Obviously no! I'm fascinated by this matter and I love Perl so my next step will be to (re-)implement the Chudnovsky algorithm in Perl just for fun and self improvement.

    I'd really like to see a blog post by you about different approach to Pi calculation. I hope it will be accessible (understandable) also for not mathematicians. In this spirit dunno if you noticed the page linked by haukex in this thread. I must admit I'm still trying to decipher it.

    For completness using integer speed up the code quite a bit. See also the interesting ambrus's post about Computing pi to multiple precision


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      She He always performs like this. See also at github. Great.

      Update: I adressed the honorable monk wrong. Sorry and thanks to choroba. But it isn't so easy: Dana. See also he

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help