http://qs321.pair.com?node_id=1129972

Update June 26, 2015: New results using Math::Prime::Util v0.51 at the end of the post.

Some time ago traveled to an imaginary place filled with prime numbers. In one of the paintings was the following statement. With k initialized to 1, the value alternates from 2 to 1 repeatedly. It was this statement which inspired me to give Algorithm3 a try.

k = 3 - k;

There are various prime modules on CPAN. Thus, have no plans to create another. All I wanted to do was to compare the Sieve of Eratosthenes with Algorithm3. I created a local copy of Math::Prime::FastSieve by davido and replaced the primes function.

Sieve of Eratosthenes

/* Sieve of Eratosthenes. Return a reference to an array containing a +ll * prime numbers less than or equal to search_to. Uses an optimized s +ieve * that requires one bit per odd from 0 .. n. Evens aren't represente +d in the * sieve. 2 is just handled as a special case. */ SV* primes( long search_to ) { AV* av = newAV(); if( search_to < 2 ) return newRV_noinc( (SV*) av ); // Return an empty list ref. av_push( av, newSVuv( 2UL ) ); // Allocate space for odd numbers (15 bits per 30 values) sieve_type primes( search_to/2 + 1, 0 ); // Sieve over the odd numbers for( sieve_size_t i = 3; i * i <= search_to; i+=2 ) if( ! primes[i/2] ) for( sieve_size_t k = i*i; k <= search_to; k += 2*i) primes[k/2] = 1; // Add each prime to the list ref for( sieve_size_t i = 3; i <= search_to; i += 2 ) if( ! primes[i/2] ) av_push( av, newSVuv( static_cast<unsigned long>( i ) ) ); return newRV_noinc( (SV*) av ); }

Sieve of Xuedong Luo (Algorithm3)

/* Sieve of Xuedong Luo (Algorithm3). Return a reference to an array * containing all prime numbers less than or equal to search_to. * * A practical sieve algorithm for finding prime numbers. * ACM Volume 32 Issue 3, March 1989, Pages 344-346 * http://dl.acm.org/citation.cfm?doid=62065.62072 * * Avoid all composites that have 2 or 3 as one of their prime factors * where i is odd. * * { 0, 5, 7, 11, 13, ... 3i + 2, 3(i + 1) + 1, ..., N } * 0, 1, 2, 3, 4, ... list indices (0 is not used) */ SV* primes( long search_to ) { AV* av = newAV(); if( search_to < 2 ) return newRV_noinc( (SV*) av ); // Return an empty list ref. sieve_size_t i, j, q = (sieve_size_t) sqrt((double) search_to) / 3 +; sieve_size_t M = (sieve_size_t) search_to / 3; sieve_size_t c = 0, k = 1, t = 2, ij; // Allocate space. Set bits to 1. Unset bit 0. sieve_type primes( M + 2, 1 ); primes[0] = 0; // Unset bits greater than search_to. if ( 3 * M + 2 > search_to + ((sieve_size_t)search_to & 1) ) primes[M] = 0; if ( 3 * (M + 1) + 1 > search_to + ((sieve_size_t)search_to & 1) ) primes[M + 1] = 0; // Clear composites. for ( i = 1; i <= q; i++ ) { k = 3 - k, c = 4 * k * i + c, j = c; ij = 2 * i * (3 - k) + 1, t = 4 * k + t; if ( primes[i] ) { while ( j <= M ) { primes[j] = 0; j += ij, ij = t - ij; } } } // Gather primes. if( search_to >= 2 ) av_push( av, newSVuv( 2UL ) ); if( search_to >= 3 ) av_push( av, newSVuv( 3UL ) ); for ( i = 1; i <= M; i += 2 ) { if ( primes[i] ) av_push( av, newSVuv(static_cast<unsigned long>(3 * i + 2)) +); if ( primes[i + 1] ) av_push( av, newSVuv(static_cast<unsigned long>(3 * (i + 1) ++ 1)) ); } return newRV_noinc( (SV*) av ); }

Below is the time taken to find all prime numbers smaller than 1 billion.

my $primes = primes( 1_000_000_000 ); Sieve of Eratosthenes 4.879 seconds Sieve of Xuedong Luo 3.751 seconds There are 50,847,534 prime numbers between 1 and 1 billion.

I was so fascinated by Algorithm3 that I decided to parallelize it. It took me 5 weekends just to get the math to work. But I wanted faster and placed it aside. Two years later tried again and created Sandboxing with Perl + MCE + Inline::C. Also, tried Math::Prime::Util by Dana Jacobsen and primesieve by Kim Walisch.

Testing was done on a Haswell Core i7 Macbook Pro running at 2.6 GHz configured with 1600 MHz memory.

# # Count primes # perl algorithm3.pl 1_000_000_000 Prime numbers : 50847534 Compute time : 0.146 sec perl primesieve.pl 1_000_000_000 Prime numbers : 50847534 Compute time : 0.064 sec perl primeutil.pl 1_000_000_000 Prime numbers : 50847534 Compute time : 0.024 sec # # Sum primes ( 203_280_221 prime numbers ) # perl algorithm3.pl 4_294_967_296 --sum Sum of primes : 425649736193687430 Compute time : 1.082 sec perl primesieve.pl 4_294_967_296 --sum Sum of primes : 425649736193687430 Compute time : 0.369 sec perl primeutil.pl 4_294_967_296 --sum Sum of primes : 425649736193687430 Compute time : 2.207 sec # # Print primes ( 2.0 GB, beware... ) # perl algorithm3.pl 4_294_967_296 --print >/dev/null Compute time : 2.071 sec perl primesieve.pl 4_294_967_296 --print >/dev/null Compute time : 1.395 sec perl primeutil.pl 4_294_967_296 --print >/dev/null Compute time : 13.470 sec

Fast is possible in Perl. Thus, Perl is fun. One is not likely to print that many prime numbers. Math::Prime::Util is powerful with many features. Algorithm3 was mainly an exercise exploring Perl + MCE + Inline::C possibilities.

Update June 26, 2015 using Math::Prime::Util v0.51

The mce-sandbox was updated to call the new sum_primes/print_primes functions in Math::Prime::Util v0.51 for the primeutil.pl example.

Count primes $ perl algorithm3.pl 4294967296 Prime numbers : 203280221 Compute time : 0.623 sec $ perl primesieve.pl 4294967296 Prime numbers : 203280221 Compute time : 0.252 sec $ perl primeutil.pl 4294967296 Prime numbers : 203280221 Compute time : 0.210 sec Sum of primes $ perl algorithm3.pl 4294967296 --sum Sum of primes : 425649736193687430 Compute time : 1.090 sec $ perl primesieve.pl 4294967296 --sum Sum of primes : 425649736193687430 Compute time : 0.367 sec $ perl primeutil.pl 4294967296 --sum Sum of primes : 425649736193687430 Compute time : 0.768 sec Print primes ( outputs 2 GB containing 2032802221 prime numbers ) $ perl algorithm3.pl 4294967296 --print >/dev/null Compute time : 2.086 sec $ perl primesieve.pl 4294967296 --print >/dev/null Compute time : 1.397 sec $ perl primeutil.pl 4294967296 --print >/dev/null Compute time : 1.925 sec

Kind regards, Mario

Replies are listed 'Best First'.
Re: The sieve of Xuedong Luo (Algorithm3) for generating prime numbers
by danaj (Friar) on Jun 13, 2015 at 05:10 UTC

    Thanks for writing this up. We talked about this a year ago! Brings back memories. I haven't made any sieve changes since your testing pointed out some inefficiencies and I did a release (June 13 2014). I just reran my benchmarks since I had a slightly older version of your sieve (before you added prefill and a few other things).

    Here is a table of counting times for some segmented sieves. Single thread on i4770k.

    Segmented sieves 10^9 10^10 10^11 10^12 10^13
    yafu 0.285 1.876 20.241 4m 27.8s 54m 51s
    primesieve 5.4.2 0.119 1.661 23.480 4m 56.0s 58m 18s
    primegen 0.97 0.258 2.750 33.843 9m 43.1s 207m 56s
    MPU 0.41 0.320 3.529 42.971 9m 28.8s 119m 3s
    TOeS v2 0.422 4.739 55.283 10m 41.7s 121m 3s
    Roy algo3 v2 0.435 5.087 64.456 12m 28.0s 150m 10s
    Flammenkamp 0.528 5.937 79.211 16m 24.1s
    tverniquet 0.643 6.937 80.876 17m 27.7s
    Roy algo3 v1 0.611 6.949 77.846
    Walisch bit segmnt 0.909 8.226 91.691 19m 9.9s

    Comments: These all easily beat monolithic sieves such as Pari, Mathisen 1998, trivial SoE, MPU's plain sieve, etc. because I'm starting out at 10^9. All things are not equal: the code for counting is non-negligible and differs for most implementations. primegen is extremely sensitive to the cache parameter, which will differ on each platform. I tuned to the fastest value for this machine. The TOeS code also has some parameters I tuned but they're not as sensitive. Primesieve is set up for multiple threads -- this only uses one for direct comparison. Performance for primes from 2 to n may be very different than performance for primes from A to B (e.g. 10^18 to 10^18+1e6). This is purely for timing the sieves -- there are far faster ways to count primes (under a half-second for 10^13).

    I don't benchmark printing primes because it becomes a contest of who can do the fastest itoa + write. That's nice, but the prime generation time gets lost. primesieve is faster generating primes, but with a good output routine I can beat it in printing by over 2x.

    Random thoughts on this... yafu is quite fast, but the code isn't very usable outside of yafu. primesieve is extraordinarily fast, and would be faster if I let it use 8 threads. primegen is Bernstein's Sieve of Atkin which is pretty fast, but has poor growth compared to SoE (!) so my code catches it sometime before 10^12. Oliveira e Silva's example sieve is ok at smaller sizes but the growth rate is very nice so it's doing well at the high range (which was exactly what it was designed for). Algorithm 3 is doing very well, beating Flammenkamp's 2.0c code which used to be touted as one of the fastest sieves. Walisch's bit segment code is just an example of showing the idea and isn't optimized.

      Thank you for this beautiful writeup danaj. I had spent many weeks converting Algorithm3 to a segmented sieve and was all I could do at the time. The best part of the experience was when I searched CPAN and came across Math::Prime::Util. Because, that is closer in reaching the powerful primesieve generator.

Re: The sieve of Xuedong Luo (Algorithm3) for generating prime numbers
by davido (Cardinal) on Jun 11, 2015 at 05:22 UTC

    Thanks for this write-up. It's interesting. I hope that danaj has a chance to see it too.


    Dave

Re: The sieve of Xuedong Luo (Algorithm3) for generating prime numbers
by danaj (Friar) on Jun 13, 2015 at 07:02 UTC

    Again, thanks for writing this up. Perl is indeed fun. MCE + Inline::C is pretty cool.

    As I mentioned last year, counting primes can be much faster than sieving them. Using MCE with MPU's prime_count is counter-productive as it is doing much more work than a single thread calling it once. The analagous thing for primesieve is Kim Walisch's primecount -- also vastly faster than primesieve for counting. It's very slightly slower than MPU on a single thread, but its target is multiple threads where it cleans up. Times are on my MBP 2.2GHz (should be very similar to yours, just clocked lower). The only change is using the github version of Math::Prime::Util and changing bin/primeutil.pl to use the two new functions.

    perl bin/algorithm3.pl 1_000_000_000 Prime numbers : 50847534 Compute time : 0.163 sec perl bin/primesieve.pl 1_000_000_000 Prime numbers : 50847534 Compute time : 0.071 sec perl bin/primeutil.pl 1_000_000_000 Prime numbers : 50847534 Compute time : 0.025 sec perl -Mntheory=:all -E 'say prime_count(1e9)' 50847534 0.020s user time
    This really shows up when looking at larger sizes. 1e11 takes only 0.038s with prime_count, but the MCE version takes much longer. The single last thread is doing almost twice the work of a single call for the entire answer. Neither one is doing any sieving.

    I added a simple sum_primes and print_primes yesterday. Not on CPAN yet. It helps quite a bit:

    perl bin/algorithm3.pl 4_294_967_296 --sum Sum of primes : 425649736193687430 Compute time : 1.222 sec perl bin/primesieve.pl 4_294_967_296 --sum Sum of primes : 425649736193687430 Compute time : 0.431 sec perl bin/primeutil.pl 4_294_967_296 --sum Sum of primes : 425649736193687430 Compute time : 0.873 sec perl -Mntheory=:all -E 'say sum_primes(2**32)' 425649736193687430 4.627s user time perl -Mntheory=:all -E 'my $s=0; forprimes { $s+=$_ } 2**32; say $s' 425649736193687430 11.198s user time

    The downside is that it a very specialized function, while forprimes { ... } $low,$high is very generic. There is also vecsum over primes, but that has even more overhead.

    perl bin/algorithm3.pl 4_294_967_296 --print >/dev/null Compute time : 3.279 sec perl bin/primesieve.pl 4_294_967_296 --print >/dev/null Compute time : 3.126 sec perl bin/primeutil.pl 4_294_967_296 --print >/dev/null Compute time : 3.279 sec perl -Mntheory=:all -E 'print_primes(2**32)' >/dev/null 8.149 user time perl -Mntheory=:all -E 'forprimes { say } 2**32' 80.9s user time primesieve -p1 2**32 >/dev/null 172.9s user time

    Printing is much faster. Your code was better than the simple forprimes here of course, but now it will quickly print primes in a range to a file descriptor. It looks like our print timing is different -- maybe a difference in the temp file speed. primesieve's print time is excessive on this machine. I rebuilt it with a printf unsigned long and it went down to 28 seconds. It didn't change the MCE time since you're doing your own efficient printing.

    One of these days I'll invert my code: rip out all the C code and put it in a standalone library for use by anything, then have ntheory (MPU) link to it. Then the MCE + Inline::C could get the high performance without mucking with XS.

      Wow danaj. ++ for the sum_primes and print_primes functions. The sandbox was a lot of fun combining Perl + MCE + Inline::C. Generating primes made it more interesting.

Re: The sieve of Xuedong Luo (Algorithm3) for generating prime numbers
by Anonymous Monk on Jun 11, 2015 at 17:45 UTC

    Algorithm3 packs the sieve more tightly than regular odds-only (ie. in wheel6 formation, 2 bits per 6 numbers). Other than that, it's pretty much straight-up Eratosthenes. The clever flip-flops (k = 3 - k and ij = t - ij) disappear when the loops are unrolled (by hand or by compiler).

    All in all, I don't expect this algorithm to hold much interest.

      Using a mod-6 wheel is quite common. ntheory (aka Math::Prime::Util) uses a mod-30 wheel which is nice because their are 8 candidates per block, which packs nicely in a byte. The Sieve of Atkin is a twist on mod-30 (among other things, it's mod-60 which lets it skip 2,3,5). Terje Mathisen wrote a neat little monolithic mod-30 sieve in 1998 you can find on the net. I've seen a mod-2310 sieve but it seems very unwieldy for very little gain. PrimeSieve has 3 sieves for use at different sizes. A mod-30 sieve for small sizes, a mod-210 wheel for medium, and an optimized version of Oliveira e Silva's sieve for large sizes.

      Quesada and Van Pelt wrote about a mod-30 sieve in 1996 (they cite Luo's 1989 paper), Sorenson in 1991. He even has a section discussing their agreements and disagreements with Luo's paper as well a Pritchard's earlier paper. Sorenson has a paper on Arxiv from March 2015 that I've been meaning to look at, though I don't think it will change anything (e.g. "The proof is a bit artificial and of theoretical interest only. Coding this in practice seems daunting."). People are still working on these things.

      There is no question that primesieve is fast. It's great stuff, and Kim is a sharp developer who has been working on this for many years. He also has a slightly different target -- at large sizes the TOeS algorithm uses a lot of memory. I have tried to keep memory use more constrained. This comes up a lot with tables and precalculation, where I really can't do too much, as most users will probably never even call whatever function I'm optimizing, and I get a lot of feedback about trying to keep the initial size down. primesieve, and its sister project primecount, strive to be fast and offer C and C++ APIs.

      Hmm, I had never heard before about Xuedong Luo, nor about "Algorithm3".

      But when I was in my first year in technical high school (first university year) in computer science, back in 1989-1990, I was given a homework assignment to implement a prime number algorithm in Pascal (I remember it was in Pascal, but it might have been in Basic or in C, the two other languages we had to work with at the time). My first implementation was just a standard Eratosthenes sieve just using odd numbers both for numbers to be examined and for candidate divisors (with a special case for 2, the only even number that is prime).

      In that assignment, I then noticed that the algorithm was getting really slow when getting around the 1,000,000 mark (if I remember correctly the timing, hardware of the time was really not like today, especially with my budget PC then). So I offered at least two possible performance optimizations:

      - discarding even numbers and multiples of 3;

      - choosing possible divisors only among the primes found so far;

      If I remember correctly, I obtained the maximum possible mark for that homework assignment.

      But if the real advantage of Algorithm3 is just discarding not only odd numbers but also multiples of 3, then I could almost claim to be a co-discoverer (I am of course just joking, I've never published on that, and I am fairly sure some other people thought about that possibility before). But, still, same idea, same period. And I am almost sure that I still have that assignment paper copy in my archives, with the annotations from my professor at the time (at least I still had it very recently, I don't think I have thrown it away in between).

      I might try to post the Pascal abridged source code tomorrow, if I remember.

        That is awesome. I like Algorithm3 a lot. It is exciting to know that someone else attempted to make the sieve of Eratosthenes faster. Beautiful minds.

      I applaud Xuedong Luo, Laurent_R, and others for taking the sieve of Eratosthenes and creating a faster and more compact algorithm.

        Ooh, I clicked on the wrong reply link. I met to reply above to Laurent_R.

Re: The sieve of Xuedong Luo (Algorithm3) for generating prime numbers
by marioroy (Prior) on Jun 26, 2015 at 15:20 UTC

    Dana added sum_primes and print_primes in the Math::Prime::Util v.0.51 release.

    The OP post was updated with results taken from v0.51 for the primeutil.pl example.

    Thank you Dana for Math::Prime::Util.

    Kind regards, Mario