http://qs321.pair.com?node_id=206098

gri6507 has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks,

I don't remember why I even started doing this (probably as a pastime at work), but here are several algorithms to calculate the factorial of an integer (as in 6! = 6*5*4*3*2*1).

use Math::BigInt;
use Benchmark;
use strict;
$|++;

my $t0;
my $t1;
my $i = Math::BigInt->new($ARGV[0]);

$t0 = new Benchmark;
fact($i);
$t1 = new Benchmark;
print "Method 1: ",timestr(timediff($t1, $t0)),"\n";

$t0 = new Benchmark;
fact2($i,1);
$t1 = new Benchmark;
print "Method 2: ",timestr(timediff($t1, $t0)),"\n";

$t0 = new Benchmark;
fact3($i);
$t1 = new Benchmark;
print "Method 3: ",timestr(timediff($t1, $t0)),"\n";

sub fact{
    my $n = Math::BigInt->new(shift);
    return 1 unless $n->bcmp(0);
    return $n->bmul(fact($n->bsub(1)));
}

sub fact2{ #Tail Recursion Elimination
    my $n = Math::BigInt->new(shift);
    my $f = shift;
    if (!$n->bcmp(0)) {return $f}
    else {return &fact2($n->bsub(1),$n->bmul($f))}
}

sub fact3{ #no recursion at all
    my $n = Math::BigInt->new($_[0]);
    my $i = $_[0]-1;
    while($i){
        $n = Math::BigInt->new($n->bmul($i--));
    }
    return $n;
}

What really surprised me was that the recursive algorithm is faster than the straight-up loop. Why is that? Isn't it true that in the recursive function, extra steps are taken to store the intermediate values on a stack? Does this imply that recursion is faster than a loop?? Thanks for your wisdom.

P.S. Suggestions for more robust algorithms are welcome.

Replies are listed 'Best First'.
Re: Factorial algorithm execution time
by Aristotle (Chancellor) on Oct 18, 2002 at 15:45 UTC
    Since multiplication is commutative, you can limit the size of numbers and thus vastly accelerate this function by multiplying the next biggest with the next smallest factor in turn: rather than calculate 1*(2*(3*(4*5))), you calculate ((1*5)*3)*(2*4):
    1 * 2 * 3 * 4 * 5 =
    |   |       |   |
    |   +-------+   |
    +---------------+
    
    5 * 8 * 3 =
    |       |
    +-------+
    
    15 * 8 =
    
    120
    
    See fact7.
    #!/usr/bin/perl -w
    use strict;
    use Benchmark;
    use Math::BigInt;

    sub fact1 {
        my $n = Math::BigInt->new(shift);
        return 1 unless $n->bcmp(0);
        return $n->bmul(fact1($n->bsub(1)));
    }

    sub fact4 {
        my $n = Math::BigInt->new(1);
        foreach my $i (map Math::BigInt->new($_), 1..shift) {
            $n = Math::BigInt->new($i->bmul($n));
        }
        return $n;
    }

    sub fact7 {
        my @factor = map Math::BigInt->new($_), (1 .. shift);
        while(@factor > 1) {
            my @pair = splice @factor, 1 + $#factor / 2;
            $_ = $_ * pop @pair for @factor[0..$#pair];
        }
        return shift @factor;
    }

    my $fact = shift || 100;
    my ($f1, $f4, $f7) = (fact1($fact), fact4($fact), fact7($fact));
    die "something's broke" if grep $_ != $f1, ($f4, $f7);

    timethese(shift || 10, {
        f1 => sub { fact1($fact) },
        f4 => sub { fact4($fact) },
        f7 => sub { fact7($fact) },
    });
    __END__
    $ fact 600 5
    Deep recursion on subroutine "main::fact1" at fact line 9.
    Benchmark: timing 5 iterations of f1, f4, f7...
    Deep recursion on subroutine "main::fact1" at fact line 9.
    Deep recursion on subroutine "main::fact1" at fact line 9.
    Deep recursion on subroutine "main::fact1" at fact line 9.
    Deep recursion on subroutine "main::fact1" at fact line 9.
    Deep recursion on subroutine "main::fact1" at fact line 9.
            f1: 17 wallclock secs (17.09 usr +  0.10 sys = 17.19 CPU) @  0.29/s (n=5)
            f4: 15 wallclock secs (15.09 usr +  0.06 sys = 15.15 CPU) @  0.33/s (n=5)
            f7:  4 wallclock secs ( 3.99 usr +  0.04 sys =  4.03 CPU) @  1.24/s (n=5)
    The difference is negligible for small numbers, but quickly grows to significant magnitude.

    Makeshifts last the longest.

      I thought about this one also. But your implementation can be improved still. Let's look at 6!:

      2 * 3 * 4 * 5 * 6 =
      |   |       |   |
      |   +-------+   |
      +---------------+

      4 * 12 * 15 =
      |         |
      +---------+

      12 * 60 =

      720
      There are a couple of changes. First, there is no need to multiply by 1. The answer is not going to change :-). Second, you are correct in saying that it's best to multiply the smallest number by the largest first. So, in this case, the left over "4" from the first step should be the first number in the second step. These are implemented in fact8(). If you run it, you'll see that it runs just a bit faster.

      sub fact8{ #divide and conquer without recursion
          my @N = (2 .. shift);
          my @M;
          my $tmp;
          while ($#N){
              while(@N){
                  $tmp = pop(@N);
                  if (($_ = shift(@N))){push(@M,Math::BigInt->new($tmp)->bmul($_))}
                  else {unshift(@M,$tmp)}
              }
              while(@M){
                  $tmp = pop(@M);
                  if (($_ = shift(@M))){push(@N,Math::BigInt->new($tmp)->bmul($_))}
                  else {unshift(@N,$tmp)}
              }
          }
          return @N;
      }

      perl fact.pl 5000
      Method 8 (new func):  28 wallclock secs (28.65 usr + 0.00 sys = 28.65 CPU)
      Method 7 (your func): 30 wallclock secs (29.06 usr + 0.00 sys = 29.06 CPU)
        You're right. But why so much work? A trivial modification to my code will do that.
        sub fact7b {
            my @factor = map Math::BigInt->new($_), (2 .. shift);
            while(@factor > 1) {
                my @pair = splice @factor, 0, @factor / 2;
                $_ = $_ * pop @pair for @factor[0..$#pair];
            }
            return shift @factor;
        }
        Update: doh, fixed overly clever, broken code. (s{1 + $#factor / 2}{@factor / 2})

        Makeshifts last the longest.

Re: Factorial algorithm execution time
by blakem (Monsignor) on Oct 17, 2002 at 19:57 UTC
    Perhaps some trivial unit tests would be helpful.....
    die "fact is broken" unless fact(5) == 120; die "fact2 is broken" unless fact2(5,1) == 120; die "fact3 is broken" unless fact3(5) == 120;
    Update: added second arg to fact2. (thanks YuckFoo)
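    (For illustration, here is the same idea sketched with Test::More instead of bare die()s. This is an editorial sketch, not something posted in the thread; it assumes fact, fact2 and fact3 are in scope and that their return values stringify to plain integers, which Math::BigInt objects do.)

    use Test::More tests => 3;

    is( fact(5),     120, 'fact(5) is 120'    );
    is( fact2(5, 1), 120, 'fact2(5,1) is 120' );
    is( fact3(5),    120, 'fact3(5) is 120'   );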

    -Blake

Re: Factorial algorithm execution time
by Anonymous Monk on Oct 18, 2002 at 04:27 UTC
    The recursive algorithm multiplies smallest to largest, so most of the multiplications take place with the smallest number. The iterative algorithm multiplies largest to smallest so you are multiplying against a very, very big number a lot. Try this iterative implementation that mimics the order of operations in the recursive solution and see if it isn't faster:
    sub fact4{ #no recursion at all
        my $n = Math::BigInt->new(1);
        foreach my $i (map Math::BigInt->new($_), 1..shift) {
            $n = Math::BigInt->new($i->bmul($n));
        }
        return $n;
    }
    Also the order of the terms seems to matter. This version is not quite as fast:
    sub fact5{ #no recursion at all
        my $n = Math::BigInt->new(1);
        foreach my $i (1..shift) {
            $n = Math::BigInt->new($n->bmul($i));
        }
        return $n;
    }
    And since most of the time is spent multiplying big numbers, perhaps you want to do fewer multiplications involving big numbers? The following is much faster than any other approach given because it limits multiplications with big numbers, but you have to pass it the raw number because Math::BigInt objects do not play well with the averaging manipulation it does:
    sub fact6{ #divide and conquer
        my $m = shift;
        my $n = @_ ? shift : $m;
        if ($m < $n) {
            my $k = int($m/2 + $n/2);
            return Math::BigInt->new(fact6($m, $k))->bmul(fact6($k+1,$n));
        }
        else {
            return Math::BigInt->new($m);
        }
    }
      I don't think I understand your algorithm here. When trying to calculate, say 100!, what should the second argument be?

      I thought about this method, but I opted to try out a different method. You are correct in saying that much time is spent multiplying numbers. So what we can do is get rid of that multiplication and turn it into addition. This can be easily done by first taking logs of all integers in the factorial calculation, summing those values, and finally raising e to that power. For example, 5!=e^(ln(2)*ln(2)*ln(4)*ln(5)). There is only one problem with that. Since Math::BigFloat does not contain a log or exp function, the limited precision of the built-in functions creates a rounded answer (try doing 200!).

        Try print fact6(1, 100); It multiplies everything from the first to the last arguments inclusive. But it will not work if you pass it a Math::BigInt object.

        It works by calculating fact6(1,100) as fact6(1,50)*fact6(51,100). Break those up likewise until you have a range of one number, and you know how to calculate that. So there are very few multiplications with very large numbers.
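        (To make the splitting concrete, here is a hand-worked trace of fact6(1, 6), followed by a call of the sub defined above. The trace is an editorial addition, not output the code prints.)

        # fact6(1,6) = fact6(1,3)       * fact6(4,6)
        #            = (fact6(1,2) * 3) * (fact6(4,5) * 6)
        #            = ((1 * 2)    * 3) * ((4 * 5)    * 6)
        #            = 6 * 120
        #            = 720
        print fact6(1, 6), "\n";    # prints 720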

        You need not pass a second parameter to that function - or so it appears. A quick test shows it being broken. I'm not sure I'm following the algorithm either, so I'll leave it at that. (I misunderstood the parameter handling.)

        Also summing the logarithms will be fast, but calculating the logarithms in the first place is going to take much longer than straight multiplication would have.

        Makeshifts last the longest.

        Thanks to zigdon for pointing out my mistakes. The example above should be 5!=e^(ln(2)+ln(3)+ln(4)+ln(5))
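        (A minimal sketch of this corrected log-sum idea, using the built-in log() and exp(). It is an editorial illustration, not code from the thread; it shows the method and also the precision problem mentioned above, since everything is squeezed through an ordinary floating-point number.)

        sub fact_log {
            my $n   = shift;
            my $sum = 0;
            $sum += log($_) for 2 .. $n;    # sum of ln(2) .. ln(n)
            return exp($sum);               # e to that power
        }

        printf "%.15g\n", fact_log(5);    # about 120
        printf "%.15g\n", fact_log(20);   # about 2.4329e+18; the low digits are typically already off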
Re: Factorial algorithm execution time
by YuckFoo (Abbot) on Oct 17, 2002 at 20:08 UTC
    I think creation of the BigInt objects might be mucking stuff up. Simplify the test. Loop is faster, and it gets faster as the argument gets bigger.

    YuckFoo

    
    Benchmark: timing 16000 iterations of nocurse, recurse...
       nocurse:  0 wallclock secs 
          ( 0.20 usr +  0.00 sys =  0.20 CPU) @ 80000.00/s (n=16000)
       recurse:  1 wallclock secs 
          ( 0.51 usr +  0.00 sys =  0.51 CPU) @ 31372.55/s (n=16000)
    
    #!/usr/bin/perl
    use Benchmark;

    $fact = $ARGV[0] || 10;

    timethese(16000, {'recurse'=>'recurse($fact)',
                      'nocurse'=>'nocurse($fact)'});

    sub recurse {
        my ($n) = @_;
        if ($n == 0) { return 1; }
        return ($n * recurse($n-1));
    }

    sub nocurse {
        my ($n) = @_;
        my $f = 1;
        for my $i (1..$n) { $f *= $i; }
        return $f;
    }
      The reason BigInt was there was to allow for calculations of 1000! or larger (which was the original intent). However, I don't see how this alone could explain the timing differences. For example, in my original code, the recursive function
      • Created one new BigInt
      • Compared two BigInts
      • Multiplied two BigInts
      • Subtracted two BigInts

      The non-recursive function
      • Created two BigInts
      • Multiplied two BigInts

      In other words, are you saying that memory allocation for one extra BigInt is that slow??
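      (A quick micro-benchmark can put a number on that question. The following is an editorial sketch, not something posted in the thread; the operand sizes are arbitrary assumptions.)

      use Benchmark qw(timethese);
      use Math::BigInt;

      my $big   = Math::BigInt->new('123456789' x 20);   # a roughly 180-digit operand
      my $small = Math::BigInt->new(12345);

      timethese(50_000, {
          create => sub { Math::BigInt->new(12345) },      # one object creation
          bmul   => sub { $big->copy->bmul($small) },      # one copy plus one multiplication
      });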
Re: Factorial algorithm execution time
by greenback (Initiate) on Oct 25, 2002 at 17:21 UTC
    A more robust algorithm: using the Lanczos approximation, calculate the natural log of Gamma(n+1) and exponentiate the result. You can simply cache n! for n < 30 or so.
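    (For reference, a sketch of that suggestion in Perl, using the Lanczos coefficients popularized by Numerical Recipes; the constants and the helper name gammln are assumptions of this sketch, not values given in the thread. gammln(n+1) approximates ln(n!), so exponentiating it gives n! in ordinary floating point.)

    sub gammln {
        my $x   = shift;
        my @cof = (  76.18009172947146,     -86.50532032941677,
                     24.01409824083091,      -1.231739572450155,
                      0.1208650973866179e-2, -0.5395239384953e-5 );
        my $y   = $x;
        my $tmp = $x + 5.5;
        $tmp   -= ($x + 0.5) * log($tmp);
        my $ser = 1.000000000190015;
        $ser   += $_ / ++$y for @cof;                           # the Lanczos series
        return -$tmp + log(2.5066282746310005 * $ser / $x);     # 2.5066... is sqrt(2*pi)
    }

    printf "10! is about %.6g\n", exp(gammln(11));   # ~3.6288e+06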
      Define robust.

      Also, isn't exponentiating the log of n! and carrying accuracy to the last integer as much or more work than calculating n! in the first place? If you just want the first k digits, the approximation is better. Most of us don't.

      That said, the approximations are good for order of magnitude estimates. Understanding the basic approximations is easier than most think. Watch:

      log(n!) = log(1) + log(2) + ... + log(n)
              = 1/2 log(1)
                + (log(1) + log(2))/2 + (log(2) + log(3))/2 + ... + (log(n-1) + log(n))/2
                + 1/2 log(n)
      But note that (log(i) + log(i+1))/2 is an approximation of the integral of log(x) from i to i+1. If we define the error of that approximation to be e(i), then we get:
      log(n!) = 1/2 * 0 + (Integral from 1 to n of log(x)) + 1/2 log(n)
                + e(1) + e(2) + ... + e(n-1)
      Which looks like a mess. What have we done other than obfuscation and complication?

      The answer is, "Quite a bit." The goal of approximation techniques is to take something, divide it into a few big terms plus a big mess, understand those big terms, and show that the mess isn't important. This is what we have done.

      That integral is a single huge term. But it is one we understand. The integral of log(x) is x*log(x)-x. So that integral is n*log(n)-n-(1*log(1)-1) = n*log(n)-n+1. log(n)/2 is the next biggest thing around. And finally we have those e()'s. We think that they are small. But how to show that? It turns out that the error in a trapezoidal approximation is w**3 * f''(t)/12 where w is the width of the interval and t is a point in it. (This is reasonable - the error in a linear approximation is tied to what the function's bend does to the integral.) The second derivative of log(x) is -1/(x*x), so e(i) is O(1/(i*i)). If you remember your infinite series, that means that the infinite sum e(1)+e(2)+e(3)+... converges to some number, call it E. And the partial sum from n out to infinity is O(1/n) from that number.
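      (A small numeric check of that claim; this is an editorial addition, not part of the original post. It computes e(i), the trapezoid error for the integral of log(x) over [i, i+1], and shows that i*i*e(i) settles near a constant, about 1/12.)

      for my $i (1, 10, 100, 1000) {
          my $trap  = (log($i) + log($i + 1)) / 2;                               # interval width is 1
          my $exact = (($i + 1) * log($i + 1) - ($i + 1)) - ($i * log($i) - $i); # x*log(x) - x
          my $e     = $exact - $trap;
          printf "i = %4d   e(i) = %.3e   i*i*e(i) = %.4f\n", $i, $e, $i * $i * $e;
      }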

      Remember the principle of approximation? Get a few big understood terms, at the cost of adding lots of messy small ones? We are about to do that where "lots" is an infinite number! Taking the above we get:

      log(n!) = 1/2 * 0 + (Integral from 1 to n of log(x)) + 1/2 log(n)
                + e(1) + e(2) + ... + e(n-1)
              = n*log(n) - n + 1 + 1/2 log(n)
                + (e(1) + e(2) + e(3) + ...) - (e(n) + e(n+1) + e(n+2) + ...)
              = n*log(n) - n + log(n)/2 + E + 1 - e(n) - e(n+1) - e(n+2) - ...
      Defining exp(E+1) to be K, we get:
      n! = K * sqrt(n) * n**n * (1 + O(1/n)) / e**n
      And from here the interested reader (if there are any) can go in two directions. The first is to study the error terms. The second is to figure out K.

      The first is straightforward but messy. Just look at the infinite sum as an approximation to an integral. Write it as that integral and associated error terms. Get another term of the approximation, with smaller and more complex error terms. Wash, rinse, and repeat to your heart's desire.

      The second requires cleverness. I suggest sticking this approximation into the binomial expansion and comparing with the central limit theorem's approximation to show that K has to be 1/sqrt(2*pi). If you are clever enough about it you can actually derive the central limit theorem for the binomial expansion up to the constant, then show that the only value of K which makes the terms for (1+1)**n come out to 2**n for large n is 1/sqrt(2*pi).

      These are left as exercises because I am tired of typing. :-)
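      (As a quick sanity check of the result, with K = 1/sqrt(2*pi) as claimed above: n! ~ sqrt(2*pi*n) * (n/e)**n. The check below is an editorial addition, not part of the original post.)

      my $pi = 4 * atan2(1, 1);
      for my $n (5, 10, 20) {
          my $exact = 1;
          $exact *= $_ for 2 .. $n;
          my $stirling = sqrt(2 * $pi * $n) * ($n / exp(1)) ** $n;
          printf "%2d!  exact %.6g   Stirling %.6g   relative error %.2g\n",
                 $n, $exact, $stirling, ($exact - $stirling) / $exact;
      }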

        Why do you think that truncating Stirling's series is preferable to approximating the more accurate Lanczos asymptotic series (which is valid for Re(n) > 0)? It is more 'robust' to calculate the logarithm of the gamma function because you are less prone to overflow for large values of n.