The JavaFan code is the fastest. My code is 2nd. There is no 3rd, 4th or 5th place. Things go "downhill" very fast (not just by a factor of 2x or 3x, but by orders of magnitude).
Actually, so does yours. It's just not visible because of the very limited range. Upping the range to 1..90 gives:
Benchmark: running JavaFan, Marshall for at least 2 CPU seconds...
JavaFan: 2 wallclock secs ( 2.18 usr + 0.00 sys = 2.18 CPU) @ 4481.19/s (n=9769)
Marshall: 1 wallclock secs ( 2.00 usr + 0.00 sys = 2.00 CPU) @ 342.50/s (n=685)
Also note that your move to put the assignment of $PHI inside the subroutine reduced the running speed by about 10%: it does 5063.11 iterations/s with $PHI assigned only once, against the 4481.19/s shown above.
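To make the $PHI point concrete, here is a minimal sketch of how the two variants could be compared. The actual subs being benchmarked are not shown in this post, so I'm assuming a closed-form (Binet style) calculation; the names fib_hoisted and fib_inline are made up for the illustration, and the numbers you get will differ from the ones above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Constants computed once, outside any subroutine.
my $SQRT5 = sqrt(5);
my $PHI   = (1 + $SQRT5) / 2;

# Hypothetical variant 1: uses the constants assigned once above.
sub fib_hoisted {
    my $n = shift;
    return int($PHI ** $n / $SQRT5 + 0.5);
}

# Hypothetical variant 2: redoes the assignments on every call.
sub fib_inline {
    my $n     = shift;
    my $sqrt5 = sqrt(5);
    my $phi   = (1 + $sqrt5) / 2;
    return int($phi ** $n / $sqrt5 + 0.5);
}

# Run each variant for at least 2 CPU seconds over the range 1..90.
cmpthese(-2, {
    hoisted => sub { fib_hoisted($_) for 1 .. 90 },
    inline  => sub { fib_inline($_)  for 1 .. 90 },
});
```

The only difference between the two subs is where the assignments happen, so whatever gap cmpthese reports is the cost of redoing them on every call.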
Extending the range even further would increase the difference even more, but for some N between 90 and 100, F(N) no longer fits inside a 64-bit integer, and I didn't want to spend the time to write a benchmark using Math::BigFloat/Math::BigInt.
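For the curious, the exact cut-off is easy to find with Math::BigInt. This is only a sketch for locating the overflow point, not a benchmark; first_fib_above is a name I made up, and it assumes the usual numbering F(1) = F(2) = 1.

```perl
use strict;
use warnings;
use Math::BigInt;

# Return the smallest N for which F(N) exceeds $limit (F(1) = F(2) = 1).
sub first_fib_above {
    my ($limit) = @_;
    my ($prev, $cur) = (Math::BigInt->new(0), Math::BigInt->new(1));   # F(0), F(1)
    my $n = 1;
    while ($cur <= $limit) {
        ($prev, $cur) = ($cur, $prev + $cur);   # overloaded +, yields a new object
        $n++;
    }
    return $n;
}

my $max_signed   = Math::BigInt->new(2)->bpow(63)->bdec;   # 2**63 - 1
my $max_unsigned = Math::BigInt->new(2)->bpow(64)->bdec;   # 2**64 - 1

print "signed:   F(", first_fib_above($max_signed),   ")\n";   # signed:   F(93)
print "unsigned: F(", first_fib_above($max_unsigned), ")\n";   # unsigned: F(94)
```

So F(92) is the last one that fits a signed 64-bit integer and F(93) the last one that fits an unsigned one.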