> then it's an exaggeration to say it's 'several orders of magnitude faster'; based on those numbers, it's one order of magnitude faster.
I used to confuse orders of magnitude with times for a long time, until I realized the
latter former referred to the exponent (usually applied to 10). Nonetheless, the type of hubris in this post begs for master of these kinds of topics. Perhaps OP is willing to contract someone to do this, which I think would be the appropriate tact. That said, the initial step of using Inline::C you suggested I think is the right one. For additional speed up, if the algorithm is parallelizable, is to introduce OpenMP, which is a lot more straightforward with Alien::OpenMP (and OpenMP::Environment for good measure).
Updated thanks to note by jdporter.