Even with the fastest possible machine and language, performance-aware algorithm design usually makes more difference. To take an extreme case, I once rewrote a C-program (which is faster than C++) in Perl and improved the performance by a factor of 60. To re-implement my algorithm in C to try to go even faster (theoretically possible of course) would however have meant an enormous investment in rewriting a C-PAN module and a lot of Perls guts in pure C to support the better algorithm that Perl had inspired.
Update: this took place in ING Baring's market data department many years ago - the case was the calculation of correlation co-efficients, from N market prices, generating MxN(N-1)/2 values for M times at which prices where snapshot. The C-program did 2*M*N*N iterations through home-grown correlation coding. The Perl program used a CPAN module iterating it only N(N-1)/2 times with a few optimisations thrown in of the kind we've discussed already.