There is overhead, and my understanding is that much of it comes from the work Perl does when entering and leaving blocks. Subroutine bodies count as blocks. A chunk of that time is spent figuring out the correct lexical scope to use.
You can get a decent idea with a benchmark like the following:
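use strict;
use warnings;
use Benchmark qw(cmpthese);

# An empty 2000-iteration loop wrapped in a named sub
sub asub {
    for (1 .. 2000) {}
}

# Run each case 5000 times and print a comparison table
cmpthese(5000, {
    'sub'   => sub { asub() },              # loop reached through one extra sub call
    'nosub' => sub { for (1 .. 2000) {} },  # same loop, written inline
});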
With 5.6.1 on my Linux box, there's a 4% penalty for subroutines. With the latest development version, it's so close to 0% as to be statistically insignificant.
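Note that the benchmark above makes only one extra call per 2000 loop iterations. If you want to see the per-call cost more directly, a variant along these lines might help. It's only a rough sketch: add_one is a made-up helper with a deliberately trivial body, and the 10_000 calls per run and the one-CPU-second budget (the -1 argument) are arbitrary choices of mine, not anything measured for the numbers quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: a trivial body, so the call itself dominates the cost.
sub add_one { return $_[0] + 1 }

# A negative count tells Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    # 10_000 subroutine calls per run
    'call'   => sub { my $x = 0; $x = add_one($x) for 1 .. 10_000; },
    # the same arithmetic written inline, no calls
    'inline' => sub { my $x = 0; $x = $x + 1 for 1 .. 10_000; },
});
```

Expect the gap to look much larger here than in the first benchmark, since the body does almost nothing. Real subs do real work, so the call overhead is a smaller share of the total.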
I doubt you'll find many cases where the cost of subroutine calls outweighs the performance hit of loading a Perl interpreter and compiling your program on every request in a CGI environment. Besides that, if you inline enough subs by hand, the code grows and you'll probably ruin locality of reference.
Perl's more about developer ease than raw efficiency or a small memory footprint anyway. Sometimes that matters.