Creating and deleting all those threads will flay the heap. It's possible that you're experiencing heap fragmentation, in which the heap has lots of free blocks but none quite large enough to create a new thread. Moving to a boss/worker model, in which you create a pool of threads at startup and then distribute work to them in round-robin fashion, should help.
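To make the boss/worker idea concrete, here's a minimal sketch (in Python rather than Perl, since the pattern is language-agnostic): a fixed pool of workers is created once at startup, and the boss just pushes work onto a queue. The names and the doubling "work" are illustrative, not from your code.

```python
import queue
import threading

task_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    # Each worker lives for the life of the program -- no per-request
    # thread creation, so no heap churn from thread stacks.
    while True:
        item = task_queue.get()
        if item is None:              # sentinel: shut down cleanly
            task_queue.task_done()
            break
        with results_lock:
            results.append(item * 2)  # stand-in for the real work
        task_queue.task_done()

pool = [threading.Thread(target=worker) for _ in range(4)]
for t in pool:
    t.start()

for n in range(10):                   # the boss hands out work
    task_queue.put(n)
for _ in pool:                        # one sentinel per worker
    task_queue.put(None)

task_queue.join()
for t in pool:
    t.join()
```

The queue does the round-robin distribution for you: whichever worker is idle picks up the next item.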
Whether threading helps performance will partly depend on your storage hardware. I happen to work for a maker of high-end file servers, each handling hundreds of terabytes of storage. If you send ten concurrent requests to one of those beauties, all those NFS or CIFS latencies will be handled in parallel, and the chances are that you'll be seeking on ten separate disks at once. That means you'll get nearly ten times the performance (and the client will be the limiting factor). OTOH, if the files are stored locally and there's only one direct-attached disk that's struggling to cope, multi-threading will help a little (especially if the OS is clever enough to do elevator seeking on the disk), but don't expect wonders.
Finally, consider moving to fork, rather than Perl threads. A while ago, I wrote a Linux-based Telnet proxy -- Telnet in, Telnet out. (It does logging, connection-sharing and a few other things, but proxying is the essence of it.) At startup, or when you add new ports at runtime, it forks two processes per server port; there are typically between fifteen and forty server ports per proxy. The processes communicate with each other using socket pairs. The proxy runs for months at a time without any apparent memory leaks or performance problems. I can recommend that approach, if it fits the problem you're trying to solve.
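The fork-plus-socketpair shape looks roughly like this (again sketched in Python; the uppercasing "proxy work" and the message contents are made up for the example, and this assumes a Unix-like OS where fork() exists):

```python
import os
import socket

# Two related processes talking over a socket pair, as in the proxy
# described above: each side closes the end it doesn't use.
parent_sock, child_sock = socket.socketpair()
pid = os.fork()

if pid == 0:
    # Child: receive one message, transform it, send it back.
    parent_sock.close()
    data = child_sock.recv(1024)
    child_sock.sendall(data.upper())   # stand-in for real proxy work
    child_sock.close()
    os._exit(0)
else:
    # Parent: send a request down the pair and read the reply.
    child_sock.close()
    parent_sock.sendall(b"telnet bytes")
    reply = parent_sock.recv(1024)
    parent_sock.close()
    os.waitpid(pid, 0)                 # reap the child
```

Because the processes share nothing except the socket pair, a leak or crash in one side can't corrupt the other -- which is a large part of why this kind of design stays up for months.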