With Perl threads, the slowest step is usually creating the thread. That is because creating a Perl thread clones every data structure that exists in the parent interpreter at that moment. So I would first determine whether that is where the time is being spent. You can measure it by taking a timestamp just before you ask for the thread to be created and comparing it against a timestamp taken at the very start of the routine the thread runs.
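A rough way to take that measurement is sketched below. It assumes a perl built with thread support and uses Time::HiRes for sub-second timestamps; the `%big` hash is only a hypothetical stand-in for whatever data your program holds when it spawns the thread.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

# Hypothetical stand-in for the data your program has built up so far.
my %big = map { $_ => "x" x 100 } 1 .. 50_000;

my $before = time();                 # just before asking for the thread
my $thr = threads->create(sub {
    return time();                   # first thing the thread's routine does
});
my $started = $thr->join();          # timestamp taken inside the thread
my $after   = time();                # after join() has returned

my $create_secs = $started - $before;
my $rest_secs   = $after - $started;

printf "thread creation (cloning) took about %.4f s\n", $create_secs;
printf "routine, join and cleanup took about %.4f s\n", $rest_secs;
```

If the first number dominates, cloning is the cost. Try the same script with and without the large hash to see how much the amount of live data matters.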
If that is indeed where most of the time is spent, then you should do what just about anybody who has gotten good at using Perl threads does: create your threads very early, before much data has been loaded, and just ship work to them, usually via a thread queue such as Thread::Queue.
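A minimal sketch of that pattern, assuming a threaded perl: the pool size, the queue names, and the squaring "work" are all placeholders of mine, not anything from your code. The workers are started first, and an undef per worker tells each one to exit.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 4;                          # hypothetical pool size
my $work    = Thread::Queue->new();       # jobs going to the workers
my $results = Thread::Queue->new();       # answers coming back

# Start the workers first, while the interpreter is still small to clone.
my @pool;
for (1 .. $WORKERS) {
    push @pool, threads->create(sub {
        # dequeue() blocks until an item arrives; undef means "stop".
        while (defined(my $n = $work->dequeue())) {
            $results->enqueue($n * $n);   # placeholder for the real work
        }
    });
}

# Only now load big data, build caches, and so on. Then ship the work.
$work->enqueue(1 .. 10);
$work->enqueue((undef) x $WORKERS);       # one stop marker per worker
$_->join() for @pool;

# All workers have finished, so all ten results are already queued.
my @squares = sort { $a <=> $b } map { $results->dequeue_nb() } 1 .. 10;
print "@squares\n";
```

The clone cost is paid once per worker at startup rather than once per job, and whatever you load afterwards is never copied into the workers.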
Looking at your code, though, I don't see any large data structure that would still be alive at the point where you create the thread. But the data could be something cached (perhaps by accident) in some module you have pulled in, for example.
Update: Or the time could be going into destroying the second interpreter instance (the thread's cloned copy) when the thread finishes.