I appreciate your code; your partitioning approach was very helpful to see. I will see if I can incorporate it into my program. (I've since moved the logic around so that the search for pairs is embedded in the algorithm that creates the list. That one step greatly improved performance, but it makes using threads more difficult.)
Using your program, the gains on my machine tail off at around six or seven threads. To get the results below, I added some loops and ran each thread count 20 times, then averaged the times. Here are the results:
Threads Used: 1 Time: 2.193750
Threads Used: 2 Time: 1.646408
Threads Used: 3 Time: 1.418214
Threads Used: 4 Time: 1.311989
Threads Used: 5 Time: 1.228125
Threads Used: 6 Time: 1.221352
Threads Used: 7 Time: 1.218237
Threads Used: 8 Time: 1.211988
Threads Used: 9 Time: 1.233855
Threads Used: 10 Time: 1.212500
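To make the plateau easier to see, here is a small sketch that turns the averages above into speedup and efficiency figures. It is written in Python purely for illustration (not the language of the program being discussed), and the names `times`, `speedup`, and `efficiency` are my own. The best case works out to roughly 1.8x over a single thread, and almost all of that is already reached by five threads.

```python
# Average times copied from the results above, keyed by thread count.
times = {
    1: 2.193750, 2: 1.646408, 3: 1.418214, 4: 1.311989, 5: 1.228125,
    6: 1.221352, 7: 1.218237, 8: 1.211988, 9: 1.233855, 10: 1.212500,
}

def speedup(threads):
    """Speedup relative to the single-thread run."""
    return times[1] / times[threads]

def efficiency(threads):
    """Speedup divided by thread count (1.0 would be perfect scaling)."""
    return speedup(threads) / threads

for n in sorted(times):
    print(f"Threads: {n:2d}  Speedup: {speedup(n):.2f}  Efficiency: {efficiency(n):.2f}")
```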
What I would really like to try is a parallel process or thread that receives new numbers to pair up while the main process keeps building the numbers. That worker would need an event mechanism to accept each new number and, at the end, return its list. I think POE provides just such a framework. This is much more complicated than partitioning the algorithm, so I'll try to leverage what you've shown me first.
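For what it's worth, the shape I have in mind looks roughly like the sketch below. It is only an outline of the idea, in Python rather than Perl, and it uses a plain thread with a queue instead of POE. Everything in it is hypothetical: `PairWorker`, `add`, `finish`, and especially `is_pair`, which is a stand-in rule (numbers that differ by 2) and not the real pairing test.

```python
import queue
import threading

_DONE = object()  # sentinel telling the worker that no more numbers are coming

def is_pair(a, b):
    # Hypothetical pairing rule, for illustration only: numbers that differ by 2.
    return abs(a - b) == 2

class PairWorker:
    """Background worker that pairs up numbers as the main code produces them."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._seen = []
        self._pairs = []
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def add(self, number):
        # The "event": hand a newly built number to the worker and carry on.
        self._inbox.put(number)

    def finish(self):
        # Signal the end of input, wait for the worker, and return its list.
        self._inbox.put(_DONE)
        self._thread.join()
        return self._pairs

    def _run(self):
        while True:
            number = self._inbox.get()
            if number is _DONE:
                break
            # Compare the new number against everything received so far.
            for earlier in self._seen:
                if is_pair(earlier, number):
                    self._pairs.append((earlier, number))
            self._seen.append(number)

worker = PairWorker()
for n in [3, 5, 7, 11, 13]:  # stand-in for the main loop that builds the numbers
    worker.add(n)
pairs = worker.finish()
print(pairs)
```

The main loop never waits on the pairing work; it only blocks once, in `finish`. Note that in CPython, threads do not run CPU-bound code in parallel because of the global interpreter lock, so this shows the structure rather than a guaranteed speedup.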