Re: How to use my two processors

There are some gains to be had from using threads, but not as much as you might hope for. The following are the results of a single threaded test, and a multi-threaded version operating on the same list using 1, 2, 3 & 4 threads:

C:\test>817827
Took 1.622
Odd numbers in list: 4000
Pairs found: 99135

C:\test>817827-t -T=1
Took 1.790000
Odd numbers in list: 4000
Pairs found: 99135

C:\test>817827-t -T=2
Took 1.365000
Odd numbers in list: 4000
Pairs found: 99135

C:\test>817827-t -T=3
Took 1.047500
Odd numbers in list: 4000
Pairs found: 99135

C:\test>817827-t -T=4
Took 0.832500
Odd numbers in list: 4000
Pairs found: 99135
[download]

As you can see, the overhead of threading costs you relative to the non-threaded version for the 1-thread case. You make a small gain using two. And further small gains using 3 or 4. At the cost of complexity. The tests were run on a 4-core system.

Note: These are only test apps; the results produced--a single level hash with composite keys--may not suit your purposes.

Non-threaded:

#! perl -slw
use strict;
use Time::HiRes qw[ time ];
use Data::Dump qw[ pp ];

our $N ||= 4000;
our $S ||= 1;

srand( $S );
my @list = map 1 | int( rand( 2**32 )), 1 .. $N;

my $start = time;
my %hash;
for my $i ( 0 .. $#list ) {
    use integer;
    my $v = $list[ $i ];
    ( $v & $_ ) == 1 and undef $hash{ "$i:$_" }
        for @list[ $i+1 .. $#list ];
}
printf "Took %.3f\n", time() - $start;

print 'Odd numbers in list: ', scalar @list;
print 'Pairs found: ', scalar keys %hash;

#<>; pp \%hash;
[download]

Threaded

#! perl -slw
use strict;
use threads;
use Time::HiRes qw[ time ];
use Data::Dump qw[ pp ];

$|++;
our $N ||= 4000;
our $T ||= 4;
our $S ||= 1;

srand( $S );
my @list = map 1|int( rand( 2**32 ) ), 1 .. $N;

my $cStart = 0;
my $cSize  = int( @list / $T );

my $start = time;
my @threads = map {
    my( $t ) = threads->create( sub{
        my( $lo, $hi ) = @_;
        use integer;
        my $tid = threads->tid;
        my @keys;
        for my $i ( 0 .. $#list ) {
            my $v = $list[ $i ];
            my $start = $i > $lo ? $i+1 : $lo;
            ( $v & $_ ) == 1 and push @keys, "$i:$_" for @list[ $start
+ .. $hi ];
        }
        return @keys;

    }, $cStart, $_<($T) ? $cStart + $cSize + 1 : $#list );
    $cStart += $cSize;
    $t;
} 1 .. $T;


my %hash;
for( @threads ) {
    my @keys = $_->join;
    undef $hash{ $_ } for @keys;
}
printf "Took %.6f\n", time() - $start;

print 'Odd numbers in list: ', scalar @list;
print 'Pairs found: ', scalar keys %hash;

<>; pp \%hash;
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"I'd rather go naked than blow up my ass"

Comment on Re: How to use my two processors Select or Download Code

Replies are listed 'Best First'.
Re^2: How to use my two processors by Anonymous Monk on Jan 18, 2010 at 22:11 UTC
I appreciate your code; your partitioning approach was very helpful to see. I will see if I can incorporate it into my program (I've moved the logic around such that I embedded the search for pairs into the agorithm that creates the list... this one step made big improvements in the performance but it makes using threads more difficult). Using your program, my machine starts tailing out at six or seven threads. To get the results below, I added some loops, and ran each number of threads 20 times to get an average. Here are the results: `Threads Used: 1 Time: 2.193750 Threads Used: 2 Time: 1.646408 Threads Used: 3 Time: 1.418214 Threads Used: 4 Time: 1.311989 Threads Used: 5 Time: 1.228125 Threads Used: 6 Time: 1.221352 Threads Used: 7 Time: 1.218237 Threads Used: 8 Time: 1.211988 Threads Used: 9 Time: 1.233855 Threads Used: 10 Time: 1.212500` [download] What I would really like to try is to create a parallel process/thread that receives new numbers to "pair-up" while the main process keeps working to build the numbers. The process would need an event mechanism to add a new number and at the end, return its list. I think POE provides just such a framework. This is much more complicated than partitioning the algorithm, so I'll try to leverage what you've shown me first.	[reply] [d/l]
Re^3: How to use my two processors by BrowserUk (Patriarch) on Jan 18, 2010 at 23:46 UTC
If you ran that on a 2-core system, I am pleasantly surprised that you continued to make appreciable gains with more than 2 threads. Very surprised that they were still relatively worthwhile for 3 and 4--are you sure you don't have 2 cpus each with 2 hyper-threads? I've little experience of POE. Historically, I've been wary of the overhead of IPC via *nix pipes, but it was demonstrated to me a year or two ago that things have moved on markedly since my days with HPUX-10, so it may work for you. As for pipelined threads: the problem is that unless the producer thread produces a steady and relatively high stream of output, the consumer thread tends to get a timeslice, wake up, process one item and then have to relinquish the rest of its timeslice because there's no more work waiting for it. That tends to lead to a low ratio of work done to context switch. That's why I favour the partitioning approach. Pipelined processes tend to alleviate that problem by buffering the pipe, thereby deferring the need to switch contexts until there is enough work for the consumer to make the context switch worthwhile. You can emulate that using threads by batching up the producer's output, but it tend to lead to code that needs to be re-tuned manually for each different hardware setup--you initially tune it for 2 cpus and then have to re-tune it if you move it to a 4-core box etc. Even a different workload can screw up the tuning. You set it up for a development box, but when you move it to an identical production box, you have to re-tune it because the workload mix is different. I've encountered similar problems with both event-driven and co-routine setups. If you buffer too little, you end up wasting cycles context switching to the consumer too frequently. If you buffer too much, the producer spins cycles waiting for the consumer to catch up. Good luck. Let us know how you get on. (I'm intrigued by the purpose of this?) Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "I'd rather go naked than blow up my ass"	[reply]
Re^4: How to use my two processors by remzak (Acolyte) on Jan 19, 2010 at 15:55 UTC
I completed the project (as complete as it will ever be). I ended up using one child thread with a queue to create two parallel processes. It was exactly what I wanted, and it was extremely simple to implement. The main process created the valid numbers; the child process paired them up. The two ran very much in parallel since the pairing algorithm actually ran more efficiently working on a growing list of numbers. If I had more processors, I'm not sure how I'd partition the logic. I think I'd have to re-framed the overall algorithm to take advantage of many processors; then, it may have been more efficient to multi-thread and partition like you suggested. The time dropped to 44% of the original (2.25 times faster). Some of the speed came not from the threading, but just learning how to write more efficient perl statements. I learned that there are expensive operations in perl and some very fast operations... I ended up tweaking many, and changing how I was storing the data to take advantage of these realities. I am sure the program could be made even faster once I understand perl a bit better. Thank you to everyone for your help!	[reply]