Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: How to use my two processors

by BrowserUk (Patriarch)
on Jan 18, 2010 at 13:00 UTC ( [id://817970]=note: print w/replies, xml ) Need Help??


in reply to How to use my two processors
in thread Need a faster way to find matches

There are some gains to be had from using threads, but not as much as you might hope for. The following are the results of a single threaded test, and a multi-threaded version operating on the same list using 1, 2, 3 & 4 threads:

C:\test>817827 Took 1.622 Odd numbers in list: 4000 Pairs found: 99135 C:\test>817827-t -T=1 Took 1.790000 Odd numbers in list: 4000 Pairs found: 99135 C:\test>817827-t -T=2 Took 1.365000 Odd numbers in list: 4000 Pairs found: 99135 C:\test>817827-t -T=3 Took 1.047500 Odd numbers in list: 4000 Pairs found: 99135 C:\test>817827-t -T=4 Took 0.832500 Odd numbers in list: 4000 Pairs found: 99135

As you can see, the overhead of threading costs you relative to the non-threaded version for the 1-thread case. You make a small gain using two. And further small gains using 3 or 4. At the cost of complexity. The tests were run on a 4-core system.

Note: These are only test apps; the results produced--a single level hash with composite keys--may not suit your purposes.

Non-threaded:

#! perl -slw use strict; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; our $N ||= 4000; our $S ||= 1; srand( $S ); my @list = map 1 | int( rand( 2**32 )), 1 .. $N; my $start = time; my %hash; for my $i ( 0 .. $#list ) { use integer; my $v = $list[ $i ]; ( $v & $_ ) == 1 and undef $hash{ "$i:$_" } for @list[ $i+1 .. $#list ]; } printf "Took %.3f\n", time() - $start; print 'Odd numbers in list: ', scalar @list; print 'Pairs found: ', scalar keys %hash; #<>; pp \%hash;

Threaded

#! perl -slw use strict; use threads; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; $|++; our $N ||= 4000; our $T ||= 4; our $S ||= 1; srand( $S ); my @list = map 1|int( rand( 2**32 ) ), 1 .. $N; my $cStart = 0; my $cSize = int( @list / $T ); my $start = time; my @threads = map { my( $t ) = threads->create( sub{ my( $lo, $hi ) = @_; use integer; my $tid = threads->tid; my @keys; for my $i ( 0 .. $#list ) { my $v = $list[ $i ]; my $start = $i > $lo ? $i+1 : $lo; ( $v & $_ ) == 1 and push @keys, "$i:$_" for @list[ $start + .. $hi ]; } return @keys; }, $cStart, $_<($T) ? $cStart + $cSize + 1 : $#list ); $cStart += $cSize; $t; } 1 .. $T; my %hash; for( @threads ) { my @keys = $_->join; undef $hash{ $_ } for @keys; } printf "Took %.6f\n", time() - $start; print 'Odd numbers in list: ', scalar @list; print 'Pairs found: ', scalar keys %hash; <>; pp \%hash;

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: How to use my two processors
by Anonymous Monk on Jan 18, 2010 at 22:11 UTC
    I appreciate your code; your partitioning approach was very helpful to see. I will see if I can incorporate it into my program (I've moved the logic around such that I embedded the search for pairs into the agorithm that creates the list... this one step made big improvements in the performance but it makes using threads more difficult).

    Using your program, my machine starts tailing out at six or seven threads. To get the results below, I added some loops, and ran each number of threads 20 times to get an average. Here are the results:

    Threads Used: 1 Time: 2.193750 Threads Used: 2 Time: 1.646408 Threads Used: 3 Time: 1.418214 Threads Used: 4 Time: 1.311989 Threads Used: 5 Time: 1.228125 Threads Used: 6 Time: 1.221352 Threads Used: 7 Time: 1.218237 Threads Used: 8 Time: 1.211988 Threads Used: 9 Time: 1.233855 Threads Used: 10 Time: 1.212500

    What I would really like to try is to create a parallel process/thread that receives new numbers to "pair-up" while the main process keeps working to build the numbers. The process would need an event mechanism to add a new number and at the end, return its list. I think POE provides just such a framework. This is much more complicated than partitioning the algorithm, so I'll try to leverage what you've shown me first.

      If you ran that on a 2-core system, I am pleasantly surprised that you continued to make appreciable gains with more than 2 threads. Very surprised that they were still relatively worthwhile for 3 and 4--are you sure you don't have 2 cpus each with 2 hyper-threads?

      I've little experience of POE. Historically, I've been wary of the overhead of IPC via *nix pipes, but it was demonstrated to me a year or two ago that things have moved on markedly since my days with HPUX-10, so it may work for you.

      As for pipelined threads: the problem is that unless the producer thread produces a steady and relatively high stream of output, the consumer thread tends to get a timeslice, wake up, process one item and then have to relinquish the rest of its timeslice because there's no more work waiting for it. That tends to lead to a low ratio of work done to context switch. That's why I favour the partitioning approach.

      Pipelined processes tend to alleviate that problem by buffering the pipe, thereby deferring the need to switch contexts until there is enough work for the consumer to make the context switch worthwhile. You can emulate that using threads by batching up the producer's output, but it tend to lead to code that needs to be re-tuned manually for each different hardware setup--you initially tune it for 2 cpus and then have to re-tune it if you move it to a 4-core box etc. Even a different workload can screw up the tuning. You set it up for a development box, but when you move it to an identical production box, you have to re-tune it because the workload mix is different.

      I've encountered similar problems with both event-driven and co-routine setups. If you buffer too little, you end up wasting cycles context switching to the consumer too frequently. If you buffer too much, the producer spins cycles waiting for the consumer to catch up.

      Good luck. Let us know how you get on. (I'm intrigued by the purpose of this?)


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I completed the project (as complete as it will ever be). I ended up using one child thread with a queue to create two parallel processes. It was exactly what I wanted, and it was extremely simple to implement. The main process created the valid numbers; the child process paired them up. The two ran very much in parallel since the pairing algorithm actually ran more efficiently working on a growing list of numbers. If I had more processors, I'm not sure how I'd partition the logic.

        I think I'd have to re-framed the overall algorithm to take advantage of many processors; then, it may have been more efficient to multi-thread and partition like you suggested.

        The time dropped to 44% of the original (2.25 times faster). Some of the speed came not from the threading, but just learning how to write more efficient perl statements. I learned that there are expensive operations in perl and some very fast operations... I ended up tweaking many, and changing how I was storing the data to take advantage of these realities.

        I am sure the program could be made even faster once I understand perl a bit better.

        Thank you to everyone for your help!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://817970]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (7)
As of 2024-04-19 09:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found