PerlMonks  

Re^5: threading a perl script

by BrowserUk (Patriarch)
on Apr 24, 2011 at 10:20 UTC ( [id://901037]=note )


in reply to Re^4: threading a perl script
in thread threading a perl script

PS do 128-core processors exist?

Oh yes. And even more, if you have the cash. But next year they'll be cheaper. And the year after that cheaper still.

  • See IBM Power 750: 4 SMP processors, each with 8 cores; each core with 4 hardware threads, giving 128 concurrent threads of execution.

    Or its big brother, the Power 795: 32 SMP processors, each with 8 cores; each core with 4 hardware threads, for 1024 concurrent threads.

  • Or HP Superdome II with 256 cores.
  • Or Sparc M9000 with 256 cores.

Actually, I do not agree with your point that I will get much benefit from threaded perl on multicore for complicated tasks. Increasing CPU power is not the way to go.

Sorry, but you are wrong. Due to the physical limits of silicon, the chip fabs cannot push clock speeds any higher than they currently are without risking thermal runaway, so increasing the number of cores is the only way to go. And as you can see from the above, the hardware guys are already going that way.

Most probably I will use number-crunching libraries from perl (lapack or maybe a specialized library).
Perl has overwhelming strength in text processing, GUI, etc, it's compact and nice, plus we have CPAN - that's brilliant. But using it for calculation-intensive tasks is just wrong.

Math libraries don't help where no math is involved. DNA work is all text processing. By your own words, one of perl's strengths.

The speed of comparing two strings is entirely limited by the speed of the processor. And clock speeds are not increasing. The only way to speed up text processing is to compare more than two strings at once.

addition: we have a fresh fork-related bug for windows perl today

Who cares? Don't use fork on Windows.

Perl has overwhelming strength in text processing, GUI, etc, it's compact and nice, plus we have CPAN - that's brilliant. But using it for calculation-intensive tasks is just wrong. And then - when program speed is not enough - increasing the number of CPUs is even worse :)

The great drawback of perl threads - they are heavy - and also unstable

The (so-called) heaviness is irrelevant: 47MB (see below) is a mere drop in the ocean of my 4GB of RAM. And for £40 I could double that. Heck, my browser is currently using 973MB as I type this.

And you're wrong about the stability too. They've been very stable on Windows for several years now. There is (or was until recently), still a memory leak with spawning threads on *nix, but if you do the sensible thing and only spawn 1 or 2 threads per core, that is entirely manageable. Mind you, if more people actually used threads on *nix, instead of burying their heads in the sand as a defence against learning the inevitable "new thing", that would probably have been fixed long ago.

This 60-line, standalone, single-threaded DNA fuzzy matching script runs in just under 10MB, uses 25% of my (£400) 4-core box, and takes 6 minutes 54.5 seconds to fuzzy match 25,000 25-base motifs against each 100k-base sequence. For that 1GB/10,000-sequence file I mentioned, that means a total elapsed time of 48 days, or just under 7 weeks.
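
The script itself is not reproduced in this copy of the thread. As a rough illustration only, here is a minimal sketch of single-threaded fuzzy matching, assuming "fuzzy" means "at most N mismatched bases at a given offset" (Hamming distance). The sub name fuzzy_positions, the sample data and the mismatch threshold are all hypothetical; this is not the original 60-line script.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $seq at which $motif matches
# with at most $max_miss mismatched characters.
sub fuzzy_positions {
    my( $seq, $motif, $max_miss ) = @_;
    my $len = length $motif;
    my @hits;
    for my $pos ( 0 .. length( $seq ) - $len ) {
        # String XOR yields "\0" wherever the two strings agree ...
        my $diff = substr( $seq, $pos, $len ) ^ $motif;
        # ... so counting the bytes that are NOT "\0" counts the mismatches.
        my $miss = ( $diff =~ tr/\0//c );
        push @hits, $pos if $miss <= $max_miss;
    }
    return @hits;
}

# Made-up sample data, purely for demonstration.
my $seq    = 'ACGTACGTTTGACCA';
my @motifs = qw( ACGT TTGA GGGG );

for my $motif ( @motifs ) {
    my @hits = fuzzy_positions( $seq, $motif, 1 );
    print "$motif: @hits\n";
}
```

Every motif is slid along the whole sequence, so the cost grows with (number of motifs) x (sequence length). That is why a run of this kind is CPU-bound and keeps exactly one core busy.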

This 75-line, standalone, multi-threaded DNA fuzzy matching script runs in 47MB, uses 100% of my (£400) 4-core box, and also takes 7 minutes of CPU to fuzzy match 25,000 25-base motifs against each 100k-base sequence. But it processes 4 sequences at a time, so the total elapsed time for that 1GB/10,000-sequence file falls to just over 12 days.
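
Again, the original script is not shown here. The sketch below is one possible shape for "process several sequences at a time": a fixed pool of worker threads pulling sequences from a Thread::Queue. It assumes a perl built with ithreads; the sub names count_fuzzy and match_all, the sample data and the worker count are hypothetical, and the original may well be organised differently.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: count the (motif, offset) pairs in $seq that match
# with at most $max_miss mismatches, using string XOR to find mismatches.
sub count_fuzzy {
    my( $seq, $motifs, $max_miss ) = @_;
    my $count = 0;
    for my $motif ( @$motifs ) {
        my $len = length $motif;
        for my $pos ( 0 .. length( $seq ) - $len ) {
            my $diff = substr( $seq, $pos, $len ) ^ $motif;
            ++$count if ( $diff =~ tr/\0//c ) <= $max_miss;
        }
    }
    return $count;
}

# Hypothetical driver: one job per sequence, $nthreads workers.
sub match_all {
    my( $sequences, $motifs, $max_miss, $nthreads ) = @_;
    my $work    = Thread::Queue->new;
    my $results = Thread::Queue->new;

    my @workers;
    for ( 1 .. $nthreads ) {
        push @workers, threads->create( sub {
            # Each worker handles whole sequences until it dequeues undef.
            while( defined( my $job = $work->dequeue ) ) {
                my( $id, $seq ) = @$job;
                $results->enqueue( [ $id, count_fuzzy( $seq, $motifs, $max_miss ) ] );
            }
        } );
    }

    $work->enqueue( [ $_, $sequences->[ $_ ] ] ) for 0 .. $#$sequences;
    $work->enqueue( undef ) for 1 .. $nthreads;    # one terminator per worker
    $_->join for @workers;

    my %hits;
    while( defined( my $res = $results->dequeue_nb ) ) {
        $hits{ $res->[0] } = $res->[1];
    }
    return \%hits;
}

# Made-up sample data; 4 workers to mirror a 4-core box.
my @sequences = qw( ACGTACGT TTTTACGA GGGGGGGG );
my $hits = match_all( \@sequences, [ qw( ACGT TTTT ) ], 1, 4 );
print "sequence $_: $hits->{ $_ } fuzzy hits\n" for sort { $a <=> $b } keys %$hits;
```

The sequences are independent of each other, so no locking is needed beyond what the queues already do. The per-sequence CPU cost stays the same; only the elapsed time drops.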

On that IBM Power 750, add a simple command line switch, -NTHREADS=128, and you can expect that to drop to just 9 hours! On the 795, with -NTHREADS=1024: 1 hour and 10 minutes.
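
The arithmetic behind those estimates can be checked with a few lines. This is a back-of-envelope sketch, assuming each hardware thread handles a sequence as fast as one core of the 4-core box does and that the work divides into whole batches of N sequences; both are assumptions of the estimate, not measurements. The sub name elapsed_secs is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw( ceil );

# Hypothetical helper: elapsed seconds for $n_seqs sequences processed
# $nthreads at a time, each sequence costing $secs_per_seq seconds.
sub elapsed_secs {
    my( $n_seqs, $nthreads, $secs_per_seq ) = @_;
    return ceil( $n_seqs / $nthreads ) * $secs_per_seq;
}

my $n_seqs = 10_000;

# Figures from the post: 6m54.5s single-threaded, about 7 minutes threaded.
my @runs = (
    [ 1,    6 * 60 + 54.5 ],
    [ 4,    7 * 60        ],
    [ 128,  7 * 60        ],
    [ 1024, 7 * 60        ],
);

for my $run ( @runs ) {
    my( $nthreads, $secs_per_seq ) = @$run;
    my $secs = elapsed_secs( $n_seqs, $nthreads, $secs_per_seq );
    printf "%4d threads: %8.2f hours (%6.2f days)\n",
        $nthreads, $secs / 3600, $secs / 86400;
}
```

With 1024 threads the 10,000 sequences need 10 batches of about 7 minutes each, which is where the 1 hour and 10 minutes comes from.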

You aren't going to see those sorts of gains from using a math library; nor from re-coding the program in C; nor from finding a "better algorithm".

And if you are doing any serious XML processing, those sorts of gains are available to you also. Via threading.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^6: threading a perl script
by vkon (Curate) on Apr 24, 2011 at 17:42 UTC
    I agree that by using threads more often perl threads will get better.
    Maybe I will use threads more often - but right now I have no pressing need for that, and I still have the impression that I am paying too high a price for threads (a 10% slowdown when running ordinary scripts, and increased script complexity).

    I know nothing about DNA processing and so could be unaware of some hidden quirks there. You know better.

    But your example does not convince me much.
    Obviously your 'fuzzy' sub would be much faster in C (or in XS), but even better than that, there are C libraries for fuzzy string matching that should work better.

    You're reinventing something in pure perl, while there are a number of possibilities:

    • TRE, a POSIX-compliant regexp engine that allows fuzzy matches, and has a wrapper around it
    • http://search.cpan.org/~jgoldberg/Text-LevenshteinXS-0.03/LevenshteinXS.pm
    • http://search.cpan.org/~jhi/String-Approx-3.26/Approx.pm
    • etc
    all these are fast and (hopefully) well-tested.

    But - leaving aside your current implementation of fuzzy matching - my conclusion is that maybe you could create an example where a perl program benefits from being threaded.

    For me, this is not so: in my real life, non-threaded perl is better because it is faster, and I, personally, never benefit from threads in perl.

    Having said that,
    I will remain your friend with non-threaded perl
    :) :)

      ... all these are fast and (hopefully) well-tested.

      No. They are not. They measure the wrong thing and are dog slow. Trust me on this, because I have tried them and measured.

      And that is the problem. You--like many others--make definitive statements based upon assumptions rather than tests.

      I, personally, never benefit from threads in perl.

      Of course you don't. You never use them. How could you?

      I will remain your friend with non-threaded perl

      Great! I really hope that is so. Please don't take what follows personally.

      What I do object to is people--you and others--who are variously too scared, too lazy, too threads-are-too-MS, or simply too uninterested to actually try them and work out what they are good for and what they are not, popping up in any discussion that mentions threads to say they are useless, broken and will cause your grandchildren to be born with multiple heads.

      And all based on something you half-read and totally misunderstood, written by somebody who misheard it from someone they met at a tech conference--or maybe they read it on the back of a cereal packet.

      If you don't use them, that's fine. If you can demonstrate alternative solutions that are simpler, quicker, more portable, or more X, please do. Really, please do. I like nothing better than a good head-to-head comparison. But if all you have to 'contribute' to these threads is general, non-specific negativity, it really wears thin fast.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        all true.
        except for this
                 And all based on something you half-read and totally misunderstood, written by somebody who misheard it from someone they met at a tech conference--or maybe they read it on the back of a cereal packet.
        my opinion was based on RTFS and reading p5p and reading ./win32/Makefile

                  They measure the wrong thing and are dog slow.
        Again, you know better, given that you attacked the problem thoughtfully.

        ok, you convinced me :)
        I will take a more careful approach to threads, and will stop telling others about the drawbacks of perl threads.

        Have a nice day,
        Vadim.
