PerlMonks  

converting from process to threads

by FromTheMotherland (Novice)
on Jan 19, 2008 at 16:03 UTC ( [id://663223]=perlquestion )

FromTheMotherland has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm using Parallel::ForkManager to automate the download and parsing of some URLs. The process is:
    my $pmx = new Parallel::ForkManager(10);
    while (my $url = <URLFILE>) {
        $pmx->start and next;
        &download_url_via_lwp_or_mechanize;
        &parse_url;
        &write_results_to_file;
        $pmx->finish;
    }
Is there a way I can use threads instead? Looking at my memory usage, every perl process takes about 2% of CPU, 73 MB of virtual memory, and 6-13 MB of real memory. I'm looking for ways to optimize this pipeline. Any help will be appreciated!

Replies are listed 'Best First'.
Re: converting from process to threads
by plobsing (Friar) on Jan 19, 2008 at 19:45 UTC
    With forked processes, the memory usage you see reported may be inflated. On many systems, forking marks memory as copy-on-write, so each process reports all the memory it is using, even the pages it still shares with other processes.

    I doubt that changing to threading will reduce your memory usage. If anything, based on the implementation details of ithreads, chances are it will increase memory usage.
      I thought threads were supposed to take up less memory than processes... hmm. So what's the advantage of threads in Perl?
        The only real advantage of using threads is that you can easily share and return values between threads, with "use threads::shared". Otherwise forking is probably faster and more memory efficient.
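
A minimal sketch of what sharing and returning values between threads can look like, using the core threads and threads::shared modules. The names $total, $private, worker and run_workers are made up for illustration. It also shows that a variable not marked :shared is cloned into every thread, which is one reason ithreads tend not to be lighter than forked processes.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical names, for illustration only.
my $total :shared = 0;   # one copy, visible to every thread
my $private       = 0;   # NOT shared: each thread works on its own clone

sub worker {
    my ($n) = @_;
    $private += $n;                 # changes only this thread's copy
    { lock $total; $total += $n; }  # serialise updates to the shared value
    return $n * $n;                 # handed back to the caller by join()
}

sub run_workers {
    my @threads = map { threads->create( \&worker, $_ ) } 1 .. 4;
    return map { $_->join } @threads;   # join() yields worker's return value
}

my @squares = run_workers();
print "squares: @squares\n";
print "shared total: $total, parent's private copy: $private\n";
```

After the run the shared total reflects every thread's update, while the parent's $private is still 0 because each thread only ever touched its own copy.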

        I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: converting from process to threads
by pc88mxer (Vicar) on Jan 19, 2008 at 19:52 UTC
    Just a few questions:
    • How much memory is being shared between your child processes? I.e. what does 'top' report in the 'SHARED' column?
    • If they are not sharing a lot of memory, do you have use LWP; and use WWW::Mechanize; 'outside' your while loop (like at the beginning of your program)? This will enable them to share those modules.
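
The second point can be sketched with plain fork, without Parallel::ForkManager. Anything compiled before the fork is already in the parent's memory, so on copy-on-write systems the children start out sharing those pages instead of each loading their own copy. The core module List::Util stands in here for LWP or WWW::Mechanize, and run_children is a hypothetical helper name.

```perl
use strict;
use warnings;
use List::Util qw(sum);   # stand-in for LWP / WWW::Mechanize: loaded once, before any fork

# Hypothetical helper: fork one child per job and collect the exit statuses.
sub run_children {
    my @jobs = @_;
    my ( %status, @pids );
    for my $job (@jobs) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: the module is already compiled, nothing is reloaded here.
            my $ok = sum(@$job) > 0 ? 0 : 1;
            exit $ok;
        }
        push @pids, $pid;
    }
    for my $pid (@pids) {
        waitpid( $pid, 0 );           # reap each child
        $status{$pid} = $? >> 8;      # its exit code
    }
    return %status;
}

my %status = run_children( [ 1, 2, 3 ], [ 4, 5 ] );
print "child $_ exited with $status{$_}\n" for sort { $a <=> $b } keys %status;
```

Because use runs at compile time, it takes effect in the parent wherever it appears in the file. A require executed inside the child, after the fork, would instead load the module separately in every process.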
      I'm on OS X and hardly know how to use top on this craptop. I checked my memory usage through Activity Monitor. The root process shows 29 MB and everything else (up to 20 forks or so) shows 6.5-7.5 MB real memory + 73 MB virtual. So should I not bother with threads and simply optimize this forked code?
Re: converting from process to threads
by BrowserUk (Patriarch) on Jan 20, 2008 at 09:45 UTC

    Try this and see how you get on?

    On my system with only dialup bandwidth I used HEAD requests and parsed local content (32kb HTML with ~1000 links). The result is that with 10 threads, it consumes ~40MB of ram and achieves >80% cpu usage. Using more threads is pointless, even fetching only head requests. That pretty much saturates my dialup connection.

    YMMV depending on the size of the downloads, the complexity of the processing, what other modules you need to load, the number of CPUs and your bandwidth. Feedback appreciated.

    #! perl -slw
    use strict;
    use threads;
    use threads::shared;

    $|++; ## Important to prevent IO overlap

    our $NTHREADS ||= 10;

    ## Read test content from DATA 32kb html containing ~ 1000 links,
    ## Used for testing in conjunction with head() below.
    # my $content; { local $/; $content = <DATA>; }

    my $osync :shared;
    my $isync :shared;

    sub processEm {
        require LWP::Simple; LWP::Simple->import( 'head', 'get' );
        require HTML::LinkExtor;

        my $tid = threads->self->tid;
        warn "$tid: starting";

        while( my $url = do{ lock $isync; <STDIN> } ) {
            chomp $url;
            warn "$tid: processing $url";

            ## Used for testing. A workaround my bandwidth limits
            ## and being a good netizen
            # head( $url ) or warn "Couldn't fetch $url" and next;

            my $content = get( $url )
                or warn "Couldn't fetch $url" and next;

            my $l = HTML::LinkExtor->new(
                sub{
                    lock $osync;
                    print "'$_[ 2 ]'";
                }, $url
            );
            $l->parse( $content );
        }
    }

    open STDIN, '<', $ARGV[ 0 ] or die $!;

    my @threads = map{
        threads->create( \&processEm )
    } 1 .. $NTHREADS;

    $_->join for @threads;

    __END__
    c:\test>663223.plt -NTHREADS=10 urls.txt >output.txt
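
For comparison, here is a sketch of the same worker-pool shape built on the core Thread::Queue module instead of locking STDIN and STDOUT. This is a swapped-in technique, not what the script above does. run_pool and the stand-in work callback are hypothetical names, and the callback does no network access; in a real run it would be where get() and HTML::LinkExtor are called.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: run $nthreads workers over @items, applying $work to each.
# $work must return a defined scalar, since undef is used as the stop marker.
sub run_pool {
    my ( $nthreads, $work, @items ) = @_;
    my $in  = Thread::Queue->new;   # work items for the threads
    my $out = Thread::Queue->new;   # results coming back

    my @threads = map {
        threads->create( sub {
            # Block until an item arrives; undef means "no more work".
            while ( defined( my $item = $in->dequeue ) ) {
                $out->enqueue( $work->($item) );
            }
        } );
    } 1 .. $nthreads;

    $in->enqueue(@items);
    $in->enqueue( (undef) x $nthreads );   # one stop marker per worker
    $_->join for @threads;

    # All workers have finished, so drain the results without blocking.
    my @results;
    while ( defined( my $r = $out->dequeue_nb ) ) { push @results, $r }
    return @results;
}

# Stand-in for "fetch and parse": just tags each URL.
my @urls = map { "http://example.com/$_" } 1 .. 5;
my @done = run_pool( 3, sub { "parsed $_[0]" }, @urls );
print "$_\n" for sort @done;
```

Results come back in whatever order the threads finish, so the caller sorts them. The queues take care of the locking that $isync and $osync do by hand in the original.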

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: converting from process to threads
by kirillm (Friar) on Jan 21, 2008 at 08:28 UTC

Node Type: perlquestion [id://663223]
Approved by Corion