PerlMonks  

Avoiding data copying in threads

by ganeshPerlStarter (Novice)
on Jul 01, 2014 at 07:20 UTC ( [id://1091804]=perlquestion )

ganeshPerlStarter has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonk friends,

I am using Strawberry Perl v5.18.1 (MSWin32-x64-multi-thread) on Windows 7 (64-bit) to process some HTML webpages, fetch files over FTP, and then process those files. As these are distinct activities that must happen in this order, I am using 3 threads: the 1st fetches and parses the webpage to get the list of files, the 2nd downloads those files over FTP (using Perl::FTP), and the 3rd processes the downloaded files. I create all of these threads in the same Perl script, which also contains other functions.

Consistent with the behaviour described at "http://www.perlmonks.org/?node_id=288022", I see the memory of this Perl process grow to 190 MB. If I do the same work without threads, the process memory stays close to 50 MB. I would like to avoid this data copying when the threads are created, as it also adds to CPU usage. I do not want to use BEGIN, because I want command-line parameters to decide the thread parameters.

Can you please guide me: if we put each thread in its own package and .pm file, can we avoid or reduce this data copying effect? What other ways are there to reduce it to some extent?

Thanks for your time and guidance. Regards

Replies are listed 'Best First'.
Re: Avoiding data copying in threads
by x-lours (Sexton) on Jul 01, 2014 at 08:40 UTC
    Have you read about Thread::Semaphore? It could be a solution.
      Hi x-lours, thanks for this suggestion. However, isn't a semaphore meant more for synchronizing threads (or processes)? I am using Thread::Queue to synchronize and communicate between threads, so synchronization and communication are not an issue. My main worry is the behavior described on that page: "It means that every time you start a thread all data structures are copied to the new thread. And when I say all, I mean all. This e.g. includes package stashes, global variables, lexicals in scope. Everything!" Thanks

        When you 'create' a thread, the process state as it exists at the point of the 'create' call is copied into the new thread.

        That can get quite expensive in memory overhead, yes. That is the way of things if you want to use Perl threads. You can mitigate it a little by limiting the memory footprint of your originating thread, and by calling 'create' before you start filling memory with stuff you don't need in every thread.
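A minimal sketch of the "create early" advice, assuming a simple worker pool fed through Thread::Queue (which the original poster already uses). The worker count read from @ARGV, the sub name worker, and the @big array are hypothetical stand-ins, not part of the original script. The threads are started at run time, so command-line parameters can still decide how many to create, and no BEGIN block is needed.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical option: number of worker threads from the command line.
my $workers = shift(@ARGV) // 2;

my $jobs    = Thread::Queue->new;   # main thread -> workers
my $results = Thread::Queue->new;   # workers -> main thread

# Create the threads FIRST, while the interpreter is still small.
# Whatever exists at this point is what gets cloned into each thread.
my @pool = map { threads->create(\&worker, $jobs, $results) } 1 .. $workers;

# Only now build the large data. It lives in the main thread only.
my @big = (1 .. 100_000);           # stand-in for the script's real data

$jobs->enqueue($_) for 1 .. 10;
$jobs->enqueue(undef) for @pool;    # one "stop" marker per worker
$_->join for @pool;

my $total = 0;
while (defined(my $r = $results->dequeue_nb)) {
    $total += $r;
}
print "total=$total\n";

sub worker {
    my ($in, $out) = @_;
    # Modules needed only by this worker could be loaded here with
    # require, so they are not part of the state cloned at create time.
    while (defined(my $n = $in->dequeue)) {
        $out->enqueue($n * $n);     # placeholder for the real work
    }
    return;
}
```

The same idea applies to a three-stage pipeline: start all three threads right after parsing the command line, then hand them work through queues instead of letting them inherit data.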

        If you're on a system that supports it natively (e.g. most Unixes), then fork() may be an alternative: most kernels implement copy-on-write for forked processes, which makes them very memory efficient. However, you lose some of the conveniences that you'd get from threading.
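A rough sketch of the fork() alternative, assuming a Unix-like system with native fork. The @big array and the pipe used to return the result are illustrative choices, not anything from the original script. Note that on Windows, Perl emulates fork() using interpreter threads, so this approach is unlikely to save memory on the original poster's platform.

```perl
use strict;
use warnings;

my @big = (1 .. 100_000);   # stand-in for data built before forking

# A pipe replaces the shared queue: the child writes, the parent reads.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: sees @big without an eager copy; on copy-on-write kernels
    # memory pages are duplicated only as they are modified.
    close $reader;
    my $sum = 0;
    $sum += $_ for @big;
    print {$writer} "$sum\n";
    close $writer;
    exit 0;
}

# Parent: read the child's result, then reap the child.
close $writer;
my $line = <$reader>;
close $reader;
waitpid($pid, 0);
chomp $line;
print "child computed sum=$line\n";
```

The lost convenience is visible here: results have to travel back through a pipe (or files, or sockets) rather than through a shared Thread::Queue.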

Re: Avoiding data copying in threads
by Anonymous Monk on Jul 01, 2014 at 16:32 UTC

Node Type: perlquestion [id://1091804]
Approved by Corion