Avoiding data copying in threads

ganeshPerlStarter has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonk friends,

I am using Strawberry Perl v5.18.1 (MSWin32-x64-multi-thread) on Windows 7 (64-bit) to process some HTML web pages, fetch files over FTP, and then process those files. Since these are distinct activities that must happen in this order, I am using 3 threads: the 1st fetches and parses the web page to get the list of files, the 2nd downloads these files over FTP (using Perl::FTP), and the 3rd processes the downloaded files. I create all of these threads in the same Perl script, which also contains other functions.

As expected from the behaviour described at "http://www.perlmonks.org/?node_id=288022", the memory of this Perl process grows to about 190 MB. Without threads, it stays close to 50 MB. I would like to avoid this data copying when the threads are created, as it also adds to CPU usage. I do not want to use BEGIN, because I want command-line parameters to decide the thread parameters.

Can you please guide me: if we put each thread in its own package and .pm file, can we avoid or reduce this data copying effect? What other ways are there to reduce it to some extent?

Thanks for your time and guidance. Regards

Re: Avoiding data copying in threads by x-lours (Sexton) on Jul 01, 2014 at 08:40 UTC
Have you read about Thread::Semaphore? It could be a solution.
Re^2: Avoiding data copying in threads by ganeshPerlStarter (Novice) on Jul 01, 2014 at 08:59 UTC
Hi x-lours, thanks for this suggestion. However, isn't a semaphore more about synchronization of threads (or processes)? I am using Thread::Queue to synchronize and communicate between threads, so synchronization and communication are not an issue. My main worry is the behaviour described on this page: "It means that every time you start a thread all data structures are copied to the new thread. And when I say all, I mean all. This e.g. includes package stashes, global variables, lexicals in scope. Everything!" Thanks
Re^3: Avoiding data copying in threads by Preceptor (Deacon) on Jul 01, 2014 at 09:37 UTC
When you 'create' a thread, the process state is copied at the point of the 'create' call. That can get quite expensive in memory overhead, yes. That is the way of things if you want to use Perl threads. You can mitigate it a little by limiting the memory footprint of your originator and calling 'create' before you start filling memory with stuff you don't need in every thread. If you're on a system that supports it natively (e.g. most Unixes), then fork() may be an alternative, as most kernels implement copy-on-write for forked processes, which makes them very memory efficient. However, you lose some of the conveniences that you'd get from threading.
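A minimal sketch of the "create early" advice above, assuming a two-worker pipeline connected by Thread::Queue. The stage bodies, the file names and the use of File::Spec as a stand-in for a heavy module are all hypothetical placeholders, not the original poster's code. The idea is that the workers are cloned while the interpreter is still small, and anything large is loaded with a runtime `require` (or built) only afterwards, so it is never copied into the worker threads.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Command-line parameters are already available here, before any thread
# exists, so they can still decide how the workers are set up.
my @file_list = @ARGV ? @ARGV : qw(a.txt b.txt c.txt);   # hypothetical input

# Queues linking the stages: list -> download -> process -> results.
my $to_download = Thread::Queue->new;
my $to_process  = Thread::Queue->new;
my $results     = Thread::Queue->new;

# Start the workers FIRST, while the interpreter holds very little data,
# so that each clone has as little as possible to copy.
my $downloader = threads->create(sub {
    while (defined(my $name = $to_download->dequeue)) {
        # Hypothetical placeholder for the real FTP download.
        $to_process->enqueue("local/$name");
    }
    $to_process->enqueue(undef);    # tell the next stage there is no more work
});

my $processor = threads->create(sub {
    while (defined(my $path = $to_process->dequeue)) {
        # Hypothetical placeholder for the real file processing.
        $results->enqueue("processed $path");
    }
});

# Only now load heavy modules or build large structures in the main thread.
# A runtime 'require' happens after the clones were made, unlike 'use',
# which loads at compile time. File::Spec is just a stand-in here.
require File::Spec;

$to_download->enqueue($_) for @file_list;
$to_download->enqueue(undef);       # end-of-work marker for the downloader

$_->join for $downloader, $processor;

# Both workers have finished, so a non-blocking drain is enough.
my @processed;
while (defined(my $line = $results->dequeue_nb)) {
    push @processed, $line;
}
print "$_\n" for @processed;
```

An `undef` item is used as the end-of-work marker because it works with any Thread::Queue version. Modules that a worker itself needs can be loaded with `require` inside the worker's sub, so that only that thread pays for them.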
Re: Avoiding data copying in threads by Anonymous Monk on Jul 01, 2014 at 16:32 UTC
"can we avoid or reduce this data copying effect?" Sure, see how it's done.

