http://qs321.pair.com?node_id=381156

Asgaroth has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I would appreciate some of the infinite wisdom that is found at the Monastery Gates.

Firstly, is it at all possible to use a combination of the Thread::Conveyor module and the Parallel::ForkManager module? I am assuming that this is indeed possible.

Secondly, how would I use this inside a "consumer" thread subroutine?

Basically, I have a queue that is constantly being populated with the names of files that need to be compressed. The "consumer" thread is working just fine: it compresses all of these files. However, I need to perform multiple compressions at once, and I am hoping to achieve this by using the Parallel::ForkManager module within the "consumer" thread.

The code for the subroutine follows:

sub compress_logs() {
    $compress_process_manager = new Parallel::ForkManager(4);

    while ( my $filename = $archive_queue->take ) {
        if ( defined($filename) ) {
            my $pid = $compress_process_manager->start and next;

            next unless -e $filename;
            next unless -e $filename . ".sum";
            next if -e $filename . ".gz";

            $message_queue->put(RUN_LOG, "Received $filename For Compression");
            $message_queue->put(RUN_LOG, "Beginning Compression Of $filename");

            $message_queue->put(RUN_LOG, "Reading $filename Into Memory");
            my $string = '';
            open(FH, "<$filename") or die "Could not open $filename (${OS_ERROR})";
            binmode(FH);
            while (<FH>) { $string .= ${ARG} }
            close(FH);
            $message_queue->put(RUN_LOG, "Completed Reading $filename Into Memory");

            $message_queue->put(RUN_LOG, "Compressing $filename Memory Image");
            my $dest = Compress::Zlib::memGzip($string);
            $message_queue->put(RUN_LOG, "Completed Compressing $filename Memory Image");

            $message_queue->put(RUN_LOG, "Flushing $filename Memory Image To Disk");
            open(FH, ">$filename.gz") or die "Could not open $filename.gz (${OS_ERROR})";
            binmode(FH);
            print FH $dest;
            close(FH);
            undef($string);
            $message_queue->put(RUN_LOG, "Completed Flushing $filename Memory Image To Disk");

            $message_queue->put(RUN_LOG, "Removing $filename After Compression");
            unlink($filename);
            $message_queue->put(RUN_LOG, "Completed Removing $filename After Compression");

            $message_queue->put(RUN_LOG, "Removing $filename.sum After Compression");
            unlink($filename . ".sum");
            $message_queue->put(RUN_LOG, "Completed Removing $filename.sum After Compression");

            $message_queue->put(RUN_LOG, "Completed Compression Of $filename");

            $compress_process_manager->finish;
        }
    }
}


If I comment out all the Parallel::ForkManager-related statements, this subroutine works as it should; with them in, as posted above, it does not appear to compress anything.

Am I missing something obvious here, or is there some fundamental design flaw in the above code?

Don't worry about the reading of the data files into memory and then flushing to disk; the system this is running on is more than capable of handling the memory requirements. However, suggestions on how to improve the above subroutine would be appreciated. Most of it is just queueing to another thread, which writes logs to a file.

Your help in this would be *greatly* appreciated.

Thanks
Asgaroth

Re: Using Thread::Conveyor and Parallel::ForkManager
by BrowserUk (Patriarch) on Aug 09, 2004 at 08:16 UTC

    Mixing threads and forks in one Perl program, especially forking from within threads, is asking for trouble.

    I won't go as far as saying it couldn't be made to work, but be prepared to be the pioneer. I doubt there are many others who have gone this route and would be around to help you. Personally, I can see no logic at all in doing so, and I can see innumerable reasons why you should not.

    If you are already using threads, and need further concurrency, why not just use another thread? What does adding forking into the mix buy you?
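
    For what it's worth, something along these lines is all I have in mind. It is only a rough, untested sketch: the box sizes are arbitrary, the glob() is a stand-in for however the filenames actually turn up, and the undef entries are just one way of telling each worker to shut down.

    use strict;
    use warnings;
    use threads;
    use Thread::Conveyor;

    my $belt = Thread::Conveyor->new( { maxboxes => 50, minboxes => 25 } );

    # Four identical consumers, all draining the same belt in parallel.
    my @workers = map { threads->create( \&compress_logs ) } 1 .. 4;

    # Producer side: put filenames onto the belt as they arrive.
    # (glob() here is just a placeholder for however the names turn up.)
    $belt->put($_) for glob '*.log';

    # One undef per worker so every take() loop can exit cleanly, then join.
    $belt->put(undef) for @workers;
    $_->join for @workers;

    sub compress_logs {
        while ( my ($file) = $belt->take ) {
            last unless defined $file;    # shutdown marker
            # ... your existing per-file compression code goes here ...
        }
    }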


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      Do you mean that I can use more than one "consumer" thread running the same subroutine to process the same queue?

      For example, if I have this definition, are you suggesting that this would be a better solution:

      my $archive_queue = Thread::Conveyor->new(
          {
              maxboxes => 5000,
              minboxes => 4000,
              optimize => 'memory',
          }
      );

      my $compress_log_thread_1 = new threads( \&compress_logs );
      my $compress_log_thread_2 = new threads( \&compress_logs );
      my $compress_log_thread_3 = new threads( \&compress_logs );
      my $compress_log_thread_4 = new threads( \&compress_logs );


      Thanks Asgaroth

        I'm only saying that mixing threads and forks is, at best, going to be messy and extremely difficult to debug; at worst, it will never work.

        I cannot suggest a better solution to your problem without a clear description of the problem.

        So far, I do not understand what you are trying to achieve by using Thread::Conveyor. The example code you have offered is nothing more than the code from the Synopsis of that module with a couple of constants changed to big numbers.

        Personally, I am not sure what problem Thread::Conveyor is designed to solve; as such, I have never found a use for the module. I'd also be wary of using it, as (as far as I'm aware) its author is no longer developing it.

        I am slowly building a library of good uses for threads, and as such I am always willing to try to come up with working solutions to problems using them. To do this, I need a description of the underlying problem, devoid of preconceptions about the best way to solve it.

        In this case, you appear to want to run the compression of several log files in parallel. My problem is, I do not see how Thread::Conveyor helps in solving this. I'm also not sure that there will be much of a performance advantage in running this type of process in parallel unless you have multiple CPUs, but until I try it, I may be completely wrong on that.

        Summary: restate the question without the preconceptions of how to solve it, and I will willingly take a crack at it.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
        It seems like you're missing that Parallel::ForkManager does not use threads; it is process-based. Since Thread::Conveyor uses threads, the two don't mix.
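
        To make the distinction concrete, this is roughly the kind of job Parallel::ForkManager is built for: forking separate OS processes over a list of work, where the children share nothing with the parent. This is only a rough sketch; the glob() and the gzip call are placeholders, not your actual code.

        use strict;
        use warnings;
        use Parallel::ForkManager;

        my @files = glob '*.log';                    # placeholder work list
        my $pm    = Parallel::ForkManager->new(4);   # at most 4 children at once

        for my $file (@files) {
            $pm->start and next;      # parent forks a child, then moves on
            # --- child process from here down ---
            # The child gets a *copy* of the parent's data; anything it
            # changes (or puts on an in-memory queue) is invisible to the parent.
            system( 'gzip', $file );  # placeholder for the real compression
            $pm->finish;              # child exits
        }
        $pm->wait_all_children;       # parent blocks until every child is done
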
Re: Using Thread::Conveyor and Parallel::ForkManager
by pearlie (Sexton) on Aug 09, 2004 at 08:14 UTC
    Try using the statement:
    $compress_process_manager->wait_all_children;
    after the closing brace } of the while loop. But remember, this is a blocking wait. Hope it works.
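
    In other words, roughly like this (a stripped-down sketch of your loop, using the names from your posted code, with the compression work elided):

    sub compress_logs {
        my $compress_process_manager = Parallel::ForkManager->new(4);

        while ( my $filename = $archive_queue->take ) {
            my $pid = $compress_process_manager->start and next;
            # ... the existing compression work ...
            $compress_process_manager->finish;
        }

        $compress_process_manager->wait_all_children;   # blocks until every child has exited
    }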