Re: Multi threading
by Corion (Patriarch) on Apr 06, 2009 at 06:46 UTC
First, get your nomenclature straight. Don't mix up threads and forks; they are not the same thing.
If you are using threads, I recommend you use a Thread::Queue to serialize access to the log, with one log-writer thread that reads from the queue and writes to the log file. This is the easiest way.
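A minimal sketch of that approach (the log file name and message format are invented for illustration):
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One queue, one logger thread that is the only owner of the file.
my $q = Thread::Queue->new;

my $logger = threads->create( sub {
    open my $log, '>', 'myLog' or die "Can't open log: $!";
    # Drain the queue until the undef terminator arrives
    while ( defined( my $msg = $q->dequeue ) ) {
        print $log $msg;
    }
    close $log;
} );

my @workers = map {
    threads->create( sub {
        my $tid = threads->tid;
        # Workers never touch the file; they only enqueue lines
        $q->enqueue("[$tid] processed file $_\n") for 1 .. 10;
    } );
} 1 .. 5;

$_->join for @workers;
$q->enqueue(undef);    # tell the logger to finish
$logger->join;
Because only the logger thread ever holds the filehandle, no locking is needed at all.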
Another way would be to hope that your operating system has atomic writes and that your lines for the log file are shorter than 512 bytes, or whatever the atomic write limit of your OS is. Then you shouldn't need to worry about threads mixing their write buffers, as long as you unbuffer the filehandle.
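A minimal sketch of that variant (whether concurrent writers actually interleave cleanly is entirely OS-dependent, as said above):
use strict;
use warnings;
use IO::Handle;

# Open the log in append mode and unbuffer it, so every print becomes
# a single short write(). This only helps if the OS appends atomically.
open my $log, '>>', 'myLog' or die "Can't open log: $!";
$log->autoflush(1);
print $log "keep each line shorter than the atomic write limit\n";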
I am using this code to create the threads using fork.
It reads all the files in a directory and writes to a log file.
The code creates the log file, but some of the files are missed due to overlapping writes.
How can I avoid this using locks?
#!/usr/bin/perl
use Parallel::ForkManager;
use IO::File;

my $processor = shift;
$tc = 5;      # threads
$fc = 100;    # splits: each thread should process 100 at a time

my $pm = Parallel::ForkManager->new( $tc + 1 );
$pm->run_on_finish( sub {
    my ( $pid, $exit_code, $ident ) = @_;
    $tmpFiles[$ident] = undef;
} );

foreach my $i ( 0 .. $#tmpFiles ) {
    # Forks and returns the pid for the child:
    my $pid = $pm->start($i) and next;
    $SIG{INT} = 'DEFAULT';
    my $filename = $tmpFiles[$i]->filename();
    my $fh = IO::File->new("<$filename") or die "Can't open $filename\n";
    while ( defined( my $line = $fh->getline ) ) {
        chomp $line;
        my ( $dir, $file ) = split /\t/, $line;
        $processor->( $dir, $file, $config, $log );
    }
    $pm->finish;    # Terminates the child process
}
$pm->wait_all_children;
As Corion said, you create processes with fork(), not threads. If you have multiple writers to a file then there are several solutions - none of them particularly good! The simplest is to lock the whole file for each writer - but that defeats the object of having multiple writers. Another is to allocate specific regions (by byte offset) of the file to each thread/process, ensuring that these do not overlap. That avoids locking, but requires some planning and management; use seek to position each thread/process at its region. If you turn off buffering for the writes you will reduce the overlap, but probably not get rid of it altogether.
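For the whole-file-locking option, a minimal sketch using flock() from forked children (log file name and messages are invented for illustration):
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Take an exclusive lock around every write, so lines cannot interleave.
# Append mode means each write lands at the current end of file.
sub log_line {
    my ($msg) = @_;
    open my $log, '>>', 'myLog' or die "Can't open log: $!";
    flock $log, LOCK_EX or die "Can't lock log: $!";
    print $log $msg;
    close $log;    # closing releases the lock
}

for my $child ( 1 .. 5 ) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    next if $pid;    # parent keeps forking
    log_line("child $child (pid $$) processed file $_\n") for 1 .. 3;
    exit 0;
}
wait() for 1 .. 5;   # reap all five children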
Can you please tell me how I can create a separate log file for each process,
rather than having all processes write to a single file?
Re: Multi threading
by BrowserUk (Patriarch) on Apr 06, 2009 at 10:13 UTC
If you want to be able to serialise writes to a single log file concurrently from multiple threads of execution, then locking a simple shared variable will achieve that (using threads & threads::shared):
#! perl -slw
use strict;
use threads;
use threads::shared;

our $N ||= 100;

sub worker {
    my $tid = threads->tid;
    my( $log, $semRef, $from, $to ) = @_;
    for my $file ( $from .. $to ) {
        ## Simulate doing some processing
        sleep 1 + rand( 2 );
        ## Lock the log file semaphore before writing
        lock $$semRef;
        ## And write to the log
        printf $log "[%2d] Processing file%3d\n", $tid, $file;
        ## The lock is released automatically at the end of the block
    }
}

## A shared variable used as a semaphore for the log file resource
my $logSem :shared;

## Open the log file in the main thread
open my $log, '>', 'myLog' or die $!;

my @threads = map {
    ## Create the workers, passing the log file handle and semaphore
    threads->create( \&worker, $log, \$logSem, $_ * $N, $_ * $N + $N - 1 );
} 0 .. 4;   ## 5 threads each processing 100 "files"

## Wait till they are done
$_->join for @threads;

## Close the log
close $log;
That's just a simplistic demo of the technique. If it is of interest, and you need help adapting it to your needs, please describe those needs more clearly.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Hi,
Thanks for the reply.
When I run this script initializing
$from = 0;
$to = 1000;
only 500 entries end up in the file, and it prints:
[ 3] Processing file200
[ 3] Processing file201
[ 3] Processing file202
....
[ 3] Processing file220
[ 3] Processing file299
[ 5] Processing file400
[ 5] Processing file401
[ 5] Processing file498
[ 5] Processing file499
[ 1] Processing file  0
[ 1] Processing file  1
[ 1] Processing file  2
..
[ 1] Processing file 99
[ 2] Processing file100
[ 2] Processing file101
[ 2] Processing file102
[ 2] Processing file103
[ 2] Processing file104
[ 2] Processing file105
..
[ 2] Processing file198
[ 2] Processing file199
[ 4] Processing file300
[ 4] Processing file301
......
[ 4] Processing file397
[ 4] Processing file398
[ 4] Processing file399
Only 500 entries are in the file, and the files are not processed in sequential order.
Can you please help me with this?
Also, how do I use file locks with the forking code I posted earlier in this thread, so that all the processes write safely to a single log file?
Re: Multi threading
by Anonymous Monk on Apr 07, 2009 at 07:29 UTC
I urge you to reconsider your design. It would be much simpler if you had a designated process whose sole job is to write to the log file - a logger in the style of syslogd. When one of the worker threads/procs wants to write to the log file, it sends a request to the logger. There are a number of mechanisms you could use for the inter-process communication: sockets, message queues, or even a named pipe, provided the record is not too large.
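A minimal sketch of that design using a plain pipe (log file name and message format are invented for illustration; short writes to a pipe are atomic up to the OS's pipe buffer limit):
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# The parent forks one dedicated logger process -- the only writer to
# the log file -- and the workers send it lines down a shared pipe.
pipe my $reader, my $writer or die "pipe failed: $!";

my $logger = fork;
die "fork failed: $!" unless defined $logger;
if ( $logger == 0 ) {                  # logger process
    close $writer;
    open my $log, '>', 'myLog' or die "Can't open log: $!";
    print $log $_ while <$reader>;     # copy each request to the file
    close $log;
    exit 0;
}
close $reader;
$writer->autoflush(1);                 # one write() per log request

for my $w ( 1 .. 5 ) {                 # five workers
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    next if $pid;
    print {$writer} "worker $w (pid $$) processed file $_\n" for 1 .. 3;
    exit 0;
}
close $writer;                         # let the logger see EOF at the end
wait() for 1 .. 6;                     # reap five workers plus the logger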
Hi,
Suppose I am forking 5 processes; each of the five should write to a different log file.
Can you please tell me how I can do that?
There are more than 10000 articles in a directory. Using this code I am creating 5 processes.
Each process should read 100 articles at a time and write to the log file.
While writing to the log file, some of the files are missed.
How do I use a lock on these processes?
And how can I create a different log file for each of the 5 processes?
Can anyone please help me?
my $pm = Parallel::ForkManager->new(5);
$pm->run_on_finish( sub {
    my ( $pid, $exit_code, $ident ) = @_;
    $tmpFiles[$ident] = undef;
} );

foreach my $i ( 0 .. $#tmpFiles ) {
    # Forks and returns the pid for the child:
    my $pid = $pm->start($i) and next;
    $SIG{INT} = 'DEFAULT';
    my $filename = $tmpFiles[$i]->filename();
    my $fh = IO::File->new("<$filename") or die "Can't open $filename\n";
    while ( defined( my $line = $fh->getline ) ) {
        chomp $line;
        my ( $dir, $file ) = split /\t/, $line;
        $processor->( $dir, $file, $config, $log );
    }
    $pm->finish;    # Terminates the child process
}
$pm->wait_all_children;