PerlMonks
File::Temp randomness when forking

by ryantate (Friar)
on Nov 29, 2005 at 06:13 UTC ( [id://512530] )

ryantate has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use File::Temp to write out one file from each of 10-15 processes I've forked off. It seems that when I have more than 10 processes forked, I start getting error messages starting at temp file 11 about having to try too many times to generate a random filename.

The error looks like:

Error in tempfile() using /tmp/1yUjjuyGJn/sappyXXXXXXXX: Have exceeded the maximum number of attempts (10) to open temp file/dir at /home/sappy/dev/pfork.pl line 19

Apparently, File::Temp will only try 10 times to come up with a unique random filename, then gives up. When one forks off processes, File::Temp somehow comes up with the same "random" filenames for each process. Is this expected behavior, given how fork works, or is this a bug in File::Temp?
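The inheritance can be shown with a minimal sketch (my illustration, not code from the thread): once the parent draws from Perl's random generator, every child forked afterwards starts from the same generator state.

```perl
use strict;
use warnings;

# Minimal demo of inherited random state (illustration only).
# The parent calls rand() once, seeding the generator; each child
# forked afterwards inherits that seeded state, so the children
# tend to draw the same "random" numbers.
my $seed_draw = rand();    # seed the generator in the parent

for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {       # child
        print "child $$ drew: ", int rand(1_000_000), "\n";
        exit 0;
    }
}
wait() for 1 .. 3;
# Typically prints the same number three times.
```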

FWIW, I am on File::Temp 0.14, and I noticed the following in the ChangeLog entry for 0.15:

* Temp.pm: Increase maximum number of tries before aborting.

MAX_TRIES goes from 10 to 1000! Is the only way to get around this problem to have the module just keep trying, or is there a more elegant solution?

I have a workaround, which is to include $$, the PID, in the template I pass to File::Temp for generating filenames. But I think this should be at least noted in the File::Temp documentation (which even has a section on forking).

Example code giving errors (derived, by the way, from code found in Parallel::ForkManager docs and/or Perlmonks):

use strict;
use warnings;
use Parallel::ForkManager;
use HTTP::GHTTP;
use Time::HiRes qw[ time ];
use File::Temp qw(tempdir tempfile);
use File::Path;

my $start = time;
my $pm = new Parallel::ForkManager(15);
my $temp_dir = tempdir();

for my $link (map { chomp; $_ } <DATA>) {
    $pm->start and next;
    my $getter = HTTP::GHTTP->new;
    $getter->set_uri("http://$link/");
    $getter->process_request;
    my $page = $getter->get_body;
    my $fh = File::Temp->new(TEMPLATE => "sappyXXXXXXXX",
        DIR => $temp_dir, UNLINK => 0)
        or die "Could not make tempfile: $!";
    print $fh $page or die "Could not print to tempfile: $!";
    close $fh or die "Could not close tempfile: $!";
    print "$link downloaded.\n";
    $pm->finish;
}
$pm->wait_all_children;

#rmtree([$temp_dir]);
print "Removed temp dir '$temp_dir'\n";
print 'Done in: ', time - $start, ' seconds.';

__DATA__
www.google.com
www.yahoo.com
www.amazon.com
www.ebay.com
www.perlmonks.com
news.yahoo.com
news.google.com
www.msn.com
www.slashdot.org
www.indymedia.org
www.sfgate.com
www.nytimes.com
www.cnn.com

Output, including errors:

blah@blah [534] perl -wT /home/sappy/dev/pfork.pl
www.amazon.com downloaded.
www.yahoo.com downloaded.
news.google.com downloaded.
www.google.com downloaded.
news.yahoo.com downloaded.
www.slashdot.org downloaded.
www.indymedia.org downloaded.
www.ebay.com downloaded.
www.cnn.com downloaded.
www.sfgate.com downloaded.
Error in tempfile() using /tmp/1yUjjuyGJn/sappyXXXXXXXX: Have exceeded the maximum number of attempts (10) to open temp file/dir at /home/sappy/dev/pfork.pl line 19
Error in tempfile() using /tmp/1yUjjuyGJn/sappyXXXXXXXX: Have exceeded the maximum number of attempts (10) to open temp file/dir at /home/sappy/dev/pfork.pl line 19

To fix code, I change one line as such:

my $fh = File::Temp->new(TEMPLATE => "sappy" . $$ . "XXXXXXXX",
    DIR => $temp_dir, UNLINK => 0)
    or die "Could not make tempfile: $!";

Replies are listed 'Best First'.
Re: File::Temp randomness when forking
by tirwhan (Abbot) on Nov 29, 2005 at 11:16 UTC

    The error occurs because you are calling tempdir before your loop. This makes File::Temp call srand internally, and all subsequent forks inherit the seeded random state. You can 'fix' this behaviour by explicitly calling srand($$) before your call to File::Temp->new() in the loop. You're probably right that it would be a good idea for the documentation to point out this particular quirk (or the module could work around it by comparing $$ between rand calls and calling srand explicitly when it changes); I suggest you file a bug report at rt.cpan.org.

    As an aside, you would probably have gotten a quicker response or even have found out the reason for yourself if the code you posted had been reduced to the bits necessary to demonstrate this bug. This would have sufficed (with the fix commented out):

    use strict;
    use warnings;
    use Parallel::ForkManager;
    use File::Temp qw(tempdir tempfile);

    my $pm = new Parallel::ForkManager(15);
    my $tempdir = tempdir();

    for my $i (1..15) {
        $pm->start and next;
        # srand($$);
        my $fh = File::Temp->new(TEMPLATE => "sappyXXXXXXXX", UNLINK => 0)
            or die "Could not make tempfile: $!";
        close $fh or die "Could not close tempfile: $!";
        $pm->finish();
    }
    $pm->wait_all_children;

    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
      The error occurs because you are calling tempdir before your loop. This makes File::Temp call srand internally, and all subsequent forks inherit the seeded random value.

      Thanks for this! I did not know that this is how things work; I didn't look through the source for more than a few minutes.

      FWIW, I call tempdir first because I want all the forks to dump their files into the same dir so I can collect their output easily in the parent when they finish. (I would think this would be a common pattern, but maybe not.)

      This is why I included slightly more source than you personally would prefer -- I wanted to show a bit of why I am doing this the way I am doing it. It is reduced considerably from the actual code (down to 26 lines not counting __DATA__ vs. 13 lines for yours). On further consideration, I would probably cut it down as you suggest, and deal with any "why did you do this" questions as they come up (instead of preempting them).

      In any case, I doubt I would have stumbled on anything involving srand no matter how much I cut down my code, as it was never in my code! I only know about it because you pointed it out to me, so thanks.

      RT

        That's not necessarily what I meant. If you had cut down your code to the essentials, you would have removed the tempdir call before the loop, and suddenly it would not have failed. You would then at least have had this extra bit of information, which you could have checked up on yourself in the File::Temp code or added to the post. And it would have been easier for someone to look at the post and figure out what the problem was without the irrelevant bits of information. As I said, just an aside for future reference; sorry if it came across a little harsh.


        Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Re: File::Temp randomness when forking
by Moron (Curate) on Nov 29, 2005 at 10:11 UTC
    As a general rule, modules should not be automatically employed just because they are there. (Update: in this case, systems are usually just too individual - they have different process behaviour, disk space issues, support requirements, etc.) If a task is simple enough, it is usually less work to do it yourself. In every system I have ever worked on, unique temporary filenames were always generated using site-written rather than CPAN modules. In the current case we are satisfied to construct unique temporary filenames from a functional identifier, plus the date and time, plus the pid ($$ in Perl), plus a file type, with a dot separator between these elements, e.g.

    NamedProcess.20051129.110401.1234.ext

    Millions of systems the world over have used this type of method (first mainly in C and now also in Perl) for decades and it can safely be called a de facto standard to do so. By comparison, using a CPAN module makes your system less supportable and maintainable. Note also that most systems will need their own date and time formatting to match the above formats anyway and this reduces the unique tmpfile subroutine to a trivial one-liner:

    return join( '.', shift(), ProjectDate(), ProjectTime(), $$, shift() );
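    ProjectDate() and ProjectTime() above are site-specific helpers not shown in the post; a self-contained sketch of the same subroutine, substituting POSIX strftime for them (my substitution), might look like:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# tmpname('NamedProcess', 'ext') builds a name in the style shown
# above, e.g. "NamedProcess.20051129.110401.1234.ext".
sub tmpname {
    my ($name, $ext) = @_;
    return join '.',
        $name,
        strftime('%Y%m%d', localtime),    # date, e.g. 20051129
        strftime('%H%M%S', localtime),    # time, e.g. 110401
        $$,                               # process id
        $ext;
}

print tmpname('NamedProcess', 'ext'), "\n";
```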

    -M

    Free your mind

      Your method will fail to generate a unique name if you are creating more than one file per process per second (which can happen easily enough), or if you are on a system that does not handle PIDs sanely. File::Temp exists exactly because it is very easy to fall into this kind of trap, and surprisingly hard to avoid all the traps that come with temporary file generation. "Millions of systems the world over" do this badly and sometimes fail because their authors did not consider all the corner cases (especially since such errors tend to show up only on loaded production systems and not during testing). Using File::Temp for temporary files is definitely a best practice in my book, and there would have to be very specific reasons for recommending against it.
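      The one-file-per-second failure is easy to demonstrate with a sketch of such a scheme (naive_name is a hypothetical helper of my own, illustration only): two names requested by the same process within the same second come out identical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# A hypothetical date+time+pid naming scheme of the kind discussed above.
sub naive_name {
    return join '.', 'job', strftime('%Y%m%d.%H%M%S', localtime), $$, 'tmp';
}

my $first  = naive_name();
my $second = naive_name();    # same process, same second
print "collision: $first\n" if $first eq $second;    # almost always fires
```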


      Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
        File::Temp exists exactly because it is ... surprisingly hard to avoid all traps that come with temporary file generation.

        Thanks, this was my thinking precisely. I have some code I use to generate unique IDs using $$ and various other vars, which would have worked fine, but I figured it was about time I learn File::Temp. Also, I'm lazy ;->

      You're probably right. Re-rolling a solution is probably a good idea. Especially since the module is a core module...that probably means it doesn't work that well. By your reasoning, we should eschew any module that we didn't write ourselves. Best of luck in that endeavor.

      thor

      The only easy day was yesterday

      If a task is simple enough, it is usually less work to do it yourself.

      Sincere thanks for the perspective, but my mileage varies here. I likely have less experience than you, so my default is almost always to reach for the nearest CPAN solution. It saves me huge amounts of time, and when it doesn't, I am sometimes able to send a bugfix or documentation patch upstream to improve the module.

      In the vast majority of cases, using CPAN is a win, at least for me.


Node Type: perlquestion [id://512530]
Approved by sk
Front-paged by grinder