Can't write to open writeable Filehandle

Ellhar has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, Not the strongest Perl guy but I do love the way it whips through files.
I have a script that parses a list of files I want to load into a database. The input file is opened for reading and the output file for writing. Database-friendly headers are put in the output file, and the contents of the input file are essentially copied to the output file. Seems pretty pointless, I know. However, some of the files contain literally millions of lines, so determining the length of string I need to declare for each text field in my database is quite a job. So the script also checks each field of each line and stores the longest length it finds in an array. I then want to print that information, e.g. MAX(fieldlength), for each file in a summary file so I can quickly determine the correct size of string field to declare in my database. My question is: can anyone see why, despite the summary file being created and writeable throughout the execution of the script, it remains empty? I'm sure this is something really obvious, but all my filehandle research has proved fruitless.
Thanks in advance Ell

#! /usr/bin/perl -w
# Script to Create Database files

use strict;


my $output;
my $infile;
my $summary;
my $input = "C:\\Elliott\\Database\\Repository\\";
my $line;
my $wait;
my $linecount = 0;
my @fieldsize;
my @temparray;
my $i;
my @inputfiles = ("gene2accession", "gene2go","gene2sts","gene2unigene","gene2pubmed","gene2refseq","gene_history","gene_info","gene_refseq_uniprotkb_collab","generifs_basic","hiv_interactions","interactions");
$summary = $input."myentrezgenefilesummary.txt";

open (SUMMARY,"> $summary") or die "Cannot open $summary: $!";
print SUMMARY "This file lists the processed files their field and maximum field size\n";
print SUMMARY "This data can be used to determine the varchar field sizes in the novel therapies SQL database\n";

foreach (@inputfiles) {
    $linecount = 0;
    
    #get input and output files and open for reading/writing
    $infile = $input.$_;
    $output = $input.$_.".txt";

        open (INFILE, "< $infile") or die "Cannot open $infile: $!";
        open (OUTFILE,"> $output") or die "Cannot open $output: $!";
        print "FILE ", $infile, " OPEN", "\n";
        while ($line = <INFILE>) {
            chomp $line;
            if ($linecount == 0) {
                #if first line print field names
                #print "SUMMARY is".(is_writable_fh(\*SUMMARY)?"":"n't")." writable.\n";
                if ($_ =~ /gene2accession/) {
                    print OUTFILE "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
                }
                elsif ($_ =~ /gene2go/){
                    print OUTFILE "Taxon\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMedID\tCategory\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMedID\tCategory\n";
                }
                elsif ($_ =~ /gene2pubmed/){
                    print OUTFILE "Taxon\tGeneID\tPubMedID\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tPubMedID\n";
                }
                elsif ($_ =~ /gene2refseq/){
                    print OUTFILE "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
                }
                elsif ($_ =~ /gene2sts/){
                    print OUTFILE "GeneID\tUniSTSID\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "GeneID\tUniSTSID\n";
                }
                elsif ($_ =~ /gene2unigene/){
                    print OUTFILE "GeneID\tUnigeneUD\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "GeneID\tUnigeneUD\n";
                }
                elsif ($_ =~ /gene_history/){
                    print OUTFILE "Taxon\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\n";
                }
                elsif ($_ =~ /gene_info/){
                    print OUTFILE "Taxon\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tChromosome\tMap_Location\tDescription\tType_Of_Gene\tSymbol_From_Nomenclature_Authority\tFull_Name_From_Nomenclature_Authority\tNomenclature_Status\tOther_Designations\tModification_Date\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tChromosome\tMap_Location\tDescription\tType_Of_Gene\tSymbol_From_Nomenclature_Authority\tFull_Name_From_Nomenclature_Authority\tNomenclature_Status\tOther_Designations\tModification_Date\n";
                }
                elsif ($_ =~ /gene_refseq_uniprotkb_collab/){
                    print OUTFILE "NCBI_Protein_Accession\tUniProtKB_Protein_Accession\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "NCBI_Protein_Accession\tUniProtKB_Protein_Accession\n";
                }
                elsif ($_ =~ /generifs_basic/){
                    print OUTFILE "Taxon\tGeneID\tPubMedID\tLastUpdate\tGeneRIFText\n";
                    print OUTFILE "$line\n";
                    # no header row in file get field for comparison
                    @fieldsize = split(/\t/, $line);
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tPubMedID\tLastUpdate\tGeneRIFText\n";
                }
                elsif ($_ =~ /hiv_interactions/){
                    print OUTFILE "Taxon\tGeneID\tProductAccession\tProductName\tInteractionShortName\tInteractorTaxon\tInteractorGeneID\tInteractorProdictAccession\tInteractorProductName\tPubMedID\tLastUpdate\tGeneRIFText\n";
                    print OUTFILE "$line\n";
                    # no header row in file get field for comparison
                    @fieldsize = split(/\t/, $line);
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tProductAccession\tProductName\tInteractionShortName\tInteractorTaxon\tInteractorGeneID\tInteractorProdictAccession\tInteractorProductName\tPubMedID\tLastUpdate\tGeneRIFText\n";
                }
                elsif ($_ =~ /interactions/){
                    print OUTFILE "Taxon\tGeneID\tProteinAccession\tGeneName\tKeyPhrase\tInteractorTaxon\tInteractorGeneID\tInteractionType\tInteractorProductAccession\tInteractorProductName\tComplexID\tComplexIDType\tComplexName\tPubMedID\tLastUpdate\tGeneRIFText\tInteractionID\tInteractionIDType\n";
                    print SUMMARY "Field lengths for file $_\n";
                    print SUMMARY "Taxon\tGeneID\tProteinAccession\tGeneName\tKeyPhrase\tInteractorTaxon\tInteractorGeneID\tInteractionType\tInteractorProductAccession\tInteractorProductName\tComplexID\tComplexIDType\tComplexName\tPubMedID\tLastUpdate\tGeneRIFText\tInteractionID\tInteractionIDType\n";
                }
                else {
                    print "Header line for this input file not defined please contact system administrator.\n";
                    exit;
                }
                $linecount = 1;
            }
            else {
                #print line of data to outfile
                print OUTFILE "$line\n";
                
                #get fields to test size against existing.
                @temparray = split(/\t/, $line);
                
                # if elements in array test new values to see if larger than existing.
                if (@fieldsize > 0) {
                    #test the field in the line to see if larger than previous and replace field size value if so.
                    for($i=0; $i<@temparray; $i++) {
                        if (length($fieldsize[$i]) < length($temparray[$i])) {
                            $fieldsize[$i] = $temparray[$i];
                        }
                    }
                }
                #set fields array
                else {
                    @fieldsize = split(/\t/, $line);
                }
            }

        }        
        #print "SUMMARY is".(is_writable_fh(\*SUMMARY)?"":"n't")." writable.\n";
        
        #print summary field size information to summary file
        foreach(@fieldsize) {
            print SUMMARY length($_), "\t";
        }
        print SUMMARY "\n\n";
        @fieldsize = ();
        print "FINISHED PROCESSING ", $infile, "\n";
        $linecount =0;
        close INFILE;
        close OUTFILE;
        
}
close SUMMARY;


sub is_writable_fh
{
  my($fh)=@_;
  local $\='';
  return print $fh '';
}

Re: Can't write to open writeable Filehandle by moritz (Cardinal) on May 09, 2008 at 10:44 UTC
Without looking through all of your code: not only open can fail, but also print and close (for example if the partition is full). You could check for that: `use Fatal qw(:void print close);`
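A minimal sketch of checking `print` and `close` by hand, in case the pragma route does not work out. As far as I know, `print` is not among the builtins that Fatal can wrap, so an explicit `or die` is the safer assumption for it. The file name `summary.txt` and the temporary directory are stand-ins for illustration, not the paths from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for the summary file; a temp dir keeps the sketch self-contained.
my $dir     = tempdir(CLEANUP => 1);
my $summary = File::Spec->catfile($dir, 'summary.txt');

# Three-argument open with a lexical filehandle.
open(my $summary_fh, '>', $summary) or die "Cannot open $summary: $!";

# print returns true on success, so it can be checked just like open.
print {$summary_fh} "Field lengths for file gene2go\n"
    or die "Cannot write to $summary: $!";

# Buffered data is written out at close, so a full disk may only show up here.
close($summary_fh) or die "Cannot close $summary: $!";

# -s gives the file size in bytes (false for an empty file).
my $bytes = (-s $summary) || 0;
print "Wrote $bytes bytes to $summary\n";
```

The same `or die` pattern works on the bareword handles in the original script, e.g. `print SUMMARY "..." or die ...;`.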
Re: Can't write to open writeable Filehandle by roboticus (Chancellor) on May 09, 2008 at 10:55 UTC
Ellhar: Put "__END__" on a line just before your first `foreach` statement. Does it then create the file with a single line in it? If not, you need to look at permissions, quota (does Windows support quotas?), ACL, etc. I don't see anything (in a very cursory scan) in your program that might cause this behavior, which is why I suspect environmental--rather than code--issues. roboticus
Re^2: Can't write to open writeable Filehandle by Ellhar (Novice) on May 09, 2008 at 11:20 UTC
Hi Roboticus, Yes, those lines get printed in the file if I add __END__. What does that mean? Thanks for the quick reply EllHar
Re^3: Can't write to open writeable Filehandle by roboticus (Chancellor) on May 09, 2008 at 11:32 UTC
Ellhar: If I had to guess, I'd suspect that your program is taking a long time to execute, and you're not letting it run to completion. Instead, you might be monitoring the file size and worrying .... "Hmmph! My program isn't doing anything. Perhaps it's in an infinite loop?" and killing it. Since it hasn't written anything to the file yet, it remains empty. If that's the case, then you're probably suffering from buffering and should turn buffering off on the output file so that each line is written to the output file as it's printed in your program. ...roboticus Update: I guess I could've mentioned a couple of solutions... You could pepper your code with calls to flush (e.g. `SUMMARY->flush()`) to write the output to disk immediately. But then you'd have to `use IO::Handle;` at the start of your program and manually insert all those calls. A better way would be to tell the filehandle to flush itself on every print/write statement. You can do this by adding `SUMMARY->autoflush();` just after your `open` statement. You'll still need the `use IO::Handle;` at the beginning though. Read `perldoc perlvar` for more details.
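A small sketch of the buffering effect described above, assuming a lexical filehandle and a throwaway file (`summary.txt` in a temporary directory, both made up for the demonstration). It compares the on-disk size before and after autoflush is switched on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical summary file in a temp dir so the sketch cleans up after itself.
my $dir     = tempdir(CLEANUP => 1);
my $summary = File::Spec->catfile($dir, 'summary.txt');

open(my $fh, '>', $summary) or die "Cannot open $summary: $!";

# A short line normally sits in Perl's output buffer rather than on disk.
print {$fh} "buffered line\n" or die "Cannot write to $summary: $!";
my $size_buffered = (-s $summary) || 0;   # typically 0 at this point

# Turn on autoflush: pending data is flushed now and after every later print.
$fh->autoflush(1);
print {$fh} "flushed line\n" or die "Cannot write to $summary: $!";
my $size_flushed = (-s $summary) || 0;

close($fh) or die "Cannot close $summary: $!";

print "Size before autoflush: $size_buffered bytes\n";
print "Size after autoflush:  $size_flushed bytes\n";
```

With the bareword handle from the original script, the equivalent call would be `SUMMARY->autoflush(1);` right after the `open`. Flushing on every print costs some speed, which is usually acceptable for a small summary file but less attractive for the multi-million-line output files.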
Re^4: Can't write to open writeable Filehandle by Ellhar (Novice) on May 09, 2008 at 11:53 UTC
Re: Can't write to open writeable Filehandle by hipowls (Curate) on May 09, 2008 at 12:09 UTC
There is a lot of repetition: most of the if blocks do the same thing, and each block repeats its list of headers. And that list of headers is pretty hard to read. I'd suggest that you look at factoring out much of that duplication. Something like

my %inputfiles = (
    gene2accession => join(
        "\t",
        qw(
            Taxon
            GeneID
            Status
            RNA_Nucleotide_Accession
            RNA_Nucleotide_gi
            Protein_Accession
            Protein_gi
            Genomic_Nucleotide_Accession
            Genomic_Nucleotide_gi
            Genomic_Accession_Start_Pos
            Genomic_Accession_End_Pos
            Orientation
            Assembly
        )
    ),
    gene2go => join(
        "\t",
        qw(
            Taxon
            GeneID
            GO_ID
            Evidence
            Qualifier
            GO_term
            PubMedID
            Category
        )
    ),
    ...
);

foreach my $file ( keys %inputfiles ) {
    ...
    while ( my $line = <INPUT> ) {
        if ( $linecount == 0 ) {
            print OUTFILE "$file\n";
            print SUMMARY "Field lengths for file $file\n";
            print SUMMARY "$inputfiles{$file}\n";
            if ( $file eq 'hiv_interactions' ) {
                # do special processing
            }
        }
    }
    ...
}

seems to be easier to read. It is obvious that the header is the same in both the summary and out files and that hiv_interactions gets special treatment. The headers are also easier to read. At least something to consider for your next script;)
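For what it's worth, here is a runnable sketch along the same lines. It is only an illustration under assumptions: the sample rows are made up, only two of the files are listed, and the helper `max_field_lengths` is a hypothetical name rather than anything from the original script. It keeps the headers in a hash and tracks the longest length per column instead of the longest string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header lists keyed by file name; only two of the files are shown here.
my %headers = (
    gene2sts    => [qw( GeneID UniSTSID )],
    gene2pubmed => [qw( Taxon GeneID PubMedID )],
);

# Made-up sample rows standing in for the real input files.
my %sample_rows = (
    gene2sts    => [ "1\t12345", "100\t7" ],
    gene2pubmed => [ "9606\t1\t2591067", "10090\t11287\t33" ],
);

# Return the longest length seen in each column of the given tab-separated lines.
sub max_field_lengths {
    my @lines = @_;
    my @max;
    for my $line (@lines) {
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        for my $i (0 .. $#fields) {
            my $len = length $fields[$i];
            $max[$i] = $len if !defined $max[$i] || $len > $max[$i];
        }
    }
    return @max;
}

# Build and print one summary block per file.
my %report;
for my $file (sort keys %headers) {
    my @max = max_field_lengths(@{ $sample_rows{$file} });
    $report{$file} = \@max;
    print "Field lengths for file $file\n";
    print join("\t", @{ $headers{$file} }), "\n";
    print join("\t", @max), "\n\n";
}
```

Storing the lengths directly avoids keeping a copy of the longest string for every column, and the `defined` check copes with rows that have more fields than any earlier row.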

