PerlMonks  

Can't write to open writeable Filehandle

by Ellhar (Novice)
on May 09, 2008 at 09:59 UTC ( [id://685622]=perlquestion )

Ellhar has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I'm not the strongest Perl guy, but I do love the way it whips through files.
I have a script that parses a list of files I want to load into a database. The input file is opened for reading and the output file for writing. Database-friendly headers are put in the output file, and the contents of the input file are essentially copied to the output file. Seems pretty pointless, I know. However, some of the files contain literally millions of lines, so determining the length I need to declare for each text field in my database is quite a job. So the script also checks each field of each line and stores the longest length it finds in an array. I then want to print that information, e.g. MAX(fieldlength), for each file in a summary file so I can quickly determine the correct size of string field to declare in my database. My question is: can anyone see why, despite the summary file being created and writeable throughout the execution of the script, it remains empty? I'm sure this is something really obvious, but all my filehandle research has proved fruitless.
Thanks in advance, Ell

#! /usr/bin/perl -w
# Script to Create Database files
use strict;

my $output;
my $infile;
my $summary;
my $input = "C:\\Elliott\\Database\\Repository\\";
my $line;
my $wait;
my $linecount = 0;
my @fieldsize;
my @temparray;
my $i;
my @inputfiles = ("gene2accession", "gene2go","gene2sts","gene2unigene","gene2pubmed","gene2refseq","gene_history","gene_info","gene_refseq_uniprotkb_collab","generifs_basic","hiv_interactions","interactions");

$summary = $input."myentrezgenefilesummary.txt";
open (SUMMARY,"> $summary") or die "Cannot open $summary: $!";
print SUMMARY "This file lists the processed files their field and maximum field size\n";
print SUMMARY "This data can be used to determine the varchar field sizes in the novel therapies SQL database\n";

foreach (@inputfiles) {
    $linecount = 0;
    #get input and output files and open for reading/writing
    $infile = $input.$_;
    $output = $input.$_.".txt";
    open (INFILE, "< $infile") or die "Cannot open $input: $!";
    open (OUTFILE,"> $output") or die "Cannot open $output: $!";
    print "FILE ", $infile, " OPEN", "\n";
    while ($line = <INFILE>) {
        chomp $line;
        if ($linecount == 0) {
            #if first line print field names
            #print "SUMMARY is".(is_writable_fh(\*SUMMARY)?"":"n't")." writable.\n";
            if ($_ =~ /gene2accession/) {
                print OUTFILE "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
            }
            elsif ($_ =~ /gene2go/){
                print OUTFILE "Taxon\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMedID\tCategory\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMedID\tCategory\n";
            }
            elsif ($_ =~ /gene2pubmed/){
                print OUTFILE "Taxon\tGeneID\tPubMedID\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tPubMedID\n";
            }
            elsif ($_ =~ /gene2refseq/){
                print OUTFILE "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tStatus\tRNA_Nucleotide_Accession\tRNA_Nucleotide_gi\tProtein_Accession\tProtein_gi\tGenomic_Nucleotide_Accession\tGenomic_Nucleotide_gi\tGenomic_Accession_Start_Pos\tGenomic_Accession_End_Pos\tOrientation\tAssembly\n";
            }
            elsif ($_ =~ /gene2sts/){
                print OUTFILE "GeneID\tUniSTSID\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "GeneID\tUniSTSID\n";
            }
            elsif ($_ =~ /gene2unigene/){
                print OUTFILE "GeneID\tUnigeneUD\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "GeneID\tUnigeneUD\n";
            }
            elsif ($_ =~ /gene_history/){
                print OUTFILE "Taxon\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\n";
            }
            elsif ($_ =~ /gene_info/){
                print OUTFILE "Taxon\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tChromosome\tMap_Location\tDescription\tType_Of_Gene\tSymbol_From_Nomenclature_Authority\tFull_Name_From_Nomenclature_Authority\tNomenclature_Status\tOther_Designations\tModification_Date\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tChromosome\tMap_Location\tDescription\tType_Of_Gene\tSymbol_From_Nomenclature_Authority\tFull_Name_From_Nomenclature_Authority\tNomenclature_Status\tOther_Designations\tModification_Date\n";
            }
            elsif ($_ =~ /gene_refseq_uniprotkb_collab/){
                print OUTFILE "NCBI_Protein_Accession\tUniProtKB_Protein_Accession\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "NCBI_Protein_Accession\tUniProtKB_Protein_Accession\n";
            }
            elsif ($_ =~ /generifs_basic/){
                print OUTFILE "Taxon\tGeneID\tPubMedID\tLastUpdate\tGeneRIFText\n";
                print OUTFILE "$line\n";
                # no header row in file get field for comparison
                @fieldsize = split(/\t/, $line);
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tPubMedID\tLastUpdate\tGeneRIFText\n";
            }
            elsif ($_ =~ /hiv_interactions/){
                print OUTFILE "Taxon\tGeneID\tProductAccession\tProductName\tInteractionShortName\tInteractorTaxon\tInteractorGeneID\tInteractorProdictAccession\tInteractorProductName\tPubMedID\tLastUpdate\tGeneRIFText\n";
                print OUTFILE "$line\n";
                # no header row in file get field for comparison
                @fieldsize = split(/\t/, $line);
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tProductAccession\tProductName\tInteractionShortName\tInteractorTaxon\tInteractorGeneID\tInteractorProdictAccession\tInteractorProductName\tPubMedID\tLastUpdate\tGeneRIFText\n";
            }
            elsif ($_ =~ /interactions/){
                print OUTFILE "Taxon\tGeneID\tProteinAccession\tGeneName\tKeyPhrase\tInteractorTaxon\tInteractorGeneID\tInteractionType\tInteractorProductAccession\tInteractorProductName\tComplexID\tComplexIDType\tComplexName\tPubMedID\tLastUpdate\tGeneRIFText\tInteractionID\tInteractionIDType\n";
                print SUMMARY "Field lengths for file $_\n";
                print SUMMARY "Taxon\tGeneID\tProteinAccession\tGeneName\tKeyPhrase\tInteractorTaxon\tInteractorGeneID\tInteractionType\tInteractorProductAccession\tInteractorProductName\tComplexID\tComplexIDType\tComplexName\tPubMedID\tLastUpdate\tGeneRIFText\tInteractionID\tInteractionIDType\n";
            }
            else {
                print "Header line for this input file not definaed please contact system administator.\n";
                exit;
            }
            $linecount = 1;
        }
        else {
            #print line of data to outfile
            print OUTFILE "$line\n";
            #get fields to test size against existing.
            @temparray = split(/\t/, $line);
            # if elements in array test new values to see if larger than existing.
            if (@fieldsize > 0) {
                #test the field in the line to see if larger than previous and replace field size value if so.
                for($i=0; $i<@temparray; $i++) {
                    if (length($fieldsize[$i]) < length($temparray[$i])) {
                        $fieldsize[$i] = $temparray[$i];
                    }
                }
            }
            #set fields array
            else {
                @fieldsize = split(/\t/, $line);
            }
        }
    }
    #print "SUMMARY is".(is_writable_fh(\*SUMMARY)?"":"n't")." writable.\n";
    #print summary field size information to summary file
    foreach(@fieldsize) {
        print SUMMARY length($_), "\t";
    }
    print SUMMARY "\n\n";
    @fieldsize = ();
    print "FINSHED PROCESSING ", $infile, "\n";
    $linecount =0;
    close INFILE;
    close OUTFILE;
}
close SUMMARY;

sub is_writable_fh {
    my($fh)=@_;
    local $\='';
    return print $fh '';
}

Replies are listed 'Best First'.
Re: Can't write to open writeable Filehandle
by moritz (Cardinal) on May 09, 2008 at 10:44 UTC
    Without looking through all of your code: not only open can fail, but also print and close (for example, if the partition is full).

    You could check for that: use Fatal qw(:void print close);
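The same checks can also be written out by hand, which may be the safer route: as far as I know, print is one of the builtins that Fatal cannot wrap. The following is a minimal sketch, not the original poster's code. It assumes a scratch file created with File::Temp (a stand-in for the real summary file) and a lexical handle named $summary, both chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file so the sketch runs anywhere.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Cannot close $path: $!";

open(my $summary, '>', $path) or die "Cannot open $path: $!";

# print returns true on success, so it can be checked just like open.
print {$summary} "Field lengths for file gene2go\n"
    or die "Cannot write to $path: $!";

# Buffered data is written out on close, so a full disk often shows up here.
close($summary) or die "Cannot close $path: $!";

# Read the file back to confirm the line reached the disk.
open(my $in, '<', $path) or die "Cannot reopen $path: $!";
my @lines = <$in>;
close($in) or die "Cannot close $path: $!";
print "wrote ", scalar(@lines), " line(s)\n";
```

Checking close matters as much as checking print, because with buffered output a write error may only surface when the buffer is finally flushed.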

Re: Can't write to open writeable Filehandle
by roboticus (Chancellor) on May 09, 2008 at 10:55 UTC
    Ellhar:

    Put "__END__" on a line just before your first foreach statement. Does it then create the file with a single line in it? If not, you need to look at permissions, quota (does Windows support quotas?), ACL, etc. I don't see anything (in a very cursory scan) in your program that might cause this behavior, which is why I suspect environmental--rather than code--issues.

    roboticus
      Hi Roboticus, Yes, those lines get printed in the file if I add __END__. What does that mean? Thanks for the quick reply. EllHar
        Ellhar:

        If I had to guess, I'd suspect that your program is taking a long time to execute, and you're not letting it run to completion. Instead, you might be monitoring the file size, worrying ("Hmmph! My program isn't doing anything. Perhaps it's in an infinite loop?") and killing it. Since it hasn't written anything to the file yet, the file remains empty.

        If that's the case, then you're probably suffering from buffering and should turn buffering off on the output file so that each line is written to the output file as it's printed in your program.

        ...roboticus

        Update: I guess I could've mentioned a couple of solutions... You could pepper your code with calls to flush (e.g. SUMMARY->flush()) to write the output to disk immediately. But then you'd have to add use IO::Handle; at the start of your program and manually insert all those calls. A better way would be to tell the filehandle to flush itself on every print/write statement. You can do this by adding SUMMARY->autoflush(); just after your open statement. You'll still need the use IO::Handle; at the beginning though. Read perldoc perlvar for more details.
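The autoflush suggestion above can be sketched as follows. This is only an illustration under stated assumptions: the scratch file from File::Temp and the variable $size_while_open are hypothetical, and the handle name SUMMARY is borrowed from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;                 # provides autoflush() and flush() on handles
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for the summary file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Cannot close $path: $!";

open(SUMMARY, '>', $path) or die "Cannot open $path: $!";
SUMMARY->autoflush(1);          # flush after every print from here on

print SUMMARY "Field lengths for file gene2go\n"
    or die "Cannot write to $path: $!";

# The handle is still open, yet the data is already visible on disk.
my $size_while_open = -s $path;
print "bytes on disk before close: $size_while_open\n";

close(SUMMARY) or die "Cannot close $path: $!";
```

Without the autoflush call, the file size checked while the handle is still open would most likely be zero, which matches the symptom described in the question. Flushing on every print costs some speed, so it is probably best reserved for small files such as the summary rather than the multi-million-line output files.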

Re: Can't write to open writeable Filehandle
by hipowls (Curate) on May 09, 2008 at 12:09 UTC

    There is a lot of repetition: most of the if blocks do the same thing, and each block repeats its list of headers. That list of headers is also pretty hard to read. I'd suggest that you look at factoring out much of that duplication. Something like

    my %inputfiles = (
        gene2accession => join(
            "\t",
            qw(
                Taxon GeneID Status
                RNA_Nucleotide_Accession RNA_Nucleotide_gi
                Protein_Accession Protein_gi
                Genomic_Nucleotide_Accession Genomic_Nucleotide_gi
                Genomic_Accession_Start_Pos Genomic_Accession_End_Pos
                Orientation Assembly
            )
        ),
        gene2go => join(
            "\t",
            qw(
                Taxon GeneID GO_ID Evidence
                Qualifier GO_term PubMedID Category
            )
        ),
        ...
    );

    foreach my $file ( keys %inputfiles ) {
        ...
        while ( my $line = <INPUT> ) {
            if ( $linecount == 0 ) {
                # same header goes to both the output file and the summary
                print OUTFILE "$inputfiles{$file}\n";
                print SUMMARY "Field lengths for file $file\n";
                print SUMMARY "$inputfiles{$file}\n";
                if ( $file eq 'hiv_interactions' ) {
                    # do special processing
                }
            }
        }
        ...
    }
    seems to be easier to read. It is obvious that the header is the same in both the summary and out files and that hiv_interactions gets special treatment. The headers are also easier to read. At least something to consider for your next script;)
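A small runnable variation on the same idea, for what it's worth. This sketch assumes a hypothetical helper max_field_lengths and made-up sample rows (not real Entrez Gene data), and it keeps the maximum lengths as numbers rather than storing the longest strings as the original script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header lists keyed by file name; only two of the files are shown here.
my %headers = (
    gene2pubmed => [qw( Taxon GeneID PubMedID )],
    gene2sts    => [qw( GeneID UniSTSID )],
);

# Hypothetical helper: longest length seen in each tab-separated column.
sub max_field_lengths {
    my @lines = @_;
    my @max;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        for my $i (0 .. $#fields) {
            my $len = length $fields[$i];
            $max[$i] = $len if !defined $max[$i] || $len > $max[$i];
        }
    }
    return @max;
}

# Made-up sample rows, used only to exercise the helper.
my @sample = ("9606\t1\t2591067\n", "9606\t10\t77\n");
my @max    = max_field_lengths(@sample);

print join("\t", @{ $headers{gene2pubmed} }), "\n";
print join("\t", @max), "\n";
```

Keeping the header names in array references means the same list can be joined with tabs for the output file and the summary, and its element count can be compared against the number of fields found in each line.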

Node Type: perlquestion [id://685622]
Approved by Corion