bwelch has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,
In a function that reads several million rows from a database, I'm finding it is taking a very long time to complete. I recently added use warnings; to the script and am now seeing this message many times in the log:

"Use of uninitialized value in concatenation (.) or string at line xxx"

I'm not seeing what is not initialized here. Do you?

Since this is taking a long time to complete, could you recommend any optimizations? thanks, Bryan

use strict;
use warnings;    # Added this line in last run

sub build4PartFastaFile {
    &addDateStamp;
    my ( $fileToMake, $dbCommand ) = @_;
    open( FILE, ">$fileToMake" ) or die print "Can't open fileToMake: $!";
    my $sth2 = $dbh->prepare( $dbCommand )
        or die print LOG "Can't prepare: $! OR $DBI::errstr";
    $sth2->execute or die print LOG "Can't execute: $! OR $DBI::errstr";
    while ( ( my $giName, my $gssName, my $definition, my $sequence ) = $sth2->fetchrow_array ) {
        $sequence = uc( $sequence );
        $sequence =~ s/(\S{1,80})/$1\n/g;
        print FILE ">$giName $gssName $definition\n$sequence";
    }
    close( FILE );
}

Re: Finding error: uninitialized value in concatenation?
by Fletch (Bishop) on Mar 02, 2005 at 15:20 UTC

    Just a guess, but if you're receiving any NULL values back from your DB then those will be translated to undef in perl.

Re: Finding error: uninitialized value in concatenation?
by gellyfish (Monsignor) on Mar 02, 2005 at 15:23 UTC

    It is probable that one of the values of $giName, $gssName, $definition or $sequence is not defined (NULL) in the line

    print FILE ">$giName $gssName $definition\n$sequence";
    I guess that the data contains some NULL columns. Also, you might be better off writing the:
    ( my $giName, my $gssName, my $definition, my $sequence )
    as:
    my ( $giName, $gssName, $definition, $sequence )


Re: Finding error: uninitialized value in concatenation?
by Roy Johnson (Monsignor) on Mar 02, 2005 at 15:27 UTC
    Just to be clear: is the line it refers to the print within the while in your example code? If one of the fetched columns is NULL, the variable it is assigned to will be undef.

    You can group your my declarations, and you can use map to get rid of undefs:

    while ( my( $giName, $gssName, $definition, $sequence ) = map { defined($_) ? $_ : '' } $sth2->fetchrow_array )
    Or you can turn off uninitialized warnings within the while loop:
    no warnings 'uninitialized';
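
    The map approach can be tried without a database. The following is only a minimal sketch: the rows are made-up stand-ins for what fetchrow_array might return (undef plays the role of a NULL column), and blank_undefs is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for fetchrow_array results;
# undef plays the role of a NULL column.
my @rows = (
    [ 'gi1', 'gss1', 'first definition',  'acgt' ],
    [ 'gi2', undef,  'second definition', 'ttga' ],
);

# Replace undef fields with empty strings, leaving other values alone.
sub blank_undefs {
    return map { defined($_) ? $_ : '' } @_;
}

for my $row (@rows) {
    my ( $giName, $gssName, $definition, $sequence ) = blank_undefs(@$row);

    # No "uninitialized value" warning here, even for the row with a NULL.
    print ">$giName $gssName $definition\n", uc($sequence), "\n";
}
```

    Because only undef is replaced, legitimate values such as 0 pass through unchanged.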

Re: Finding error: uninitialized value in concatenation?
by tall_man (Parson) on Mar 02, 2005 at 15:48 UTC
    I believe Fletch is right about the NULLs. The "concatenation" in your error message comes from interpolating the results into the string in this line:
    print FILE ">$giName $gssName $definition\n$sequence";
    One easy way to turn undefs into reasonable values is this idiom:
    $giName ||= " ";
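
    One thing to keep in mind with this idiom: ||= tests for truth, not definedness, so it also replaces an empty string or a literal 0. Whether that matters depends on the data. The sketch below compares the two behaviours; default_if_undef is a hypothetical helper name introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: substitute the default only when the value is undef.
sub default_if_undef {
    my ( $value, $default ) = @_;
    return defined($value) ? $value : $default;
}

for my $raw ( undef, '', '0', 'gi123' ) {
    my $with_or = $raw;
    $with_or ||= ' ';    # replaces undef, '' and '0'

    my $with_defined = default_if_undef( $raw, ' ' );    # replaces undef only

    printf "%-7s ||= gives '%s', defined test gives '%s'\n",
        defined($raw) ? "'$raw'" : 'undef', $with_or, $with_defined;
}
```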
    I don't see a lot to optimize, but here are my suggestions:
    # makes array interpolate with newlines
    local $" = "\n";

    # Pull my variable declarations outside the inner loop.
    my ($giName, $gssName, $definition, $sequence, @seq);

    while (($giName, $gssName, $definition, $sequence) = $sth2->fetchrow_array ) {
        $sequence = uc( $sequence );

        # If I'm right about your data, you just want to
        # turn the whitespace into newlines.
        # The following should be faster.
        @seq = split /\s+/,$sequence;

        # For any fields that can be null:
        $giName ||= " ";

        # Less interpolation will be faster, but we need it
        # for the array.
        print FILE ">",$giName," ",$gssName," ",$definition,"\n","@seq","\n";
    }
      Thanks much. The problem was with an occasionally NULL value.

      To clean up the sequence data and make it suitable for other tools, I needed to convert it to upper case and break up the sequence and associated info into lines of 80 characters or less.

      I've made several of the optimizations and will try another run soon.

        If you need to reformat blocks of text with wrapping, I suggest you look at Text::Wrap. The regular expression you have now will over-split short lines:
        use strict;
        my $sequence = "This is a nice line ";
        # Note: in the replacement part, '$1' is the right form;
        # '\1' is meant for use inside the pattern itself.
        $sequence =~ s/(\S{1,80})/$1\n/g;
        print "*",$sequence,"*\n";
        This prints:
        *This
         is
         a
         nice
         line
         *
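
        For the Text::Wrap route, a minimal sketch might look like the following. Text::Wrap keeps each line shorter than $Text::Wrap::columns, so 81 is used here to allow 80 characters per line; wrap_text is a hypothetical wrapper name, and empty strings are passed for the two indent arguments.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Lines come out at most $columns - 1 characters long, so 81 allows 80.
$Text::Wrap::columns = 81;

# Hypothetical wrapper: wrap with no first-line or subsequent-line indent.
sub wrap_text {
    my ($text) = @_;
    return wrap( '', '', $text );
}

# A short line is left on one line instead of being split at every word.
print "*", wrap_text("This is a nice line"), "*\n";

# A longer run of words is broken at whitespace.
print wrap_text( join ' ', ('word') x 50 ), "\n";
```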
        If $sequence contains nothing but solid blocks of non-space characters (genome sequences, for example), or you don't care about splitting short words, then unpack would be faster.
        use strict;
        my $sequence = "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456";
        my $len = int(length($sequence)/80);
        my @seq = ($len > 0) ? unpack("(A80)$len A*",$sequence) : $sequence;
        print "*",join("\n",@seq),"*\n";
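
        A possible variation on the unpack idea, assuming Perl 5.8 or later (where unpack templates support groups): a repeat count of * on the group takes as many 80-character chunks as the string holds, with the last chunk picking up whatever is left, so the length does not need to be computed first. chunk_sequence is a hypothetical helper name. Note that the A format strips trailing spaces from each chunk, which should not matter for solid sequence data.

```perl
use strict;
use warnings;

# Split a whitespace-free string into chunks of at most $width characters.
# Hypothetical helper; 'A' strips trailing spaces from each chunk.
sub chunk_sequence {
    my ( $sequence, $width ) = @_;
    return unpack( "(A$width)*", $sequence );
}

my $sequence = '1234567890' x 8 . '123456';    # 86 characters
print join( "\n", chunk_sequence( $sequence, 80 ) ), "\n";
```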