PerlMonks  

Re: Finding error: uninitialized value in concatenation?

by tall_man (Parson)
on Mar 02, 2005 at 15:48 UTC (id://435885)


in reply to Finding error: uninitialized value in concatenation?

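I believe Fletch is right about the NULLs: DBI returns an SQL NULL as undef, and interpolating undef into a string is what triggers the warning. The "concatenation" in your error message comes from interpolating the results into the string in this line: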
print FILE ">$giName $gssName $definition\n$sequence";
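A minimal sketch that reproduces the warning outside the database code. It assumes, as Fletch suggested, that the NULL column arrives as undef; the column values here are made up for illustration.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Simulated row: the first column was NULL in the database, and DBI
# hands SQL NULL back to Perl as undef. Values are hypothetical.
my ($giName, $gssName, $definition, $sequence) =
    (undef, 'gss1', 'some definition', 'ACGT');

# Same interpolation as the print FILE line; Perl compiles it to a
# concatenation, so the undef triggers "uninitialized value ... in
# concatenation (.) or string".
my $record = ">$giName $gssName $definition\n$sequence";

print scalar(@warnings), " warning(s): ", @warnings;
```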
One easy way to turn undef values into reasonable defaults is this idiom (note that ||= also replaces other false values, such as "0" or an empty string):
$giName ||= " ";
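A short sketch of that difference, assuming a column could legitimately hold "0". The helper `default_if_null` is hypothetical; on Perl 5.10 and later the defined-or assignment `$giName //= " ";` does the same job in one line.

```perl
use strict;
use warnings;

# Hypothetical helper: replace undef with a default, but leave other
# false values such as "0" or "" untouched.
sub default_if_null {
    my ($value, $default) = @_;
    return defined $value ? $value : $default;
}

my $zero = "0";

my $by_or = $zero;
$by_or ||= " ";                               # "0" is false, so it becomes " "

my $by_defined = default_if_null($zero, " "); # stays "0"
my $filled     = default_if_null(undef, " "); # undef becomes " "

print "[$by_or] [$by_defined] [$filled]\n";
```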
I don't see a lot to optimize, but here are my suggestions:
# makes array interpolate with newlines
local $" = "\n";
# Pull my variable declarations outside the inner loop.
my ($giName, $gssName, $definition, $sequence, @seq);
while (($giName, $gssName, $definition, $sequence) = $sth2->fetchrow_array) {
    $sequence = uc( $sequence );
    # If I'm right about your data, you just want to
    # turn the whitespace into newlines.
    # The following should be faster.
    @seq = split /\s+/, $sequence;
    # For any fields that can be null:
    $giName ||= " ";
    # Less interpolation will be faster, but we need it
    # for the array.
    print FILE ">", $giName, " ", $gssName, " ", $definition, "\n", "@seq", "\n";
}
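To check the record layout without a database, the loop body can be pulled into a function. This is a sketch: `format_record` and the sample values are hypothetical, and the output just mirrors the print FILE line in the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one output record from the four columns
# a fetched row would supply.
sub format_record {
    my ($giName, $gssName, $definition, $sequence) = @_;

    # Map NULL (undef) columns to a single space, using defined()
    # so that a value of "0" is kept.
    ($giName, $gssName, $definition) =
        map { defined $_ ? $_ : " " } ($giName, $gssName, $definition);
    $sequence = defined $sequence ? uc $sequence : "";

    # Split on runs of whitespace, dropping any empty leading field.
    my @seq = grep { length } split /\s+/, $sequence;

    local $" = "\n";   # separator used when "@seq" is interpolated
    return ">" . join(" ", $giName, $gssName, $definition) . "\n@seq\n";
}

print format_record(undef, "gss42", "clone end", "acgt  ttga\ncc");
```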

Re^2: Finding error: uninitialized value in concatenation?
by bwelch (Curate) on Mar 02, 2005 at 18:59 UTC
    Thanks much. The problem was with an occasionally NULL value.

    To clean up the sequence data and make it suitable for other tools, I needed to convert it to upper case and break up the sequence and associated info into lines of 80 characters or less.

    I've made several of the optimizations and will try another run soon.

      If you need to reformat blocks of text with wrapping, I suggest you look at Text::Wrap. The regular expression you have now will over-split short lines, putting a newline after every word no matter how short:
      use strict;
      use warnings;
      my $sequence = "This is a nice line ";
      # Note: on the replacement side, use '$1' rather than '\1'.
      $sequence =~ s/(\S{1,80})/$1\n/g;
      print "*", $sequence, "*\n";
      This prints:
      *This
       is
       a
       nice
       line
       *
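A minimal Text::Wrap sketch, assuming lines of at most 80 characters are wanted. Short words stay together on one line, while a long unbroken run (the sample string is made up) is broken into 80-character pieces.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

# Text::Wrap keeps every line at most $columns - 1 characters long,
# so 81 gives lines of at most 80. Words longer than a line are broken
# up because $Text::Wrap::huge defaults to 'wrap'.
$columns = 81;

my $sentence = "This is a nice line";
my $long     = "ACGT" x 50;            # 200 characters, no spaces

print "*", wrap("", "", $sentence), "*\n";
print wrap("", "", $long), "\n";
```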
      If $sequence contains nothing but solid blocks of non-space characters (genome sequences, for example), or you don't care about splitting short words, then unpack would be faster.
      use strict;
      use warnings;
      my $sequence = "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456";
      my $len = int(length($sequence)/80);
      my @seq = ($len > 0) ? unpack("(A80)$len A*", $sequence) : $sequence;
      print "*", join("\n", @seq), "*\n";
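A simpler variant, assuming Perl 5.8 or later for repeat groups in templates: with `(a80)*`, unpack keeps taking 80-character chunks until the string runs out, so no count has to be computed and a length that is an exact multiple of 80 gives no empty trailing piece. It uses `a` rather than `A`, since `A` strips trailing spaces; for solid sequence data either works.

```perl
use strict;
use warnings;

# Same 86-character test string as above.
my $sequence = ("1234567890" x 8) . "123456";

# Repeat the (a80) group as many times as the data allows; the last
# chunk holds whatever is left over.
my @seq = unpack("(a80)*", $sequence);

print "*", join("\n", @seq), "*\n";
```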