http://qs321.pair.com?node_id=736690


in reply to Conditionally Substituting multi-line string with single string

I'd approach it differently. For each line, check whether it consists entirely of N's. If so, set a flag to record that, print nothing, and move on to the next line.

Repeat until you find a line that is NOT all N's. At that point, print your ">scaffold00002.$something" line, and then process the current line as appropriate.


Re^2: Conditionally Substituting multi-line string with single string
by gone2015 (Deacon) on Jan 15, 2009 at 23:27 UTC

    Which might be something along the lines of:

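    my $flag = 0 ;      # true when a header is due before the next kept line
    my $scaffold ;
    my $enumerator ;
    while (my $fastaline = <FILE>) {
      if ($fastaline =~ m/^>(\S*)/) {
        # new scaffold: remember its name and restart the numbering
        $scaffold   = $1 ;
        $enumerator = 1 ;
        $flag       = 1 ;
      }
      elsif ($fastaline =~ m/[ACGT]/) {
        if ($flag) {
          print WORKFILE ">$scaffold.$enumerator\n" ;
          $enumerator++ ;
          $flag = 0 ;
        }
        print WORKFILE $fastaline ;
      }
      else {
        # dropped line (presumably all N): next kept line starts a new piece
        $flag = 1 ;
      }
    }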
    NB: this is not checking that the input is well formed: (a) it accepts any line that contains at least one [ACGT] as being a line to keep; (b) it does not check that the lines being dropped are all N; (c) it does not check the exact form of >scaffold lines; (d) it does not check that at least one ACGT line follows each >scaffold line; ... If the input is 100% trusted, that's fine... (if 100% trustworthy input isn't an oxymoron).
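If the input is not trusted, the checks listed in (a) to (d) could be added roughly as below. This is only a sketch under my own assumptions, not a tested replacement: `split_scaffolds` is a made-up helper name, it takes the lines as an array ref and returns the output lines instead of using the FILE and WORKFILE handles, a header is taken to be `>` followed by non-whitespace only, sequence lines are taken to be upper-case A, C, G, T and N only, and blank lines are rejected.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns an array ref of
# output lines, and dies on input that breaks the assumptions above.
sub split_scaffolds {
    my ($lines) = @_;
    my @out;
    my ($scaffold, $enumerator, $need_header, $kept);

    for my $i (0 .. $#$lines) {
        my $line = $lines->[$i];
        $line =~ s/\r?\n\z//;                 # strip the line ending, if any
        my $where = "line " . ($i + 1);

        if ($line =~ /^>(\S+)$/) {
            # (d) the previous scaffold must have produced some sequence
            die "no sequence after >$scaffold\n"
                if defined $scaffold && !$kept;
            ($scaffold, $enumerator, $need_header, $kept) = ($1, 1, 1, 0);
        }
        elsif (!defined $scaffold) {
            die "$where: sequence data before any > header\n";
        }
        elsif ($line =~ /^N+$/) {
            # (b) only lines that really are all N get dropped
            $need_header = 1;
        }
        elsif ($line =~ /^[ACGTN]+$/) {
            # (a) kept lines hold nothing but A, C, G, T, N
            #     (and at least one non-N, since all-N was handled above)
            if ($need_header) {
                push @out, ">$scaffold.$enumerator";
                $enumerator++;
                $need_header = 0;
            }
            push @out, $line;
            $kept = 1;
        }
        else {
            # (c) malformed headers, blank lines, anything else
            die "$where: unexpected content\n";
        }
    }
    die "no sequence after >$scaffold\n" if defined $scaffold && !$kept;
    return \@out;
}

# Usage: two runs of sequence separated by all-N lines
my $out = split_scaffolds(
    [ ">scaffold00002", "ACGT", "NNNN", "NNNN", "GGCC" ]
);
print "$_\n" for @$out;
# prints:
# >scaffold00002.1
# ACGT
# >scaffold00002.2
# GGCC
```

Like the original, this keeps lines that mix N with other bases unchanged; only lines made up entirely of N split a scaffold. Whether dying on the first bad line is the right reaction depends on where the data comes from.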