http://qs321.pair.com?node_id=11104900

yueli711 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a file here. I want to change file 1.fa into file 1.txt. Thanks in advance for your great help! Best, yue

open(IN1, "1.fa") || die "Cannot open this file";
@lines = <IN1>;
$i = 0;
for (@lines) {
    $lines[$i]=~s/\nA/\tA//g;
    $lines[$i]=~s/\nT/\tT//g;
    $lines[$i]=~s/\nC/\tC//g;
    $lines[$i]=~s/\nG/\tG//g;
    $lines[$i]=~s/\nN/\tN//g;
    $thislines[$i]=$lines[$i];
    print $thislines[$i];
    $i++;
}
open(OUT, ">1.txt") || die "Cannot open this file";
for $thisline(@thislines){
    print OUT $thisline;
}
close(OUT);
close(IN1);

1.fa
>1
AGTCGTAGCAT
>2
TGAGCTACG
>3
GGCATAGN
>4
CGCACNCAGCTACACC
>5
NGATAGCTACA

1.txt
>1	AGTCGTAGCAT
>2	TGAGCTACG
>3	GGCATAGN
>4	CGCACNCAGCTACACC
>5	NGATAGCTACA

Replies are listed 'Best First'.
Re: change \n to \t
by AnomalousMonk (Archbishop) on Aug 23, 2019 at 16:41 UTC

    Here's a command-line version (see perlrun) (update: this code processes line-by-line, so it should work on any file — unless a "line" is more than a couple GB long! :). Caution: This only works with Perl versions 5.10+ because it uses the  \K regex operator (see Extended Patterns). If you have a pre-5.10 Perl version, let me know; a simple fix can be had (update: what the heck; the pre-5.10 substitution is  s{ ^ (> \d+) \n }{$1\t}xms with all else the same).

    c:\@Work\Perl\monks\yueli711>type 1.fa
    >1
    AGTCGTAGCAT
    >2
    TGAGCTACG
    >3
    GGCATAGN
    >4
    CGCACNCAGCTACACC
    >5
    NGATAGCTACA

    c:\@Work\Perl\monks\yueli711>perl -pe "s{ ^ > \d+ \K \n }{\t}xms" 1.fa > 1.txt

    c:\@Work\Perl\monks\yueli711>type 1.txt
    >1	AGTCGTAGCAT
    >2	TGAGCTACG
    >3	GGCATAGN
    >4	CGCACNCAGCTACACC
    >5	NGATAGCTACA

    This runs under Windows. I can't test this, but I think if you're running under *nix, just replace all the  " (double-quotes) in the command-line with  ' (single-quotes).
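
For comparison, here is a minimal sketch that puts the 5.10+ substitution and the pre-5.10 substitution from the post above side by side as subs. The sub names `tab_with_keep` and `tab_with_capture` are made up for illustration, and the sample lines are taken from the question; both subs are expected to treat a header line the same way and to leave a sequence line alone.

```perl
use strict;
use warnings;

# 5.10+ form: \K keeps the matched header text, so only the newline
# is replaced by the tab.
sub tab_with_keep {
    my ($line) = @_;
    $line =~ s{ ^ > \d+ \K \n }{\t}xms;
    return $line;
}

# Pre-5.10 form: capture the header and put it back in the replacement.
sub tab_with_capture {
    my ($line) = @_;
    $line =~ s{ ^ (> \d+) \n }{$1\t}xms;
    return $line;
}

# A header line followed by a sequence line, as in 1.fa.
for my $line (">1\n", "AGTCGTAGCAT\n") {
    print tab_with_keep($line);
}
```

Either sub could be called inside a `while (my $line = <$fh>)` loop if a script is preferred over the one-liner.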


    Give a man a fish:  <%-{-{-{-<

Re: change \n to \t
by hippo (Bishop) on Aug 23, 2019 at 15:28 UTC

    TIMTOWTDI. I'd abandon the array unless you need it for something else.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $text;
    {
        local $/ = undef;
        $text = <DATA>;
    }
    $text =~ s/\n(?!>)/\t/g;
    print "$text\n";
    __DATA__
    >1
    AGTCGTAGCAT
    >2
    TGAGCTACG
    >3
    GGCATAGN
    >4
    CGCACNCAGCTACACC
    >5
    NGATAGCTACA

    This approach uses a negative look-ahead. It replaces a newline not followed by an angle bracket with a tab. HTH.

      Hello, hippo, Thank you so much for your response! Thank you for your help! I really appreciate it! Best, Yue

        Actually, my DATA file is huge.
        Then you're probably doing it the wrong way by using two arrays to store the contents of the file. It is better to read, process and output one line at a time.

        For example something like this:

        open my $IN1, "<", "1.fa" or die "Cannot open input file";
        open my $OUT, ">", "1.txt" or die "Cannot open output file";
        while (my $line = <$IN1>) {
            # Do here whatever transformation/substitution you need to $line
            print $OUT $line;
        }
        close $IN1;
        close $OUT;

        This will be faster and will consume much less memory.
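
As a rough sketch of how the placeholder comment in that loop might be filled in for this particular task: the sub below joins each header line to the line after it with a tab while reading one line at a time. The name `join_headers` is hypothetical, and the sketch assumes a header is a `>` followed by non-space characters with nothing else on the line. It runs on an in-memory copy of the sample data so it can be tried without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, one at a time.
sub join_headers {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        # A header line loses its newline and gains a tab;
        # every other line passes through unchanged.
        $line =~ s/^(>\S+)\s*\n\z/$1\t/;
        print {$out} $line;
    }
    return;
}

# Demo on in-memory filehandles holding part of the sample data.
my $sample = ">1\nAGTCGTAGCAT\n>2\nTGAGCTACG\n";
my $result = '';
open my $in,  '<', \$sample or die "Cannot open in-memory input: $!";
open my $out, '>', \$result or die "Cannot open in-memory output: $!";
join_headers($in, $out);
close $in;
close $out;
print $result;
```

For the real files, the two handles would be opened on 1.fa and 1.txt as in the code above and passed to the sub; only one line is held in memory at any time.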
Re: change \n to \t
by haukex (Archbishop) on Aug 24, 2019 at 09:12 UTC

    Note that none of the solutions so far take into account what your original code seems to want to do, which is to only replace newlines before [ATCGN]; OTOH your solution doesn't take into account whether the lines start with > or not. Here is one way that does both, while processing the file line-by-line, thereby saving memory.

    use warnings;
    use strict;

    my $prevline;
    while ( my $curline = <DATA> ) {
        next unless defined $prevline;
        if ( $prevline=~/^>/ && $curline=~/^[ATCGN]/ ) {
            $prevline =~ s/\n\z/\t/;
        }
        print $prevline;
    }
    continue { $prevline = $curline }
    print $prevline if defined $prevline;
    __DATA__
    >1
    AGTCGTAGCAT
    foo
    bar
    >2
    TGAGCTACG
    >3
    GGCATAGN
    quz
    >4
    CGCACNCAGCTACACC
    >5
    NGATAGCTACA

    Output:

    >1	AGTCGTAGCAT
    foo
    bar
    >2	TGAGCTACG
    >3	GGCATAGN
    quz
    >4	CGCACNCAGCTACACC
    >5	NGATAGCTACA

Re: change \n to \t
by jwkrahn (Abbot) on Aug 23, 2019 at 18:35 UTC

    $ echo "1.fa
    >1
    AGTCGTAGCAT
    >2
    TGAGCTACG
    >3
    GGCATAGN
    >4
    CGCACNCAGCTACACC
    >5
    NGATAGCTACA
    " | perl -pe'$_ .= "\t" if s/^>\S+\K\s+$//'
    1.fa
    >1	AGTCGTAGCAT
    >2	TGAGCTACG
    >3	GGCATAGN
    >4	CGCACNCAGCTACACC
    >5	NGATAGCTACA

Re: change \n to \t
by Marshall (Canon) on Aug 24, 2019 at 23:03 UTC
    Works with huge 1.fa file.

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line =<DATA>) {
        $line =~ s/^\s*|\s*$//g;
        if ($line =~ /^\d/) {
            print "$line\t";
        }
        else {
            print "$line\n";
        }
    }

    =Prints
    1	AGTCGTAGCAT
    2	TGAGCTACG
    3	GGCATAGN
    4	CGCACNCAGCTACACC
    5	NGATAGCTACA
    =cut

    __DATA__
    1
    AGTCGTAGCAT
    2
    TGAGCTACG
    3
    GGCATAGN
    4
    CGCACNCAGCTACACC
    5
    NGATAGCTACA

    To prevent blank lines in output caused by blank lines in 1.fa, add next unless $line =~ /\S/; or similar after removing leading and trailing spaces.

    Update: looking back at this, if the ">" actually appears in the 1.fa file, then just delete it if seen while processing the line and use the above logic.
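
The two suggestions above (skip blank lines, delete a leading ">") could be combined roughly as follows. This is only a sketch: the sub name `format_line` is made up for illustration, and it assumes, as the code above does, that a header is recognised by starting with a digit once the ">" is gone.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text to print for one input line.
sub format_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;          # remove leading and trailing whitespace
    return '' unless $line =~ /\S/;   # blank line: print nothing
    $line =~ s/^>//;                  # delete a leading ">" if present
    return $line =~ /^\d/
        ? "$line\t"                   # header: follow it with a tab
        : "$line\n";                  # sequence: end the output line
}

# Sample lines from the question, with a blank line thrown in.
print format_line($_) for (">1\n", "AGTCGTAGCAT\n", "\n", ">2\n", "TGAGCTACG\n");
```

In a real script the same sub would be called from a `while (my $line = <$fh>)` loop so that a huge 1.fa is still handled one line at a time.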