Re^5: how to read input from a file, one section at a time?

#!/usr/bin/perl
use strict;
use warnings;

my $report_name = 'aa_report.txt';
open my $out_file, '>', $report_name 
     or die "Cannot open '$report_name' because: $!";

print 'PLEASE ENTER THE FILENAME OF THE PROTEIN SEQUENCE: ';
chomp( my $prot_filename = <STDIN> );

open my $PROTFILE, '<', $prot_filename 
  or die "Cannot open '$prot_filename' because: $!";

$/ = ''; # Set paragraph mode

my @count=();
my $name;
while ( my $para = <$PROTFILE> ) {
    # Remove fasta header line
    if ( $para =~ s/^>(.*)//m ){
      $name = $1;
    };
    # Remove comment line(s)
    $para =~ s/^\s*#.*//mg;

    my %prot;
    $para =~ s/([A-Z])/ ++$prot{ $1 } /eg;
    
    my $num = scalar keys %prot;
    push @count,[$num,$name];
    printf "Counted %d for %s ..\n",$num,substr($name,0,50);
    
    print $out_file "$name\n";
    print $out_file join( ' ', map "$_=$prot{$_}", sort keys %prot ), "\n";
    printf $out_file "Number of proteins = %d\n\n",$num ;
}

# sort names by count in ascending order to get lowest
my @sorted = sort { $a->[0] <=> $b->[0] } @count;
my $lowest = $sorted[0]->[0];

# maybe more than 1 lowest
printf $out_file "Least number of proteins is %d in these entries\n", $lowest;
my @lowest = grep { $_->[0] == $lowest } @sorted;
print $out_file "$_->[1]\n" for @lowest;

# show all results
print $out_file "\nAll results in ascending count\n";
for (@sorted){
  printf $out_file "%d  %s\n",@$_;
};
close $out_file;
print "Results in $report_name\n"

poj
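
For readers unfamiliar with paragraph mode, here is a minimal sketch of what `$/ = ''` does in the script above. The sample data and the `read_paragraphs` helper are made up for illustration, and an in-memory filehandle stands in for the real input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: two FASTA-style records separated by a blank line.
my $data = ">seq1 first entry\nMKVL\nAAGW\n\n>seq2 second entry\nMMKK\n";

# Read the text one blank-line-separated record at a time.
sub read_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '';    # paragraph mode, restored when the sub returns
    my @paras = <$fh>;
    close $fh;
    return @paras;
}

my @records = read_paragraphs($data);
printf "Read %d records\n", scalar @records;
```

Using `local $/` inside a sub keeps the change from leaking into other reads, such as a later `<STDIN>`; the original script sets `$/` globally, which is fine there because the filename is read first.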


Replies are listed 'Best First'.
Re^6: how to read input from a file, one section at a time? by davi54 (Sexton) on Feb 26, 2019 at 17:26 UTC
Thank you so much. This is exactly what I was looking for. I really appreciate your help.
Re^6: how to read input from a file, one section at a time? by davi54 (Sexton) on Oct 15, 2019 at 19:10 UTC
In the script above, how can I make it print the length of the sequence that is being read? After the line `printf $out_file "Number of proteins = %d\n\n",$num ;` I tried `printf $out_file "string length = length($num) ;` but nothing happens. What am I doing wrong?
Re^7: how to read input from a file, one section at a time? by poj (Abbot) on Oct 21, 2019 at 10:35 UTC
You need to provide a value to printf, for example

    printf $out_file "string length = %d\n",length($num) ;

but that gives you the length of the count value, not the sequence. You need to calculate the sequence length before the value is changed by this counting regex

    $para =~ s/([A-Z])/ ++$prot{ $1 } /eg;

Try making these changes

    # Remove comment line(s) and white space
    $para =~ s/^\s*#.*//mg;
    $para =~ s/\s//g; # add
    my $seq_length = length($para); # add
    print "[$para]\n"; # optional
    .
    .
    printf $out_file "Number of proteins = %d\n",$num ;
    printf $out_file "String length = %d\n\n",$seq_length; # add

poj
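
As a self-contained illustration of measuring the length before counting, here is a rough sketch. The `analyse_record` helper and the sample record are hypothetical, and it counts letters with a list-context match rather than the `s///eg` used in the thread, so `$para` is left unchanged.

```perl
use strict;
use warnings;

# Return (header, sequence length, per-letter counts) for one record.
sub analyse_record {
    my ($para) = @_;
    my $name = '';
    $name = $1 if $para =~ s/^>(.*)//m;    # strip the FASTA header
    $para =~ s/^\s*#.*//mg;                # strip comment lines
    $para =~ s/\s//g;                      # strip all white space
    my $seq_length = length $para;         # measure before counting
    my %prot;
    $prot{$_}++ for $para =~ /[A-Z]/g;     # count without modifying $para
    return ( $name, $seq_length, \%prot );
}

# Hypothetical record with a header, a comment line and two sequence lines.
my ( $name, $len, $prot ) =
  analyse_record(">seq1 demo\n# a comment\nMKVL\nAAGW\n");
printf "%s: length %d, %d distinct letters\n",
  $name, $len, scalar keys %$prot;
```

For this sample the sequence is `MKVLAAGW`, so the length is 8 and there are 7 distinct letters (A occurs twice).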
Re^6: how to read input from a file, one section at a time? by davi54 (Sexton) on Mar 28, 2019 at 16:45 UTC
Hello Poj, in continuation of my previous question, I now want to count how many entries are missing a given letter. For example, if multiple entries in a file don't have a W, I want the script to output the number of entries that don't have a W, and similarly for the other letters. So, for instance, if 20 entries out of 100 in a file don't have a W, I want the output to be W=20. How can that be done?
Re^7: how to read input from a file, one section at a time? by poj (Abbot) on Mar 28, 2019 at 17:12 UTC
Declare a hash to hold the counts before the loop

    my %absent=();

Count those missing inside the loop

    for ('A'..'Z'){
      ++$absent{$_} unless exists $prot{$_};
    }

Print the results after the loop

    # print absent counts
    for (sort keys %absent){
      printf "%s=%d\n",$_,$absent{$_};
    };

poj
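
Put together on made-up data, the three steps might look like the sketch below. The `@sequences` list is hypothetical and stands in for the per-paragraph loop. One deliberate difference from the snippet above: `%absent` is pre-filled with zeros, an assumption about the desired output, so that a letter present in every entry is still printed as `X=0` instead of being left out.

```perl
use strict;
use warnings;

# Hypothetical sequences, already stripped of headers and white space.
my @sequences = ( 'MKVLAAGW', 'MMKK', 'GAVL' );

# Start every letter at zero so letters that are never absent still print.
my %absent = map { $_ => 0 } 'A' .. 'Z';

for my $seq (@sequences) {
    my %prot;
    $prot{$_}++ for $seq =~ /[A-Z]/g;    # letters present in this entry
    for my $letter ( 'A' .. 'Z' ) {
        ++$absent{$letter} unless exists $prot{$letter};
    }
}

# print absent counts
printf "%s=%d\n", $_, $absent{$_} for sort keys %absent;
```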
Re^8: how to read input from a file, one section at a time? by davi54 (Sexton) on Apr 01, 2019 at 21:59 UTC
Hi Poj, thanks again for your prompt help. I really appreciate it. The script works perfectly, although I have a small issue: my input file has multiple duplicate entries. Is there any way to get rid of the duplicate entries before starting the actual analysis that this script does? I was thinking there might be a way to compare the fasta headers, before they are removed, to check for duplicates. It can be a separate script (run before this one) or part of this script. Again, thank you so much for your help and time.
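
The header comparison described here could be sketched roughly as follows. This is not the answer given in the thread (that reply is not shown on this page); `unique_records` and the sample records are hypothetical, and the sketch assumes that two entries are duplicates exactly when their header lines are identical.

```perl
use strict;
use warnings;

# Keep the first record seen for each header; later ones are dropped.
sub unique_records {
    my (@records) = @_;
    my %seen;
    my @kept;
    for my $rec (@records) {
        my ($header) = $rec =~ /^>(.*)/m;
        next unless defined $header;    # assumption: skip paragraphs with no header
        push @kept, $rec unless $seen{$header}++;
    }
    return @kept;
}

# Hypothetical paragraph-mode records; the third repeats the first header.
my @records = ( ">seq1\nMKVL\n\n", ">seq2\nMMKK\n\n", ">seq1\nMKVL\n\n" );
my @unique  = unique_records(@records);
printf "%d of %d records kept\n", scalar @unique, scalar @records;
```

Entries with the same sequence but different headers are kept by this sketch; keying `%seen` on the sequence text instead would treat those as duplicates.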
Re^9: how to read input from a file, one section at a time? by AnomalousMonk (Archbishop) on Apr 01, 2019 at 23:31 UTC
Re^10: how to read input from a file, one section at a time? by davi54 (Sexton) on Apr 02, 2019 at 15:32 UTC
Re^8: how to read input from a file, one section at a time? by davi54 (Sexton) on Apr 02, 2019 at 21:30 UTC
Hi Poj, I'll try not to ask any further questions. After I ran the cleanup script, I tried to run the count script above (for counting present and missing letters) that we discussed earlier. But now it has started giving me errors such as:

    Use of uninitialized value in printf at present_and_absent.pl line 58, <$PROTFILE> chunk 1426.
    Use of uninitialized value $name in printf at present_and_absent.pl line 33, <$PROTFILE> chunk 1178.
    Use of uninitialized value $name in concatenation (.) or string at present_and_absent.pl line 35, <$PROTFILE> chunk 1178.

Can you please help me with this?
Re^9: how to read input from a file, one section at a time? by davi54 (Sexton) on Apr 02, 2019 at 22:12 UTC

