PerlMonks  

Re^6: how to read input from a file, one section at a time?

by davi54 (Sexton)
on Mar 28, 2019 at 16:45 UTC (#1231819=note)


in reply to Re^5: how to read input from a file, one section at a time?
in thread how to read input from a file, one section at a time?

Hello poj, following on from my previous question, I now want to count how many times each letter is absent. For example, if several entries in a given file don't contain a W, I want the script to output the number of entries that lack a W, and likewise for the other letters. So if 20 out of 100 entries in a file have no W, the output should be W=20. How can that be done?

Re^7: how to read input from a file, one section at a time?
by poj (Abbot) on Mar 28, 2019 at 17:12 UTC

    Declare a hash to hold the counts before the loop

    my %absent=();

    Count those missing inside the loop

    for ('A'..'Z'){ ++$absent{$_} unless exists $prot{$_}; }

    Print the results after the loop:

    # print absent counts
    for (sort keys %absent){
        printf "%s=%d\n",$_,$absent{$_};
    }
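A self-contained sketch of how the three pieces fit together. The sample entries and the @entries array are hypothetical stand-ins for the records the real script reads from $PROTFILE, and %prot is assumed to hold one key per letter present in the current entry, as in the earlier script.

```perl
use strict;
use warnings;

# Hypothetical sample entries [header, sequence]; the real script
# reads these from the FASTA file one record at a time.
my @entries = (
    [ 'seq1', 'MKWVA' ],
    [ 'seq2', 'MKVA'  ],
    [ 'seq3', 'AAAA'  ],
);

my %absent = ();    # letter => number of entries that lack it

for my $entry (@entries) {
    my ( $name, $seq ) = @$entry;

    # %prot records which letters occur in this entry
    my %prot;
    $prot{$_}++ for split //, uc $seq;

    # count every letter A..Z that this entry does not contain
    for ('A'..'Z') { ++$absent{$_} unless exists $prot{$_}; }
}

# print absent counts
for ( sort keys %absent ) {
    printf "%s=%d\n", $_, $absent{$_};
}
```

Letters that are present in every entry never get a key in %absent, so they are not printed at all. If a line such as A=0 is wanted, loop over 'A'..'Z' instead of the hash keys and print `$absent{$_} // 0`.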
    poj
      Hi Poj,

      Thanks again for your prompt help, I really appreciate it. The script works perfectly. I do have one small issue, though: my input file contains multiple duplicate entries. Is there a way to get rid of the duplicate entries before starting the actual analysis this script does? I was also wondering whether the FASTA headers could be compared first, to check whether there are duplicate entries at all, before removing them. It can be a separate script (run before this one) or part of this script.

      Again, thank you so much for your help and time.
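The header check described in the question could look something like this sketch, which counts how often each whitespace-trimmed header occurs and reports the ones that appear more than once, without removing anything. The inline $input text and the %header_count and @dups names are hypothetical; a real run would read the FASTA file instead.

```perl
use strict;
use warnings;

# Hypothetical FASTA text; a real run would open the input file.
my $input = <<'END';
>seq1
MKWVA
>seq2
MKVA
>seq1
AAAA
END

my %header_count;    # trimmed header => number of occurrences

open my $in, '<', \$input or die "Cannot open input: $!";
while ( my $line = <$in> ) {
    next unless $line =~ /^>(.*)/;    # only header lines matter here
    ( my $name = $1 ) =~ s{ \A \s+ | \s+ \z }{}xmsg;
    $header_count{$name}++;
}
close $in;

# report every header seen more than once
my @dups = grep { $header_count{$_} > 1 } sort keys %header_count;
printf "%s appears %d times\n", $_, $header_count{$_} for @dups;
```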

        From poj's code:

        my $name;
        while ( my $para = <$PROTFILE> ) {
            # Remove fasta header line
            if ( $para =~ s/^>(.*)//m ){
                $name = $1;
            };
            ...
        }
        A quick and dirty and UNTESTED modification to do what I think you want:
        my $name;
        my %name_seen;  # fasta headers seen so far
        FASTA_RECORD:
        while ( my $para = <$PROTFILE> ) {
            # Remove fasta header line
            if ( $para =~ s/^>(.*)//m ){
                $name = $1;
                next FASTA_RECORD if $name_seen{ $name }++;
            };
            ...
        }
        Warning: The requirement to "... get rid of duplicate entries ..." is ambiguous. If there is more than one entry with the same header (i.e., $name), which is (or are, if there are more than two) the duplicate(s)? The first one? The last one? Etc. The code modification above ignores all entries with a given $name after the first one. Also, it might be wise to trim all leading/trailing whitespace from $name before any further processing whatsoever (also untested):
            $name = $1;
            $name =~ s{ \A \s+ | \s+ \z }{}xmsg;
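For the "separate script run before this one" option, a minimal cleanup sketch along the same lines: it keeps the first record for each trimmed header and drops later ones. Unlike the paragraph-at-a-time loop above, it reads line by line (a swapped-in approach, not the code from this thread), and it writes each header back with its leading ">" so the counting script can still recognise it. The inline $input, $keep and @out names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical FASTA text; the third record duplicates the first
# once the stray space after '>' is trimmed.
my $input = <<'END';
>seq1
MKWVA
>seq2
MKVA
> seq1
AAAA
END

my %name_seen;    # trimmed headers seen so far
my $keep = 0;     # true while inside a record we are keeping
my @out;          # cleaned output lines

open my $in, '<', \$input or die "Cannot open input: $!";
while ( my $line = <$in> ) {
    if ( $line =~ /^>(.*)/ ) {
        ( my $name = $1 ) =~ s{ \A \s+ | \s+ \z }{}xmsg;
        # first occurrence: post-increment returns 0, so $keep is true
        $keep = !$name_seen{$name}++;
        # write the header back WITH its leading '>'
        push @out, ">$name\n" if $keep;
    }
    elsif ($keep) {
        push @out, $line;    # sequence line of a kept record
    }
}
close $in;

print @out;
```

Keeping the first occurrence is only one answer to the "which one is the duplicate?" question raised above; keeping the last would require buffering records per header instead.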


        Give a man a fish:  <%-{-{-{-<

      Hi Poj,
      I'll try not to ask any more questions after this one. After running the cleanup script, I tried to run the counting script we discussed earlier (the one that counts present and missing letters), but now it gives me errors such as:
      Use of uninitialized value in printf at present_and_absent.pl line 58, <$PROTFILE> chunk 1426.
      Use of uninitialized value $name in printf at present_and_absent.pl line 33, <$PROTFILE> chunk 1178.
      Use of uninitialized value $name in concatenation (.) or string at present_and_absent.pl line 35, <$PROTFILE> chunk 1178.
      Can you please help me with this?
        Never mind, I figured it out. The cleaned headers no longer start with ">", so the header regex never matches and $name is never set, which is why the script produces these warnings.
        Thanks again!
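A quick diagnostic sketch for this kind of problem, assuming the counting script uses the `s/^>(.*)//m` header regex from poj's code quoted above: it flags any record whose header could not be parsed instead of letting the undefined $name reach printf. The @records and @problems names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records as the main loop would see them one at a time;
# the second one lost its leading '>' during cleanup.
my @records = ( ">seq1\nMKWVA\n", "seq2\nMKVA\n" );

my @problems;    # records whose header line could not be parsed
for my $para (@records) {
    my $name;
    # same header regex as the counting script
    if ( $para =~ s/^>(.*)//m ) {
        $name = $1;
    }
    # without a '>' the regex never matches, so $name stays undef
    # and a later printf of $name would warn "uninitialized value"
    push @problems, $para unless defined $name;
}

print scalar(@problems), " record(s) without a '>' header\n";
```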
