Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

reading files in directory

by anonym (Acolyte)
on Mar 19, 2012 at 23:41 UTC ( #960496=perlquestion: print w/replies, xml ) Need Help??

anonym has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a multifasta file with me and i have used awk command in my perl script to split that into separate fasta files in the working directory.However , I have a question as to reading all those files and storing each sequence in a hash.

my $genome_file = @ARGV[0]; open(MULTIFASTA,"<",$genome_file) or die "Can't open $genome_file!\n"; my $header; my $sequence; $/ = "\n"; my %hash; my @fasta = `awk '/^>/{f=++d".fasta"} {print > f}' $genome_file`;

Then i want to read each individual fasta file generated in my directory and store the sequence in a hash.I really appreciate any help.Thanks

Replies are listed 'Best First'.
Re: reading files in directory
by GrandFather (Sage) on Mar 20, 2012 at 00:09 UTC

    Why split the file with awk if you are then going to process the data in memory anyway? Read and process the original file, perhaps writing the parts to new files along the way if they are needed for some other purpose.

    See readdir for the Perl tool used to scan directory contents.

    There's not much help we can give in reply to "I want to store data in a hash" because it simply doesn't tell us enough about why you want to do that nor what you want to store, nor how you want to access it.

    True laziness is hard work
Re: reading files in directory
by Riales (Hermit) on Mar 19, 2012 at 23:51 UTC

    Which part are you have trouble with, exactly? Opening the fasta files for reading? Parsing the files? Using a hash?

    What does a 'fasta' file look like? You say you want to store a sequence in a hash. What would you use as a key?

      The file format is

      >chr1 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGCNNNNNNNNNNNNNNNNGCAT... +..... >chr2 ATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN +NNNN. .... EOF

      I have splitted this multifasta into 25 pieces which is present in my current working directory.However, I am confused as to how can I read all these fasta files and store the hash key as header line and value as sequence. Thanks

        Well, the first thing you'll need to do is tell Perl what the relevant files are. The simplest way to do that is to pass it in as a command-line argument. Look for @ARGV in perldoc perlvar for that.

        If the files are all in the same directory and there are no other files in the directory (and you don't want to type out 25 separate filenames), you could use a loop on STDIN to receive piped into into your script:

        my @filenames; push @filenames, $_ while (<STDIN>); chomp@filenames; use Data::Dumper; print Dumper(@filenames);

        That way, you can get all your filenames into your script by simply doing:

        ls | my_script.pl

        As for actually opening each file for reading, you've already done that. What's confusing about reading the files? Have you checked perldoc -f open?

        As for hashes, they're pretty simple:

        my %hash; # Declare your hash. $hash{$key} = $sequence; # Assign $sequence to the hash with key $key. my $sequence_im_looking_for = $hash{$key}; # Get the sequence by looki +ng it up with its key $key.

        It looks like you'll be alternating lines; line1 is the key for the sequence in line2, line3 is the key for the sequence in line4, etc.

Re: reading files in directory
by Sinistral (Monsignor) on Mar 20, 2012 at 13:44 UTC

    Don't reinvent the wheel.

    Perl seems to be the language of choice in the biological sciences, especially genetics. I don't know if you've seen it, but make sure you've checked out the many well thought out solutions available at the BioPerl site. In particular, I found a reference to the FASTA format.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://960496]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2020-07-08 01:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?