PerlMonks  

Re: how do i obtain blast result from the given file

by rjt (Curate)
on Jun 17, 2013 at 19:15 UTC [id://1039449]


in reply to how do i obtain blast result from the given file

You can print the matching lines with a regular expression that matches only the lines you're interested in:

```perl
while (<$fh>) {
    print if /^(.+?)\|(.+?)\| (.+?)\s+(\d+)\s+(\d+e[+-]\d+)$/;
}
```

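The above also handles arbitrarily large files, since it reads only one line at a time into memory. Note that the last capture group only accepts E values written with an exponent (such as 2e-64), so a line whose E value is written without one (0.0, for example) will not match.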
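A minimal self-contained sketch of the same filter, assuming hit lines have the "db|accession| description   score   E-value" layout the regex expects. The subroutine name print_hits and the sample lines (including their accessions and E values) are invented for illustration; an in-memory filehandle stands in for the real BLAST file.

```perl
use strict;
use warnings;

# The regex from above: "db|accession| description   score   E-value"
my $HIT_RE = qr/^(.+?)\|(.+?)\| (.+?)\s+(\d+)\s+(\d+e[+-]\d+)$/;

# Hypothetical helper: copy every hit line from $in to $out, reading one
# line at a time so memory use does not grow with the file size.
# Returns the number of lines copied.
sub print_hits {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        next unless $line =~ $HIT_RE;
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Hypothetical input: a header line plus one hit (accession and E value invented)
my $sample = join '',
    "Sequences producing significant alignments:   (Bits)  Value\n",
    "gb|XXX00001.1| ATPase 11, plasma membrane-type [Aegilops tauschii]   208   5e-63\n";

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print_hits($fh, \*STDOUT);    # prints only the "gb|XXX00001.1| ..." line
```

The header line is skipped because it contains no "|", which the first two capture groups require.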

There are many hits, but if I want the top ten only, what do I do?

I'm not sure exactly what you mean by top ten (highest "Score"? lowest "E Value"? order within the file?), but one likely approach is to modify the while loop above so that, instead of printing each matching line, it stores that line's fields in a hash. My regular expression already does basic parsing of the input line into the capture variables $1 through $5, so you can try something like this:

```perl
$hash{$2} = { col1 => $1, desc => $3, score => $4, E => $5 } if /.../; # Use regex above
```
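Filled out with the regex from the first snippet, the hash-building loop might look like the sketch below. It assumes one hit per line and reuses the field names col1, desc, score and E; the subroutine parse_hits and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of hit records keyed by accession ($2),
# using the same regex and field names as above.
sub parse_hits {
    my ($fh) = @_;
    my %hash;
    while (my $line = <$fh>) {
        next unless $line =~ /^(.+?)\|(.+?)\| (.+?)\s+(\d+)\s+(\d+e[+-]\d+)$/;
        $hash{$2} = { col1 => $1, desc => $3, score => $4, E => $5 };
    }
    return \%hash;
}

# Hypothetical input (accessions and E values invented for illustration)
my $sample = join '',
    "emb|XXX00001.1| plasma membrane H+-ATPase [Oryza sativa Japonica...   213   2e-64\n",
    "gb|XXX00002.1| ATPase 11, plasma membrane-type [Aegilops tauschii]   208   5e-63\n";

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $hits = parse_hits($fh);
printf "%s: %s (score %d)\n", $_, $hits->{$_}{desc}, $hits->{$_}{score}
    for sort keys %$hits;
```

Because the hash is keyed by accession, a later line with the same accession replaces the earlier record, and the hash itself does not remember the order in which lines appeared in the file.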

Then, after the loop finishes, you can sort and display the results however you like. For example:

```perl
my $how_many = 10;
for (sort { $hash{$b}->{score} <=> $hash{$a}->{score} } keys %hash) {
    printf "%-4s %-55s %5s\n", $hash{$_}->{col1}, $hash{$_}->{desc}, $hash{$_}->{score};
    last if --$how_many == 0;
}
```

Outputs:

```
emb  plasma membrane H+-ATPase [Oryza sativa Japonica...       213
gb   hypothetical protein OsI_09609 [Oryza sativa Indi...      213
ref  Os03g0100800 [Oryza sativa Japonica Group] >...           213
gb   ATPase 11, plasma membrane-type [Aegilops tauschii]       208
ref  PREDICTED: ATPase 7, plasma membrane-type is...           207
ref  H(\+)-transporting atpase plant/fungi plasma...           207
ref  PREDICTED: plasma membrane ATPase 1-like [Br...           207
ref  PREDICTED: ATPase 7, plasma membrane-type is...           207
ref  autoinhibited H+ ATPase [Populus trichocarpa...           206
ref  PREDICTED: plasma membrane ATPase 1-like iso...           205
```
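If "top ten" instead means the most significant hits, the same hash can be sorted on the E value, smallest first. This is a hedged sketch: it assumes the E field holds strings such as 2e-64, which Perl converts to numbers on its own, and it breaks ties by higher score and then by key so the order is repeatable. The subroutine top_by_evalue and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $n records, most significant (smallest E) first.
# Ties are broken by higher score, then by accession, so the order is repeatable.
sub top_by_evalue {
    my ($hash, $n) = @_;
    my @keys = sort {
               $hash->{$a}{E}     <=> $hash->{$b}{E}
            || $hash->{$b}{score} <=> $hash->{$a}{score}
            || $a cmp $b
    } keys %$hash;
    splice @keys, $n if @keys > $n;    # keep only the first $n keys
    return map { $hash->{$_} } @keys;
}

# Hypothetical records in the same shape as %hash above
my %hash = (
    XXX00001 => { col1 => 'emb', desc => 'protein A', score => 213, E => '2e-64' },
    XXX00002 => { col1 => 'gb',  desc => 'protein B', score => 208, E => '5e-63' },
    XXX00003 => { col1 => 'ref', desc => 'protein C', score => 150, E => '1e-40' },
);

# Prints protein A (2e-64) and then protein B (5e-63)
printf "%-4s %-55s %5s %s\n", @{$_}{qw(col1 desc score E)} for top_by_evalue(\%hash, 2);
```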

Re^2: how do i obtain blast result from the given file
by bingalee (Acolyte) on Jun 17, 2013 at 19:59 UTC

    Correction: I'm so sorry, I meant the first ten hits.

      In that case, just modify the initial loop to stop after ten hits:

```perl
my $how_many = 10;
while (<$fh>) {
    if (/^(.+?)\|(.+?)\| (.+?)\s+(\d+)\s+(\d+e[+-]\d+)$/) {
        print;
        last if --$how_many == 0;
    }
}
```

      If that's all you want to do, you don't need the hash. You can still format the output with printf by replacing print; inside the if block with: printf "%-4s %-55s %5s\n", $1, $3, $4;
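      The early-exit loop can also be wrapped as a reusable subroutine. In this sketch (the name first_n_hits and the sample lines are hypothetical) the parsed fields are returned instead of printed, and a count of zero is rejected up front: with the countdown above, starting $how_many at 0 would make --$how_many == 0 never true, so every hit in the file would be printed.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh only until $n hit lines have been found and
# return [col1, desc, score] for each, in file order. Lines after the n-th
# hit are left unread on the handle.
sub first_n_hits {
    my ($fh, $n) = @_;
    my @hits;
    return @hits if $n <= 0;
    while (my $line = <$fh>) {
        next unless $line =~ /^(.+?)\|(.+?)\| (.+?)\s+(\d+)\s+(\d+e[+-]\d+)$/;
        push @hits, [ $1, $3, $4 ];
        last if @hits == $n;
    }
    return @hits;
}

# Hypothetical input with three hits (accessions and E values invented)
my $sample = join '',
    "emb|XXX00001.1| protein A [Oryza sativa]   213   2e-64\n",
    "gb|XXX00002.1| protein B [Aegilops tauschii]   208   5e-63\n",
    "ref|XXX00003.1| protein C [Populus trichocarpa]   206   1e-62\n";

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
printf "%-4s %-55s %5s\n", @$_ for first_n_hits($fh, 2);    # prints protein A and protein B
```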

      You could still use the hash if you need to do further processing on the results. A hash does not preserve insertion order, so either add an index to each hash ref for use with sort or, more simply, push each key onto an array as you find it:

```perl
my %hash;    # Hit records, keyed by accession ($2)
my @hits;    # Keys in order
my $how_many = 10;
while (<$fh>) {
    if (/^(.+?)\|(.+?)\| (.+?)\s+(\d+)\s+(\d+e[+-]\d+)$/) {
        $hash{$2} = { col1 => $1, desc => $3, score => $4, E => $5, key => $2 };
        push @hits, $2;
        last if --$how_many == 0;
    }
}

# Go through the first ten hits, in order
for (map { $hash{$_} } @hits) {
    # $_ contains the hash ref for each record
    printf "%-5s %-55s %5s\n", $_->{col1}, $_->{desc}, $_->{score};
}
```
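      The other option mentioned above, storing an index in each hash ref and sorting on it, could look like this sketch. The idx field and both subroutine names are hypothetical; idx is a plain counter of hit lines, so it reflects file order regardless of how Perl orders the hash keys.

```perl
use strict;
use warnings;

my $HIT_RE = qr/^(.+?)\|(.+?)\| (.+?)\s+(\d+)\s+(\d+e[+-]\d+)$/;

# Hypothetical helper: store up to $n hits keyed by accession, each tagged
# with idx = its position among the hit lines (0, 1, 2, ...).
sub first_hits_indexed {
    my ($fh, $n) = @_;
    my %hash;
    return \%hash if $n <= 0;
    my $idx = 0;
    while (my $line = <$fh>) {
        next unless $line =~ $HIT_RE;
        $hash{$2} = { col1 => $1, desc => $3, score => $4, E => $5, key => $2, idx => $idx++ };
        last if $idx == $n;
    }
    return \%hash;
}

# Recover file order by sorting the records on idx.
sub in_file_order {
    my ($hash) = @_;
    return sort { $a->{idx} <=> $b->{idx} } values %$hash;
}

# Hypothetical input (accessions and E values invented)
my $sample = join '',
    "ref|XXX00003.1| protein C [Populus trichocarpa]   206   1e-62\n",
    "emb|XXX00001.1| protein A [Oryza sativa]   213   2e-64\n";

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
# Prints protein C first, then protein A, matching the input order
printf "%-4s %-55s %5s\n", @{$_}{qw(col1 desc score)}
    for in_file_order(first_hits_indexed($fh, 10));
```

      As with the array approach, a repeated accession overwrites its earlier record, so the hash can end up holding fewer than $n entries.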

        thank you :)
