PerlMonks  

Print a previous to previous of a matching line

by ag88 (Novice)
on Oct 08, 2013 at 09:11 UTC ( [id://1057387]=perlquestion )

ag88 has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone. I am new to programming and, of course, new to Perl as well. I need to write a script to extract some information from a large file. My file looks like this:

# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
gi|338220664|gb|EGP06123.1| gi|45383702|ref|NP_989542.1| 45.15 206 96 7 3 204 28 220 1e-51 170
gi|338220664|gb|EGP06123.1| gi|15419940|gb|AAK97214.1| 44.17 206 98 7 3 204 28 220 5e-50 166
# BLASTP 2.2.28+
# Query: gi|338220666|gb|EGP06125.1| hypothetical protein GEW_00015 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found
# BLASTP 2.2.28+
# Query: gi|338220651|gb|EGP06111.1| hypothetical protein GEW_00275 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found

I basically want to extract the "query line", in particular the number after "gi" in the query line, but only for those queries that have 0 hits. So in this case my matching line would be "# 0 hits found". I have written a small script which extracts the matching line, but I am unable to extract the query line and the number after "gi" in the query line. My code is:

sub getGI {
    open(FILE, "twentySeq1e-10.out") or die("Cannot open file");
    while(<FILE>) {
        my $line = $_;
        if($line=~/# 0 hits found/) {
            print "$line\n";
        }
    }
}

The desired output which I want is the number after "gi" in the query line, only of those having 0 hits. For example in this case the output would be

338220666
338220651

The "query" line is 2 lines before the matching line. If someone could help me with this I would be grateful. Thanks.

Replies are listed 'Best First'.
Re: Print a previous to previous of a matching line
by kcott (Archbishop) on Oct 08, 2013 at 11:35 UTC

    G'day ag88,

    Welcome to the monastery.

    You can treat each BLASTP block as a single record. This makes it easy to identify which have "0 hits found", and print their "gi" values. (In the code below, I've truncated the data lines to 60 characters.)

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    {
        local $/ = "# BLASTP 2.2.28+\n";

        while (<DATA>) {
            print /gi\|(\d+)/ if /0 hits found/;
        }
    }

    __DATA__
    # BLASTP 2.2.28+
    # Query: gi|338220664|gb|EGP06123.1| hypothetical protein GE
    # Database: nr-25sep
    # Fields: query id, subject id, % identity, alignment length
    # 2 hits found
    gi|338220664|gb|EGP06123.1| gi|45383702|ref|NP_989542.1|
    gi|338220664|gb|EGP06123.1| gi|15419940|gb|AAK97214.1| 44.1
    # BLASTP 2.2.28+
    # Query: gi|338220666|gb|EGP06125.1| hypothetical protein GE
    # Database: nr-25sep
    # 0 hits found
    # BLASTP 2.2.28+
    # Query: gi|338220651|gb|EGP06111.1| hypothetical protein GE
    # Database: nr-25sep
    # 0 hits found

    Output:

    338220666
    338220651

    See "perlvar: Variables related to filehandles" for a discussion of this usage of "$/" (the input record separator).

    -- Ken

Re: Print a previous to previous of a matching line
by jethro (Monsignor) on Oct 08, 2013 at 09:40 UTC
    sub getGI {
        my ($previous1, $previous2);   # the two most recently read lines
        open(FILE, "twentySeq1e-10.out") or die("Cannot open file");
        while(<FILE>) {
            my $line = $_;
            if($line=~/# 0 hits found/) {
                print "$previous2\n";   # the line two before the match
            }
            $previous2= $previous1;
            $previous1= $line;
        }
    }

    The generalized solution would use an array. You use unshift() to add the line at the start of the array and you use pop() to remove the last line if the array has length n+1 (with n being the number of lines you want to remember). That is called a pipeline, queue, shift register or FIFO (first-in-first-out).
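
    As a rough sketch of that array-based version (the sub name lines_before_match and its three parameters are my own invention, not from the code above), something along these lines should remember the last n lines and return the one that sits n lines before each match:

```perl
use strict;
use warnings;

# Hypothetical helper: return the line found $n lines before every line
# that matches $pattern. $fh can be any readable filehandle.
sub lines_before_match {
    my ($fh, $pattern, $n) = @_;
    my (@window, @found);
    while (my $line = <$fh>) {
        chomp $line;
        # @window holds up to $n previous lines, newest first, so the last
        # element is the line $n lines back once the window is full.
        push @found, $window[-1] if $line =~ $pattern && @window == $n;
        unshift @window, $line;        # newest line goes in at the front
        pop @window if @window > $n;   # oldest line drops off the end
    }
    return @found;
}
```

    For the file in the question you would call it with a handle opened on the BLAST output, qr/# 0 hits found/ as the pattern and 2 as n, then pull the gi number out of each returned line.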

      Thank you so much for the help, it worked. Thanks a lot :)

Re: Print a previous to previous of a matching line
by McA (Priest) on Oct 08, 2013 at 09:46 UTC

    Hi,

    in this case I would take the following approach:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use 5.010;

    # read first line assuming it is a kind of block separator
    my $bsep = <>;
    $/ = $bsep;

    while(defined(my $block = <>)) {
        chomp $block;
        my @records = split /\n/, $block;
        next if @records < 3; # malformed block
        foreach my $record (@records) {
            say $record;
        }
        say "========================";
    }

    Now you can find and operate on every block how you like.

    Best regards
    McA

Re: Print a previous to previous of a matching line
by hippo (Bishop) on Oct 08, 2013 at 09:40 UTC
Re: Print a previous to previous of a matching line
by Anonymous Monk on Oct 08, 2013 at 09:14 UTC
    Put  my $previous_line; at the top, and then assign to  $previous_line some place that it makes sense, and then when you want to do the printing, I forget

      I want to get the line before the previous line. In short, the second line before a matching line.

        yes, the answer is the same
Re: Print a previous to previous of a matching line
by Generoso (Prior) on Oct 08, 2013 at 19:06 UTC

    Try this; it works for me.

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    #open(FILE, "twentySeq1e-10.out") or die("Cannot open file");
    my $gi;
    while(<DATA>){
        if(/^# Query: gi.([0-9]+)/) {$gi = $1;}
        if(/^# 0 hits found/){print "$gi\n";}
    }
    __DATA__
    # BLASTP 2.2.28+
    # Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
    # Database: nr-25sep
    # Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
    # 2 hits found
    gi|338220664|gb|EGP06123.1| gi|45383702|ref|NP_989542.1| 45.15 206 96 7 3 204 28 220 1e-51 170
    gi|338220664|gb|EGP06123.1| gi|15419940|gb|AAK97214.1| 44.17 206 98 7 3 204 28 220 5e-50 166
    # BLASTP 2.2.28+
    # Query: gi|338220666|gb|EGP06125.1| hypothetical protein GEW_00015 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
    # Database: nr-25sep
    # 0 hits found
    # BLASTP 2.2.28+
    # Query: gi|338220651|gb|EGP06111.1| hypothetical protein GEW_00275 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
    # Database: nr-25sep
    # 0 hits found

    RESULT

    Process started >>>
    338220666
    338220651
    <<< Process finished. (Exit code 0)
Re: Print a previous to previous of a matching line
by pemungkah (Priest) on Oct 08, 2013 at 23:53 UTC
    Anonymous Monk's suggestion is the right one, just a bit elliptic. Let's phrase this another way:
    • You're going through the file a line at a time.
    • If you see a line you might want (a "Query" line), you save it in a variable and keep reading lines.
    • If you see a "hits" line that matches your criterion, the line you saved was one you want. Print it or stick it in an array for later, or...
    • If you see a "hits" line and it doesn't match your criterion, then you don't want the "Query" you saw previously. Throw it away by setting the variable to "".
    I didn't write out the code because I think it might be more useful to you to write the code yourself. You shouldn't need anything more complicated than one variable to keep the query line in, another one to read the next line from <STDIN> into, a while() loop to keep reading until you're out of lines, and a couple of if() statements (is this line a "query" line, is this line a "hits" line, does this "hits" line meet my "I want the last 'Query' line" criterion) inside the loop. You don't even need a trailing check outside the loop, because a "hits" line always follows a "Query" line.
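
    For anyone who wants to compare notes afterwards, here is one possible sketch of those steps. The sub name zero_hit_gis is made up for illustration, and it reads from a filehandle passed in as an argument rather than from <STDIN> directly; both are assumptions of mine, not part of the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: remember the latest "Query" line, keep its gi number
# when a "0 hits found" line follows, and forget it otherwise.
sub zero_hit_gis {
    my ($fh) = @_;
    my $query = "";
    my @gis;
    while (my $line = <$fh>) {
        if ($line =~ /^# Query:/) {
            $query = $line;                  # a line we might want
        }
        elsif ($line =~ /^# (\d+) hits found/) {
            if ($1 == 0 && $query =~ /gi\|(\d+)/) {
                push @gis, $1;               # criterion met: keep the gi number
            }
            $query = "";                     # either way, this query is finished
        }
    }
    return @gis;
}
```

    Calling zero_hit_gis(\*STDIN) and printing each element on its own line would give the list the original poster asked for, assuming the report is piped in on standard input.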

      Thank you all for the suggestions. They were really helpful. The following code did the task for me:

      sub getGiForZeroHits {
          my ($previous1, $previous2);
          open(FILEOUT, ">giForZeroHits.txt") or die("Cannot open file");
          {
              open(FILE, "$inputSeqFileForBlast-1e-10.out") or die("Cannot open file");
              {
                  while(<FILE>) {
                      my $line = $_;
                      if($line=~/# 0 hits found/) {
                          my @lineSpl = split(/\|/, $previous2);
                          print FILEOUT "$lineSpl[1]\n";
                      } #close if
                      $previous2= $previous1;
                      $previous1= $line;
                  } #close while
                  close(FILE);
              } #close FILE
              close(FILEOUT)
          } #close FILEOUT
      } #close sub
Re: Print a previous to previous of a matching line
by BillKSmith (Monsignor) on Oct 08, 2013 at 20:10 UTC
    You may be asking the wrong question. It appears that you want to consider two fields from each "logical record". If you could parse your data file first into records and then into fields, your strange requirement would disappear.
    Bill
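
    One way that records-then-fields idea might look, as a minimal sketch: the sub name parse_blast_records and the hash keys gi and hits are hypothetical, and the whole report is assumed to fit in memory as a single string.

```perl
use strict;
use warnings;

# Hypothetical helper: split the report into one record per "# BLASTP"
# header, then pull two fields (gi number, hit count) out of each record.
sub parse_blast_records {
    my ($text) = @_;
    my @records;
    for my $chunk (split /^(?=# BLASTP )/m, $text) {
        my ($gi)   = $chunk =~ /^# Query: gi\|(\d+)/m;
        my ($hits) = $chunk =~ /^# (\d+) hits found/m;
        next unless defined $gi && defined $hits;   # skip incomplete records
        push @records, { gi => $gi, hits => $hits };
    }
    return @records;
}
```

    With the records in hand, the original request reduces to a filter: grep for the records whose hits field is 0 and print their gi field.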
