Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable

by Cristoforo (Curate)
on Dec 05, 2015 at 02:37 UTC ( [id://1149441]=note: print w/replies, xml ) Need Help??


in reply to Question: homemade blat: insert query from keyboard to find an oligo in an hashtable

Hello nikkk. To answer your question, the problem is this line: $query = <>;. The diamond operator, <> opens a file supplied in the command line (contained in @ARGV, so $query now contains the first line from that file. There are 2 ways I can think of to avoid this.

First, you can change that line to $query = <STDIN>; and the program will read what you type in instead of doing the magic of reading from the file contained in @ARGV.

Second, you could empty the @ARGV array at the top of your program by saying $nomefile = shift @ARGV;. shift would remove the first (and in your case the only) filename from the command line and that would solve the problem when using the empty diamond <> operator later in the program.

Even though the second method would solve your problem, it is probably still better to use <STDIN> when reading a response from the command line anyway. That way, you will avoid any problems in your program.

I tried to understand what your program does and attempted to solve it using the Bio::SeqIO module which provides a good way to deal with fasta files. Documentation for BioPerl is here.

My example program and sample fasta data file are below.

Update: Changed program to process data as read (instead of storing to a data structure to process afterwards).
#usr/bin/perl use strict; use warnings; use Bio::SeqIO; # 'shift' removes items from @ARGV my $nomefile = shift; print "Insert query:\n"; chomp(my $query = uc <STDIN>); my $test = Bio::SeqIO->new( -file => $nomefile, -format => 'fasta'); print "\nResults for $nomefile\n\n"; while (my $seq = $test->next_seq()) { my @pos; my $id = $seq->id; my $fasta = uc $seq->seq; while ($fasta =~ /(?=$query)/g) { push @pos, $-[0] + 1; } if (@pos) { printf "%s compare %d volte in posizione @pos with ID %s\n", $query, scalar @pos, $id; } else { print "$query does not appear in the sequence with ID $id\n"; + } }
The data file I tested with was:
>chr1 AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT TTTATCTTTAGGCGGTATGCACTTTTAACAAAAAANNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN >chrM GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC
Output I got from this sample data in response to the query was:
Insert query: tatt TATT does not appear in the sequence with ID chr1 TATT compare 5 volte in posizione 20 55 152 155 177 with ID chrM

I hope this helps answer your question. If you don't understand some of the code I wrote, ask and I'll try to explain. There are probably some new things there that you may not have seen yet.

  • Comment on Re: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable
  • Select or Download Code

Replies are listed 'Best First'.
Re^2: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable
by nikkk (Novice) on Dec 05, 2015 at 10:46 UTC
    Thank you very much for your kind answer, it was very simple, i'm new in using Perl but i'm enjoying it very much!

    Can i ask a little enlightment please?

    It's possible to search for a query not the length used to create the index (for example a query >n or a multiple of n) without recreating the whole hashtable?

    And it seems that the position of the oligos found in the sequence are shifted by one, i know that we count from zero but i can't figure out how to change it...

    Thanks!
      It's possible to search for a query not the length used to create the index (for example a query >n or a multiple of n) without recreating the whole hashtable?
      This program does it.

      Update: Changed program so it finds matches in while loop - no need to store data. Process it as you read it.

      #!/usr/bin/perl use strict; use warnings; # 'shift' removes items from @ARGV my $nomefile = shift; open my $fh, '<', $nomefile or die "Unable to open $nomefile. $!"; print "Insert query:\n"; chomp(my $query = uc <STDIN>); my $sequenza; chomp(my $id = <$fh>); # reads first line ('should' be ID) print "\nResults for $nomefile\n\n"; while (my $line = <$fh>) { chomp $line; if (substr($line, 0, 1) eq '>') { find_matches($id, $sequenza, $query); $sequenza = ''; $id = $line; } else { $sequenza .= uc $line; } find_matches($id, $sequenza, $query) if eof; } close $fh or die "Unable to close $nomefile. $!"; sub find_matches { my ($id, $sequenza, $query) = @_; my @pos; while ($sequenza =~ /(?=$query)/g) { push @pos, $-[0] + 1; } if (@pos) { printf "%s compare %d volte in posizione @pos with ID %s\n", $query, scalar @pos, $id; } else { print "$query does not appear in the sequence with ID $id\n"; } }
      For 3 runs of the program (using the same data file I used in my first post).
      C:\Old_Data\perlp>perl t33.pl fasta.txt Insert query: ttag Results for fasta.txt TTAG compare 1 volte in posizione 108 with ID >chr1 TTAG does not appear in the sequence with ID >chrM C:\Old_Data\perlp>perl t33.pl fasta.txt Insert query: aaaaa Results for fasta.txt AAAAA compare 3 volte in posizione 59 130 131 with ID >chr1 AAAAA does not appear in the sequence with ID >chrM C:\Old_Data\perlp>perl t33.pl fasta.txt Insert query: tagcgat Results for fasta.txt TAGCGAT does not appear in the sequence with ID >chr1 TAGCGAT does not appear in the sequence with ID >chrM

       

      And it seems that the position of the oligos found in the sequence are shifted by one, i know that we count from zero but i can't figure out how to change it

      This code push @pos, $-[0] + 1; will give the position as though counting from 1 (instead of 0). It adds 1 to the offset of the beginning of the pattern match.

      The @- special variable can be found here Variables related to regular expressions. Scroll down to •@LAST_MATCH_START.

      It says

      $-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.
        What can i say, it works perfectly!

        It'll take some time to understand every line of the script (i had never seen 'if eof'), but you helped so much!

        Moreover i learned how to format my post at the best... thank you for your kind help!

        Best regards, Cristoforo Nikkk
        Hi Cristoforo, sorry to bother you again, but i can't understand some lines in your program...how magically does the program to return the %s e %d values?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1149441]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2024-04-26 04:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found