Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re^3: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable

by Cristoforo (Curate)
on Dec 05, 2015 at 18:03 UTC ( [id://1149470]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable
in thread Question: homemade blat: insert query from keyboard to find an oligo in an hashtable

It's possible to search for a query not the length used to create the index (for example a query >n or a multiple of n) without recreating the whole hashtable?
This program does it.

Update: Changed program so it finds matches in while loop - no need to store data. Process it as you read it.

#!/usr/bin/perl use strict; use warnings; # 'shift' removes items from @ARGV my $nomefile = shift; open my $fh, '<', $nomefile or die "Unable to open $nomefile. $!"; print "Insert query:\n"; chomp(my $query = uc <STDIN>); my $sequenza; chomp(my $id = <$fh>); # reads first line ('should' be ID) print "\nResults for $nomefile\n\n"; while (my $line = <$fh>) { chomp $line; if (substr($line, 0, 1) eq '>') { find_matches($id, $sequenza, $query); $sequenza = ''; $id = $line; } else { $sequenza .= uc $line; } find_matches($id, $sequenza, $query) if eof; } close $fh or die "Unable to close $nomefile. $!"; sub find_matches { my ($id, $sequenza, $query) = @_; my @pos; while ($sequenza =~ /(?=$query)/g) { push @pos, $-[0] + 1; } if (@pos) { printf "%s compare %d volte in posizione @pos with ID %s\n", $query, scalar @pos, $id; } else { print "$query does not appear in the sequence with ID $id\n"; } }
For 3 runs of the program (using the same data file I used in my first post).
C:\Old_Data\perlp>perl t33.pl fasta.txt Insert query: ttag Results for fasta.txt TTAG compare 1 volte in posizione 108 with ID >chr1 TTAG does not appear in the sequence with ID >chrM C:\Old_Data\perlp>perl t33.pl fasta.txt Insert query: aaaaa Results for fasta.txt AAAAA compare 3 volte in posizione 59 130 131 with ID >chr1 AAAAA does not appear in the sequence with ID >chrM C:\Old_Data\perlp>perl t33.pl fasta.txt Insert query: tagcgat Results for fasta.txt TAGCGAT does not appear in the sequence with ID >chr1 TAGCGAT does not appear in the sequence with ID >chrM

 

And it seems that the position of the oligos found in the sequence are shifted by one, i know that we count from zero but i can't figure out how to change it

This code push @pos, $-[0] + 1; will give the position as though counting from 1 (instead of 0). It adds 1 to the offset of the beginning of the pattern match.

The @- special variable can be found here Variables related to regular expressions. Scroll down to •@LAST_MATCH_START.

It says

$-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.
  • Comment on Re^3: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable
  • Select or Download Code

Replies are listed 'Best First'.
Re^4: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable
by nikkk (Novice) on Dec 06, 2015 at 10:10 UTC
    What can i say, it works perfectly!

    It'll take some time to understand every line of the script (i had never seen 'if eof'), but you helped so much!

    Moreover i learned how to format my post at the best... thank you for your kind help!

    Best regards, Cristoforo Nikkk
Re^4: Question: homemade blat: insert query from keyboard to find an oligo in an hashtable
by nikkk (Novice) on Dec 10, 2015 at 16:44 UTC
    Hi Cristoforo, sorry to bother you again, but i can't understand some lines in your program...how magically does the program to return the %s e %d values?
      To understand how 'printf' uses the %s %d conversions, see sprintf and look for the sentence in that section titled

      Perl's sprintf permits the following universally-known conversions:

      printf uses the same conversions as sprintf.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1149470]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-04-19 06:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found