Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Nested loops?

by Speed_Freak (Sexton)
on Aug 17, 2017 at 19:18 UTC ( [id://1197574]=perlquestion: print w/replies, xml ) Need Help??

Speed_Freak has asked for the wisdom of the Perl Monks concerning the following question:

I am working with existing code, and trying to add a "filter" to it. Currently the code pulls an id number and a sequence from a table in the database.(sql1) A foreach loop then permutes each sequence and searches the list to find out if any of the alternates exists. This leads to way more matches than I need because the loop eventually gets to the existing alternates, permutes them, and finds all of the matches again, just with a different primary sequence. I created a second database pull that creates another list with the same id-sequence layout that contains only the primary keys that I want to evaluate. (sql2) I want to use this secondary list as the filter for which sequences are evaluated from the primary list, but I need each key identified in the secondary list to be evaluated against all of the primary list.

#currently have strict turned off #code snippet foreach my $sql1 (@{$sql1}) { $table1{$sql1->[1]}{$sql1->[0]}=undef; #rearranges the sql pull } foreach my $sql2 (@{$sql2}) { $table2{$sql2->[1]}{$sql2->[0]}=undef; #rearranges the sql pull } my %hash = (); my @array = (); my @bases = ('A','C','G','T'); foreach my $tar1 (keys %table1){ foreach my $tar2 (keys %table2) { if ($tar1 eq $tar2) { #a bunch of follow on code that works if the second foreach and if sta +tements are removed

I'm just not sure which direction I should go with trying to limit the list it chooses to evaluate, without limiting the list of sequences it uses to evaluate against. I have tried several combinations of foreach/if/where statements and the closest I have gotten lead me to loop through the entire first table, but only using one sequence from the second table. I couldn't get it to iterate through the "filter" table. I'm sure my explanation is lacking severely.

Replies are listed 'Best First'.
Re: Nested loops?
by stevieb (Canon) on Aug 17, 2017 at 19:27 UTC

    If I'm understanding you correctly, exists to the rescue!

    use warnings; use strict; my %one = (a => 1, b => 2, c => 3); # first hash my %two = (a => 1, c => 3); # second hash for my $x (keys %two){ if (exists $one{$x}){ print "hash \$two key $x exists in hash \$one\n"; } }

    Output:

    hash $two key c exists in hash $one hash $two key a exists in hash $one

    In other words, it iterates over the second hash, checking if the "filter" key is in the first hash allowing you to perform some actions, otherwise the loop will just skip to the next iteration. A major benefit here is that it only iterates over a single hash... the smallest one, which completely avoids looping over one hash in its entirety, then a second entire hash for every key in the first.

      I'm currently trying out exists, thanks for that! In the example above, I would have to put the "filter" keys in the exists, because the $x becomes the sequence that gets augmented and compared to all $x's. So, if the key in %two exists as a key in %one, then the key from %one becomes the target. Which then gets assigned to a new variable with each option from the base list. Those are then checked against the $x keys for matches.

      foreach my $sql1 (@{$sql1}) { $table1{$sql1->[1]}{$sql1->[0]}=undef; #rearranges the sql pull - + large table of sequences and id's - These get used to create the alt +s and house the entire list that needs to be searched } foreach my $sql2 (@{$sql2}) { $table2{$sql2->[1]}{$sql2->[0]}=undef; #rearranges the sql pull - + "filter" table of sequences and id's - Only used to dictate which se +quences get used from the large table my @bases = ('A','C','G','T'); foreach my $x (keys %table1){ if (exists $table2 ({$x})) { my $found_alt = 0; foreach my $bases (@bases) { my $alt = $x; substr($alt, 20, 1) = $opt; next if ($alt eq $x); if (exists $table1{$alt}) { $found_alt = 1;
Re: Nested loops?
by kcott (Archbishop) on Aug 18, 2017 at 08:09 UTC

    G'day Speed_Freak,

    Firstly, there's all sorts of problems with your post:

    • No sample input data.
    • No expected output.
    • Incomplete code: it won't run; we can't test.
    • Use of identically named variables for different things, e.g. $sql1.
    • Declaration and initialisation of variables that are not subsequently used, e.g. @bases.

    Your follow-up response suffers from much the same problems. Please read "How do I post a question effectively?". Aim to provide us with an SSCCE.

    In general, you should just create a single hash from the smaller dataset (your seq2?); then iterate the larger dataset (your seq1?) processing this raw data based on the single hash created. The following code demonstrates the technique:

    #!/usr/bin/env perl -l use strict; use warnings; my $all_aref = [ [qw{id1 seq1}], [qw{id2 seq2}], [qw{id3 seq4}] ]; my $filter_aref = [ [qw{id1 seq1}], [qw{id3 seq3}] ]; my %filter_hash; $filter_hash{$_->[0]}{$_->[1]} = 1 for @$filter_aref; for (@$all_aref) { if (exists $filter_hash{$_->[0]}) { print "ID $_->[0] in filter"; if (exists $filter_hash{$_->[0]}{$_->[1]}) { print "SEQ $_->[1] in filter for ID $_->[0]"; } else { print "SEQ $_->[1] not in filter for ID $_->[0]"; } } else { print "ID $_->[0] not in filter"; } }

    Output:

    ID id1 in filter SEQ seq1 in filter for ID id1 ID id2 not in filter ID id3 in filter SEQ seq4 not in filter for ID id3

    Note that I strongly emphasised "demonstrates the technique" because this is not intended to be any sort of solution. Not knowing what the input looks like, how it should be processed, or what sort of output is required, a solution is not possible at this time!

    — Ken

Re: Nested loops?
by Laurent_R (Canon) on Aug 18, 2017 at 06:19 UTC
    It is quite difficult to understand your code because we have no idea of the contents of $sql1 and $sql2.

    It would be very useful if you provided sample input data for those, much in the way stevieb dit it in his answer.

Re: Nested loops?
by chacham (Prior) on Aug 18, 2017 at 13:30 UTC

    Why are you doing a SQL task in perl? Just let the database do the whole thing for you. It's faster, an will save on network traffic.

    Please post the queries so we can have a look at combining them.

Re: Nested loops?
by zakame (Pilgrim) on Aug 23, 2017 at 17:24 UTC

    I only skimmed through this, but for that first foreach, you should have something %seen in place just before entering the loop, so it takes id/sequence as a key to skip alternates in the loop (next if %seen{$sequence}.)

    Also, I'm probably wrong, but the way you describe your process sounds like a gather/take from Perl6. Here's some Perl5-ish implementation using Syntax::Keyword::Gather:

    use Syntax::Keyword::Gather; my @sequences = ( ... ); my @filters = ( ... ); my @primary_and_filtered = gather { my %seen; for my $seq (@sequences) { take $seq unless $seen{$seq}; take map { $_->($seq) } @filters; $seen{$seq}++; } };

    Note that the @filters doesn't correspond to your described filters list of sequences, but rather, a list of filter functions (e.g. another permute, or a more specific search, etc.) to evaluate your original primary sequence against.

      Thanks for the response! I was looking into how %seen works and realized where my code was wrong

      I added a line above the loop: my %table2a = keys %table2 and then changed my if statement: if (exists ($table2a {$x}))

      Now it's working just as expected. You'll also notice that the parentheses in the if statement changed as well.

        Happy it helped :) Enjoy PerlMonks!

Re: Nested loops?
by Speed_Freak (Sexton) on Aug 18, 2017 at 17:14 UTC

    Alright, let's see if I can come up with a better explanation... The working code was written by someone else, so there are a couple of things (a lot of things) I don't quite understand... Like in the foreach statement that defines my $sql1 (@{$sql1}). earlier in the script, $sql1 is defined as my $sql1 = $lib_dbh->selectall_arrayref($pull1). $pull1 is the select statement for SQLite. The second select statement ($sql2) uses two other tables to match elements and create a list of "target" sequences.

    The data stored in the first $sql1 ($lib_dbh->selectall_arrayref($pull1)) is as such:

    table 1

    55436, atcgtggtcgtgt
    56875, agtcgtagtctaa
    56789, tgatgcgtctatc
    23698, atcgtgctcgtgt
    75699, tgatgcttctatc
    87226, atcgtgatcgtgt
    12214, agtcgttgtctaa
    etc.

    The data in the second table would be the same, except with only the filtered target sequences.

    table2

    55436, atcgtggtcgtgt
    56875, agtcgtagtctaa
    56789, tgatgcgtctatc
    etc.

    The foreach loops containing "$table1{$sql1->[1]}{$sql1->[0]}=undef;" then rearranges the tables to have the sequence first, and the id's second. (I don't know why, but that's the way it is set up. I have to work within the constraints of the original programmer so as not to break any of the follow on scripts.)

    The output ends up being an array (with much more columns from the rest of the script)containing id's in [1] and [2].

    Using the example data, I would want it to start with the sequence "atcgtggtcgtgt" from table 2. (id 55436.) It would then look through table 1 to see if it exists. (It does, and always will since it is a subset.) It then takes the sequence from table 1 and augments it with the 4 bases (my @bases = ('A','C','G','T');) at the 7th position (substr($alt, 6, 1) = $opt;).

    It skips the augmented sequence that matches the original sequence, and then iterates over table 1 for the remaining three sequences. (atcgtgAtcgtgt, atcgtgCtcgtgt, atcgtgTtcgtgt). (I capitalized them for emphasis only.) Each time it finds a match, it stores the id to [2] in the array. [1] holds the original id.

    For the example dataset, the array output would be something like:
    [0]1 [1]55436 [2] 23698, 87226
    [0]2 [1]56875 [2] 12214
    [0]3 [1]56789 [2] 75699

      To avoid reading the whole table into a hash, consider using substrings within the SQL to match the sequences. For example

      #!/usr/bin/perl use strict; use DBI; use Data::Dumper; my $n = 6; unlink 'mytestdb.sqlite' if -e 'mytestdb.sqlite'; my $dbh = DBI->connect("dbi:SQLite:dbname=mytestdb.sqlite","",""); test_setup(); my $sql2 = " SELECT id,seq,substr(seq,1,$n),substr(seq,-$n) FROM testtable WHERE id IN ('55436','56875','56789')"; my $ar = $dbh->selectall_arrayref($sql2); my $sql3 = " SELECT id FROM testtable WHERE substr(seq,1,$n) = ? AND substr(seq,-$n) = ? AND id != ?"; my $sth3 = $dbh->prepare($sql3); my @output = (); my $i=0; for my $rec (@$ar){ $sth3->execute($rec->[2],$rec->[3],$rec->[0]); my $others = join ',', map { $_->[0] } @{ $sth3->fetchall_arrayref() }; push @output,[++$i,$rec->[0],$others]; } print Dumper \@output; sub test_setup { $dbh->do('CREATE TABLE testtable (id,seq)'); my $sth = $dbh->prepare('INSERT INTO testtable VALUES (?,?)'); while (<DATA>){ chomp; my @f = split ", ",$_; $sth->execute(@f); } } __DATA__ 55436, atcgtggtcgtgt 56875, agtcgtagtctaa 56789, tgatgcgtctatc 23698, atcgtgctcgtgt 75699, tgatgcttctatc 87226, atcgtgatcgtgt 12214, agtcgttgtctaa
      poj

      Again, you post a snippet which doesn't compile, and which, if it would, doesn't help me to help you, since it depends on a datasource unavailable to me.

      The data initialization stuff isn't interesting, so you could just skip that, and provide a representative subset of the anonymous hashes $slq1 and $sql2 (since that is what is relevant here), at best in a format Data::Dumper or related modules provide. Then, the foreach loop labeled with Label isn't finsished, and there's no code which does the transformation of @storage_array into the desired output you post.

      So, again, I have to guess. Why do you provide the necessary information needed to help you just piecemeals? See I know what I mean. Why don't you?

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1197574]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (3)
As of 2024-04-25 23:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found