PerlMonks  

probably stupid mistake

by gogoglou (Beadle)
on Apr 01, 2011 at 13:33 UTC ( [id://896920]=perlquestion )

gogoglou has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks, I have the following piece of code, which does something really trivial, but I do not get any results. I have one array of keywords and one array of sentences, and I want to print the keyword and the sentence whenever the keyword appears somewhere in the sentence. It was working with one line in each file, but as soon as I introduced a second sentence it stopped working. I think I am doing something really stupid. Please be kind enough to help me out. Thanks in advance for any help.

#!/usr/bin/perl
my $filename1 = $ARGV[0];
my $filename2 = $ARGV[1];
open(INPUT, $filename1);
open(INPUT2,$filename2);
my $line;
my @fields1;
my $fields1;
my $line2;
my @fields2;
my $fields2;
my $i=0;
my $line;
while ($line =<INPUT>){
    chomp $line;
    $i++;
    push (@fields1, $line);
}
while ($line2 =<INPUT2>){
    chomp $line2;
    push (@fields2, $line2);
}
foreach $fields1(@fields1){
    foreach $fields2(@fields2){
        # print "$fields1\n";
        # print "$fields2\n";
        if ($fields2 =~ m/$fields1/){
            print "success\n";
        }
    }
}
# foreach $fields1(@fields1){
#     if ($line2 =~m/$fields1/){
#         print "$line2\n";
#     }
# }
# }
# foreach $fields1(@fields1){
#     print "$fields1\n";
# }
#

Replies are listed 'Best First'.
Re: probably stupid mistake
by toolic (Bishop) on Apr 01, 2011 at 13:41 UTC

      Oh sorry, well, the contents of the first file look like this (it's going to be larger, not that it should matter):

      GRB2
      AFF4
      GGGB
      C7orf42

      and those of the second file like this:

      KEGG_PATHWAY hsa04722:Neurotrophin signaling pathway 33 0.14786271171251902 4,04E+09 GRB2, CAMK2G, NFKB1, MAPKAPK2, MAGED1, KRAS, MAP3K3, MAP3K1, BCL2, RAC1, GAB1, RHOA, CAMK2D, SH2B3, NGFRAP1, SHC1, MAP2K7, FRS2, PIK3R1, IRS2, RELA, YWHAB, YWHAE, TP73, NRAS, MAPK1, CRKL, JUN, NTRK2, CALM3, MAPK9, SORT1, RAP1A 575 124 5085 2.353.506.311.360.440 6,83E+11 6,83E+11 0.00492932423403758
      KEGG_PATHWAY hsa05200:Pathways in cancer 63 0.2822833587239 1,86E+10 FGF18, FGF5, PPARD, FGF9, NFKB1, FGF13, PTEN, CCNE2, ACVR1B, MAX, CDKN2B, RHOA, RALB, RALA, FAS, TPR, CHUK, RELA, RXRA, LEF1, CDK6, RB1, STK4, DAPK1, JUP, MAPK1, CCDC6, CRKL, HIF1A, NCOA4, JUN, VEGFA, MAPK9, TRAF1, WNT5A, APC2, XIAP, GRB2, IGF1R, KRAS, ITGAV, BCL2, RAC1, RUNX1, PIK3R1, APC, CEBPA, COL4A4, FZD8, COL4A1, IL8, VHL, CREBBP, TGFBR2, SMAD2, FZD5, STAT3, NRAS, RASSF5, CDKN1B, ITGA6, RASSF1, JAK1 575 328 5085 16.985.949.098.621.400 0.003142334832759053 0.0010485440045103767 0.02269450903600312
      KEGG_PATHWAY hsa04310:Wnt signaling pathway 35 0.15682408817994445 4,76E+10 WNT5A, PPARD, APC2, PPP2R5A, CAMK2G, PPP2R5C, PPP3R1, RAC1, NFAT5, PPP3CB, RHOA, CAMK2D, FRAT2, PRKACA, CHP, PLCB1, FBXW11, APC, FZD8, VANGL1, NLK, VANGL2, CREBBP, LEF1, SMAD2, FZD5, SENP2, SFRP5, SFRP1, CCND2, CSNK1E, JUN, MAPK9, SIAH1, PPP2R5E 575 151 5085 20.498.128.419.234.000 0.008008272140318184 0.002008108692225119 0.05796850410908494
        When I run your code using your 2 input files, I get the following output:
        success
        success
        You need to describe in much more detail what problem you are having.
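One way to supply that detail is to have the script report what it actually read. The following is only a minimal sketch, assuming the two file names arrive in @ARGV as in the original post; the helper name read_lines is made up for illustration and is not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a file into a list of chomped lines and
# die with the operating system's error message if open fails.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Only run the report when two file names were given on the command line.
if (@ARGV == 2) {
    my @keywords  = read_lines($ARGV[0]);
    my @sentences = read_lines($ARGV[1]);
    printf "%d keywords, %d sentences\n", scalar @keywords, scalar @sentences;

    # Brackets make leading or trailing whitespace easy to spot.
    print "keyword: [$_]\n" for @keywords;
}
```

If either count is zero, or a keyword shows stray whitespace inside the brackets, that is worth including in the problem description.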
Re: probably stupid mistake
by Popcorn Dave (Abbot) on Apr 01, 2011 at 20:26 UTC
    The other monks have given you great advice, but you might also look into using the graphical Perl debugger PTKDB.

    Using that has solved numerous problems I've had, since you can step through the code and watch what's happening with your variables and how they change.
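Where a graphical debugger is not available, a cruder substitute (not PTKDB itself) is to dump variables in a form that makes invisible characters visible. This is a minimal sketch using the core Data::Dumper module; the sub name show and the sample string are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq prints strings double-quoted with escapes, so characters such
# as "\r" or "\t" show up instead of being invisible.
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Terse  = 1;   # omit the "$VAR1 =" prefix
$Data::Dumper::Indent = 0;   # keep the output on one line

# Hypothetical helper: return a printable representation of a value.
sub show {
    my ($value) = @_;
    return Dumper($value);
}

# Made-up example: a line with Windows line endings keeps its carriage
# return after chomp when the input record separator is "\n".
my $keyword = "GRB2\r\n";
chomp $keyword;
print show($keyword), "\n";
```

The carriage return appears as \r in the dumped string. Perl's built-in command-line debugger, started with perl -d, also allows stepping through the code without a GUI.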


    To disagree, one doesn't have to be disagreeable - Barry Goldwater

Re: probably stupid mistake
by GrandFather (Saint) on Apr 02, 2011 at 22:53 UTC

    Try running the following sample code and see if it is close to what you want to achieve.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file1 = <<FILE1;
    GRB2
    AFF4
    GGGB
    C7orf42
    FILE1

    my $file2 = <<FILE2;
    KEGG_PATHWAY hsa04722:Neurotrophin signaling pathway 33 0.14786271171251902 4,04E+09 GRB2, CAMK2G, NFKB1, MAPKAPK2, MAGED1, KRAS, MAP3K3, MAP3K1, BCL2, RAC1, GAB1, RHOA, CAMK2D, SH2B3, NGFRAP1, SHC1, MAP2K7, FRS2, PIK3R1, IRS2, RELA, YWHAB, YWHAE, TP73, NRAS, MAPK1, CRKL, JUN, NTRK2, CALM3, MAPK9, SORT1, RAP1A 575 124 5085 2.353.506.311.360.440 6,83E+11 6,83E+11 0.00492932423403758
    KEGG_PATHWAY hsa05200:Pathways in cancer 63 0.2822833587239 1,86E+10 FGF18, FGF5, PPARD, FGF9, NFKB1, FGF13, PTEN, CCNE2, ACVR1B, MAX, CDKN2B, RHOA, RALB, RALA, FAS, TPR, CHUK, RELA, RXRA, LEF1, CDK6, RB1, STK4, DAPK1, JUP, MAPK1, CCDC6, CRKL, HIF1A, NCOA4, JUN, VEGFA, MAPK9, TRAF1, WNT5A, APC2, XIAP, GRB2, IGF1R, KRAS, ITGAV, BCL2, RAC1, RUNX1, PIK3R1, APC, CEBPA, COL4A4, FZD8, COL4A1, IL8, VHL, CREBBP, TGFBR2, SMAD2, FZD5, STAT3, NRAS, RASSF5, CDKN1B, ITGA6, RASSF1, JAK1 575 328 5085 16.985.949.098.621.400 0.003142334832759053 0.0010485440045103767 0.02269450903600312
    KEGG_PATHWAY hsa04310:Wnt signaling pathway 35 0.15682408817994445 4,76E+10 WNT5A, PPARD, APC2, PPP2R5A, CAMK2G, PPP2R5C, PPP3R1, RAC1, NFAT5, PPP3CB, RHOA, CAMK2D, FRAT2, PRKACA, CHP, PLCB1, FBXW11, APC, FZD8, VANGL1, NLK, VANGL2, CREBBP, LEF1, SMAD2, FZD5, SENP2, SFRP5, SFRP1, CCND2, CSNK1E, JUN, MAPK9, SIAH1, PPP2R5E 575 151 5085 20.498.128.419.234.000 0.008008272140318184 0.002008108692225119 0.05796850410908494
    FILE2

    # Read the keywords, one per line, from the in-memory "file".
    open my $in1, '<', \$file1;
    my @keyWords;

    while (defined (my $line = <$in1>)) {
        chomp $line;
        push @keyWords, $line;
    }

    close $in1;

    die "No keywords specified\n" if ! @keyWords;

    # Escape each keyword, join the alternatives, and require a word
    # boundary on both sides. (\Q...\E only quotes text inside the same
    # string literal, and "\b" in double quotes is a backspace, so
    # quotemeta and single quotes are used here instead.)
    my $match = '\b(?:' . join ('|', map { quotemeta ($_) } @keyWords) . ')\b';

    # Scan the sentences one line at a time; $. is the current line number.
    open my $in2, '<', \$file2;
    while (defined (my $line = <$in2>)) {
        next if $line !~ /($match)/;
        print "Matched $1 in line $.\n";
    }

    Prints:

    Matched GRB2 in line 1
    Matched GRB2 in line 2

    Some stuff to notice:

    1. Use inline data for sample or test code so you don't have to deal with external files.
    2. Use three parameter open and lexical file handles.
    3. Don't slurp files into an array then loop over the array.
    4. Avoid nested loops.
    5. Keep variable declarations local to where they are used.
    6. Always use strictures (use strict; use warnings;).
    7. Use meaningful variable names.
    8. Remove unused code and variables ($i was not used and was redundant in any case).
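Several of these points can be combined in a version that reads real files instead of inline data. This is a minimal sketch, assuming one keyword per line in the first file and word-like keywords such as gene names; the sub names build_matcher and find_matches are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one alternation regex from a list of keywords. quotemeta
# escapes any regex metacharacters inside a keyword.
sub build_matcher {
    my @keywords = grep { length } @_;    # ignore empty lines
    die "No keywords specified\n" if !@keywords;
    my $alternation = join '|', map { quotemeta($_) } @keywords;
    return qr/\b($alternation)\b/;
}

# Read sentences one line at a time from an open filehandle and return
# a list of [keyword, line number] pairs for the lines that match.
sub find_matches {
    my ($fh, $matcher) = @_;
    my @found;
    while (defined(my $line = <$fh>)) {
        push @found, [$1, $.] if $line =~ $matcher;
    }
    return @found;
}

# Only run when two file names were given on the command line.
if (@ARGV == 2) {
    my ($keywordFile, $sentenceFile) = @ARGV;

    open my $kw, '<', $keywordFile or die "Cannot open '$keywordFile': $!\n";
    chomp(my @keywords = <$kw>);
    close $kw;

    open my $in, '<', $sentenceFile or die "Cannot open '$sentenceFile': $!\n";
    printf "Matched %s in line %d\n", @$_ for find_matches($in, build_matcher(@keywords));
    close $in;
}
```

Only the first matching keyword on each line is reported, which mirrors the sample code above. Checking the return value of open means a mistyped file name produces an error message rather than silent empty output.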
    True laziness is hard work
