gogoglou has asked for the wisdom of the Perl Monks concerning the following question:
Dear Perl Monks, I have the following piece of code, which does something really trivial, but I do not get any results. I have one array of keywords and one array of sentences, and I want to print the keyword and the sentence whenever the keyword appears somewhere in the sentence. It was working with one line in each file, but as soon as I introduced a second sentence it stopped working. I think I am doing something really stupid. Please be kind enough to help me out.
Thanks in advance for any help.
#!/usr/bin/perl
my $filename1 = $ARGV[0];
my $filename2 = $ARGV[1];
open(INPUT, $filename1);
open(INPUT2,$filename2);
my $line;
my @fields1;
my $fields1;
my $line2;
my @fields2;
my $fields2;
my $i=0;
my $line;
while ($line =<INPUT>){
    chomp $line;
    $i++;
    push (@fields1, $line);
}
while ($line2 =<INPUT2>){
    chomp $line2;
    push (@fields2, $line2);
}
foreach $fields1(@fields1){
    foreach $fields2(@fields2){
        # print "$fields1\n";
        # print "$fields2\n";
        if ($fields2 =~ m/$fields1/){
            print "success\n";
        }
    }
}
# foreach $fields1(@fields1){
# if ($line2 =~m/$fields1/){
# print "$line2\n";
# }
# }
# }
# foreach $fields1(@fields1){
# print "$fields1\n";
# }
#
Re: probably stupid mistake
by toolic (Bishop) on Apr 01, 2011 at 13:41 UTC
Since you didn't show us any of your input, we can not reproduce your problem very easily (if at all). Please add a few relevant lines of your input files.
Lines of the first file look like these:
GRB2
AFF4
GGGB
C7orf42
and lines of the second file look like these:
KEGG_PATHWAY hsa04722:Neurotrophin signaling pathway 33 0.14786271171251902 4,04E+09 GRB2, CAMK2G, NFKB1, MAPKAPK2, MAGED1, KRAS, MAP3K3, MAP3K1, BCL2, RAC1, GAB1, RHOA, CAMK2D, SH2B3, NGFRAP1, SHC1, MAP2K7, FRS2, PIK3R1, IRS2, RELA, YWHAB, YWHAE, TP73, NRAS, MAPK1, CRKL, JUN, NTRK2, CALM3, MAPK9, SORT1, RAP1A 575 124 5085 2.353.506.311.360.440 6,83E+11 6,83E+11 0.00492932423403758
KEGG_PATHWAY hsa05200:Pathways in cancer 63 0.2822833587239 1,86E+10 FGF18, FGF5, PPARD, FGF9, NFKB1, FGF13, PTEN, CCNE2, ACVR1B, MAX, CDKN2B, RHOA, RALB, RALA, FAS, TPR, CHUK, RELA, RXRA, LEF1, CDK6, RB1, STK4, DAPK1, JUP, MAPK1, CCDC6, CRKL, HIF1A, NCOA4, JUN, VEGFA, MAPK9, TRAF1, WNT5A, APC2, XIAP, GRB2, IGF1R, KRAS, ITGAV, BCL2, RAC1, RUNX1, PIK3R1, APC, CEBPA, COL4A4, FZD8, COL4A1, IL8, VHL, CREBBP, TGFBR2, SMAD2, FZD5, STAT3, NRAS, RASSF5, CDKN1B, ITGA6, RASSF1, JAK1 575 328 5085 16.985.949.098.621.400 0.003142334832759053 0.0010485440045103767 0.02269450903600312
KEGG_PATHWAY hsa04310:Wnt signaling pathway 35 0.15682408817994445 4,76E+10 WNT5A, PPARD, APC2, PPP2R5A, CAMK2G, PPP2R5C, PPP3R1, RAC1, NFAT5, PPP3CB, RHOA, CAMK2D, FRAT2, PRKACA, CHP, PLCB1, FBXW11, APC, FZD8, VANGL1, NLK, VANGL2, CREBBP, LEF1, SMAD2, FZD5, SENP2, SFRP5, SFRP1, CCND2, CSNK1E, JUN, MAPK9, SIAH1, PPP2R5E 575 151 5085 20.498.128.419.234.000 0.008008272140318184 0.002008108692225119 0.05796850410908494
When I run your code using your 2 input files, I get the following output:
success
success
You need to describe in much more detail what problem you are having.
Re: probably stupid mistake
by Popcorn Dave (Abbot) on Apr 01, 2011 at 20:26 UTC
The other monks have given you great advice, but you might also look into the graphical Perl debugger ptkdb (the Devel::ptkdb module).
Using it has solved numerous problems for me, since you can step through the code and watch how your variables change.
To disagree, one doesn't have to be disagreeable - Barry Goldwater
Re: probably stupid mistake
by GrandFather (Saint) on Apr 02, 2011 at 22:53 UTC
#!/usr/bin/perl
use strict;
use warnings;
my $file1 = <<FILE1;
GRB2
AFF4
GGGB
C7orf42
FILE1
my $file2 = <<FILE2;
KEGG_PATHWAY hsa04722:Neurotrophin signaling pathway 33 0.14786271171251902 4,04E+09 GRB2, CAMK2G, NFKB1, MAPKAPK2, MAGED1, KRAS, MAP3K3, MAP3K1, BCL2, RAC1, GAB1, RHOA, CAMK2D, SH2B3, NGFRAP1, SHC1, MAP2K7, FRS2, PIK3R1, IRS2, RELA, YWHAB, YWHAE, TP73, NRAS, MAPK1, CRKL, JUN, NTRK2, CALM3, MAPK9, SORT1, RAP1A 575 124 5085 2.353.506.311.360.440 6,83E+11 6,83E+11 0.00492932423403758
KEGG_PATHWAY hsa05200:Pathways in cancer 63 0.2822833587239 1,86E+10 FGF18, FGF5, PPARD, FGF9, NFKB1, FGF13, PTEN, CCNE2, ACVR1B, MAX, CDKN2B, RHOA, RALB, RALA, FAS, TPR, CHUK, RELA, RXRA, LEF1, CDK6, RB1, STK4, DAPK1, JUP, MAPK1, CCDC6, CRKL, HIF1A, NCOA4, JUN, VEGFA, MAPK9, TRAF1, WNT5A, APC2, XIAP, GRB2, IGF1R, KRAS, ITGAV, BCL2, RAC1, RUNX1, PIK3R1, APC, CEBPA, COL4A4, FZD8, COL4A1, IL8, VHL, CREBBP, TGFBR2, SMAD2, FZD5, STAT3, NRAS, RASSF5, CDKN1B, ITGA6, RASSF1, JAK1 575 328 5085 16.985.949.098.621.400 0.003142334832759053 0.0010485440045103767 0.02269450903600312
KEGG_PATHWAY hsa04310:Wnt signaling pathway 35 0.15682408817994445 4,76E+10 WNT5A, PPARD, APC2, PPP2R5A, CAMK2G, PPP2R5C, PPP3R1, RAC1, NFAT5, PPP3CB, RHOA, CAMK2D, FRAT2, PRKACA, CHP, PLCB1, FBXW11, APC, FZD8, VANGL1, NLK, VANGL2, CREBBP, LEF1, SMAD2, FZD5, SENP2, SFRP5, SFRP1, CCND2, CSNK1E, JUN, MAPK9, SIAH1, PPP2R5E 575 151 5085 20.498.128.419.234.000 0.008008272140318184 0.002008108692225119 0.05796850410908494
FILE2
open my $in1, '<', \$file1;
my @keyWords;
while (defined (my $line = <$in1>)) {
    chomp $line;
    push @keyWords, $line;
}
close $in1;
die "No keywords specified\n" if ! @keyWords;
# Quote each keyword and wrap it in word boundaries. Note "\\b": inside a
# double-quoted string a bare "\b" would be a backspace character.
my $match = join '|', map { "\\b\Q$_\E\\b" } @keyWords;
open my $in2, '<', \$file2;
while (defined (my $line = <$in2>)) {
    next if $line !~ /($match)/;
    print "Matched $1 in line $.\n";
}
Prints:
Matched GRB2 in line 1
Matched GRB2 in line 2
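The alternation pattern used above can also be built with qr//, which avoids the double-quoted-string escaping rules entirely. This is only a minimal sketch: build_keyword_regex is a hypothetical helper name that does not appear in the thread, and it assumes the keywords start and end with word characters so that \b behaves as expected.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build one regex that matches
# any of the given keywords as a whole word.
sub build_keyword_regex {
    my @keywords = @_;
    die "No keywords specified\n" if !@keywords;

    # quotemeta escapes regex metacharacters in each keyword.
    my $alternation = join '|', map { quotemeta } @keywords;

    # \b is written inside qr//, not in a double-quoted string, so it stays
    # a word boundary instead of turning into a backspace character.
    return qr/\b($alternation)\b/;
}

my $regex = build_keyword_regex(qw(GRB2 AFF4 GGGB C7orf42));
print "Matched $1\n" if 'XIAP, GRB2, IGF1R' =~ $regex;
```

Because the whole alternation sits inside one capture group, $1 always holds the keyword that matched.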
Some stuff to notice:
- Use inline data for sample or test code so you don't have to deal with external files.
- Use three-argument open and lexical file handles.
- Don't slurp files into an array then loop over the array.
- Avoid nested loops.
- Keep variable declarations local to where they are used.
- Always use strictures (use strict; use warnings;).
- Use meaningful variable names.
- Remove unused code and variables ($i was not used and was redundant in any case).
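Applied to real files instead of inline data, the points above might look like the following sketch. It is not the code from this thread: find_keyword_lines is a hypothetical helper name, stripping trailing whitespace from keywords is an assumption about the input, and the output format (keyword, tab, sentence) is just one way to print the keyword together with the sentence as the original question asked.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return one "keyword<TAB>line"
# string for every keyword occurrence found in the sentence file.
sub find_keyword_lines {
    my ($keyword_file, $sentence_file) = @_;

    # Three-argument open with a lexical handle, checking for failure.
    open my $kw_fh, '<', $keyword_file
        or die "Cannot open '$keyword_file': $!\n";
    my @keywords;
    while (defined(my $line = <$kw_fh>)) {
        # Assumption: trailing whitespace (including a stray carriage
        # return) is never part of a keyword.
        $line =~ s/\s+\z//;
        push @keywords, $line if length $line;
    }
    close $kw_fh;
    die "No keywords specified\n" if !@keywords;

    my $alternation = join '|', map { quotemeta } @keywords;
    my $regex = qr/\b($alternation)\b/;

    open my $in_fh, '<', $sentence_file
        or die "Cannot open '$sentence_file': $!\n";
    my @hits;
    while (defined(my $line = <$in_fh>)) {
        chomp $line;
        # /g in a while loop visits every match on the line, not only the first.
        while ($line =~ /$regex/g) {
            push @hits, "$1\t$line";
        }
    }
    close $in_fh;
    return @hits;
}

# Only run when two file names are given on the command line.
if (@ARGV == 2) {
    print "$_\n" for find_keyword_lines(@ARGV);
}
```

The sentence file is read one line at a time, so only the keyword list is held in memory.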
True laziness is hard work