PerlMonks  

Search string giving incorrect results

by Bishma (Beadle)
on Apr 16, 2002 at 22:01 UTC ( [id://159642]=perlquestion )

Bishma has asked for the wisdom of the Perl Monks concerning the following question:

Ok, here's one of those "what am I doing wrong" questions that I'm sure has an answer so simple I'm just overlooking it.

I'm searching a flatfile database with the code:
if ($thread_data[4] =~ m/$in{for}/i) { print $thread_data[0]; }
where $thread_data[4] is what I'm searching in, $in{for} is what I'm searching for, and $thread_data[0] is an index number telling me which thread returned true. This returns results, but it often returns results that don't contain any part of the $in{for} string.

So my question is, why am I getting these erroneous results?

Thanks in advance

Replies are listed 'Best First'.
Re: Search string giving incorrect results
by Chmrr (Vicar) on Apr 16, 2002 at 22:22 UTC

    I can't say much without having examples of data which produce these "erroneous results," but I've two guesses:

    1. $in{for} is empty. An empty pattern makes Perl reuse the pattern from the last successful match or substitution, which is probably not what you want. Postfix $in{for} with .* to avoid this.
    2. $in{for} contains metacharacters. That is, things like . and * which have special meaning to a regular expression. To get around this, use \Q$in{for}\E instead of $in{for} in the regex; or, somewhere above the regex, run $in{for} through quotemeta.
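Both guesses can be checked in isolation. What follows is a minimal sketch, not the poster's actual code: the sample strings are made up and search_matches is a hypothetical helper. It assumes an empty search term should simply match nothing, and it quotes metacharacters with quotemeta so the term is treated as literal text.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains $term as a literal,
# case-insensitive substring. An empty or undefined term never matches,
# which sidesteps the special handling Perl gives to an empty pattern.
sub search_matches {
    my ($text, $term) = @_;
    return 0 unless defined $term && length $term;
    my $quoted = quotemeta $term;    # same effect as \Q$term\E in the regex
    return $text =~ /$quoted/i ? 1 : 0;
}

# Made-up sample data, for illustration only.
my @results = (
    search_matches('I like cheese',   'CHEESE'),  # plain word, case-insensitive
    search_matches('Price: 5 USD',    '.*'),      # metacharacters taken literally
    search_matches('Literal .* here', '.*'),      # a literal ".*" is present
    search_matches('anything at all', ''),        # empty term is rejected
);
print "@results\n";    # prints "1 0 1 0"
```

Rejecting the empty term before the match is a design choice for this sketch; it avoids both the pattern-reuse behaviour and a search that matches every record.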

    perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'

Re: Search string giving incorrect results
by dws (Chancellor) on Apr 16, 2002 at 22:07 UTC
    So my question is, why am I getting these erroneous results?

    That's hard to say without seeing some representative data. Show us a value for $in{for}, a @thread_data[0,4] pair that you expect to match (but doesn't), and a pair that does match but that you expect should not.

      Ok, here's (I hope) a little clarification. Here's a big snippet of code:
      my @dat_files = <$board_dir/*.dat>;
      $q = 0;
      foreach (@dat_files) {
          $number;
          $number = $_;
          $number =~ s/\/var\/www\/cgi-bin\/2930forum\/data\///g;
          $number =~ s/\.dat//g;
          open THREAD, "$_" or die "Can't open .dat file: $!";
          $x = 0;
          while (<THREAD>) {
              $thread_data[$x] = $_;
              $x++;
          }
          close THREAD;
          foreach (@thread_data) {
              @details = split /\|/, $_;
              if ($details[4] =~ m/\Q$in{for}\E/i) {
                  $found[$q] = $number;
                  $q++;
              }
          }
      }
      It's a little sloppy, but I'm just trying to get valid results at this point; I'll clean it up later.

      $in{for}: is defined by form input (I've been using simple searches like "cheese")
      $thread_data[4]: is the messages posted in every thread. It could potentially contain just about anything, but it is never empty.
      $thread_data[0]: I just realized this isn't used. Instead it's $number, which is just a numeric string denoting the thread number.

      I do the search and get results. Some of the threads contain the search word $in{for} and others do not. For example, I put in $in{for} = cheese and get around 30 results containing messages like:
      Hard work pays off after time, but lazyness always pays off now.

      One thing I just noticed is that many of the results are in numerical order. For example, I get results like 121, 122, 124, 125, 126, 127, 128, 129, 21, 282, 321, 343, 344, 345, etc.
        This code has a couple of problems. One is that the handling of filenames, particularly the method of extracting a number from them, is highly suspect. Once you think you've extracted $number, try printing both the full filename and $number.
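One way to run that check is to extract the number with a single anchored match and print it next to the full filename. This is only a sketch under assumptions: thread_number is a hypothetical helper, the paths are made up, and it assumes every data file is named <number>.dat.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the thread number out of a path such as
# ".../data/121.dat" without hard-coding the directory name.
# Returns undef when the path does not end in ".dat".
sub thread_number {
    my ($path) = @_;
    return $path =~ m{([^/]+)\.dat\z} ? $1 : undef;
}

# Made-up paths, for illustration only.
my @paths = ('/var/www/cgi-bin/2930forum/data/121.dat', './data/21.dat', 'notes.txt');
my @numbers = map { thread_number($_) } @paths;

# Print the full filename and the extracted number side by side.
for my $i (0 .. $#paths) {
    my $shown = defined $numbers[$i] ? $numbers[$i] : '(no match)';
    print "$paths[$i] => $shown\n";
}
```

Because the match is anchored at the end of the path, it does not depend on the directory the script happens to run against.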

        It looks like you're trying to accumulate a list of thread numbers that contain matches. Since you say that $thread_data[0] is the same as $number, this might be easier like so:

        my @dat_files = <$board_dir/*.dat>;
        my %found = ();
        foreach my $file ( @dat_files ) {
            open(DAT, $file) or die "$file: $!";
            while ( <DAT> ) {
                # split on a literal pipe; a bare "|" would be treated as a regex alternation
                my @thread_data = split /\|/;
                if ( $thread_data[4] =~ m/\Q$in{for}\E/i ) {
                    $found{$thread_data[0]}++;
                }
            }
            close(DAT);
        }
        The keys of %found are now the thread numbers that contain a match, and the corresponding values are the number of matches.
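Since the keys of %found are thread numbers, they can be listed in numeric order with a sort block. The sketch below uses made-up counts. A plain sort compares strings, which may be why results like 121, 122, ..., 21, 282 look almost, but not quite, numerically ordered.

```perl
use strict;
use warnings;

# Made-up data: thread number => number of matching lines.
my %found = (121 => 2, 21 => 1, 282 => 1, 3 => 4);

my @as_strings = sort keys %found;                 # string comparison
my @as_numbers = sort { $a <=> $b } keys %found;   # numeric comparison

print "string order:  @as_strings\n";    # 121 21 282 3
print "numeric order: @as_numbers\n";    # 3 21 121 282

# Report each thread with its match count, lowest thread number first.
for my $thread (@as_numbers) {
    print "thread $thread: $found{$thread} match(es)\n";
}
```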

Re: Search string giving incorrect results
by rbc (Curate) on Apr 16, 2002 at 22:18 UTC
    Maybe you need to do something like ...
    my $for = quotemeta($in{for}); if ($thread_data[4] =~ m/$for/i) { print $thread_data[0]; }
Re: Search string giving incorrect results
by TheHobbit (Pilgrim) on Apr 16, 2002 at 22:15 UTC
    Hi,

    I can't see anything wrong in the snippet... Maybe the problem lies in how the @thread_data array is filled? Are you sure that when you test the next thread, _all_ @thread_data elements have changed?
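To illustrate the concern: if an array is filled by index and never cleared, a short file read after a long one leaves the long file's extra lines in place. This is a minimal sketch with made-up data standing in for two .dat files; it is not taken from the poster's program.

```perl
use strict;
use warnings;

# Made-up stand-ins for the lines of two .dat files.
my @file_a = ('1|a|b|c|I like cheese', '1|a|b|c|more cheese', '1|a|b|c|third line');
my @file_b = ('2|a|b|c|Hard work pays off');

# Fill by index without clearing the array between files.
my @thread_data;
for my $lines (\@file_a, \@file_b) {
    my $x = 0;
    $thread_data[$x++] = $_ for @$lines;
}
my $stale_count = @thread_data;   # still 3: two leftovers from @file_a

# Replacing the whole array for each file avoids the leftovers.
my @fresh;
for my $lines (\@file_a, \@file_b) {
    @fresh = @$lines;
}
my $fresh_count = @fresh;         # 1: only the line from @file_b

print "without reset: $stale_count lines, with reset: $fresh_count line(s)\n";
```

In the first loop, a search for "cheese" run after the second file would still hit $thread_data[1], even though that line came from the first file.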


    Leo TheHobbit
    GED/CS d? s-:++ a+ C++ UL+++ P+++>+++++ E+ W++ N+ o K? !w O? M V PS+++
    PE-- Y+ PPG+ t++ 5? X-- R+ tv+ b+++ DI? D G++ e*(++++) h r++ y+++(*)
