Separating multiple keyword search input

Dente has asked for the wisdom of the Perl Monks concerning the following question:

I need to know if there is a efficient way to differentiate keyword search results between 2 flatdbs. I have been blessed to have jarich help split filehandles and assign them to arrays to hold data until needed later. However, earlier on I find I need to differentiate search results.

These are my placeholders

    $main_template =~ s/%%keywords%%/$fields{'keywords'}/g;
    $main_template =~ s/%%searcresults%%/$pitem/g;
    
    $main_template =~ s/%%keywords%%/$fields{'keywords'}/g;
    $main_template =~ s/%%premiumlistings%%/$premiumitem/g;
[download]

I need to differentiate between ~ s/%%keywords%%/$fields{'keywords'}/g; for the %%searcresult%% and %%premiumlistings%%


$fields{'keywords'} = &get_search_ready($fields{'keywords'});

if ($fields{'keywords'} eq ""){$fields{'keywords'} = "NOTHING";}
if (length($fields{'keywords'}) < $minimum_keyword_length){$fields{'ke
+ywords'} = "2_CHARS";}

if (($use_mysql eq "Yes") and ($fields{'keywords'} ne "NOTHING") and (
+$fields{'keywords'} ne "2_CHARS"))
    {
    &mysql_search;
    }
    elsif (($fields{'keywords'} ne "NOTHING") and ($fields{'keywords'}
+ ne "2_CHARS"))
    {
    &normal_search;
    }
    

if (($fields{'keywords'} eq "NOTHING") or ($fields{'keywords'} eq "2_C
+HARS"))
    {
    $fields{'keywords'} = "";
    }


#### RELEVANCE

$rspcntr = 0;


############ CAPTURES ALL KEYWORD INPUT ###############

(@skeyw) = split(/ /,$fields{'keywords'});

########### ADDED 04/29/04 ###################
$nrkeywords = push(@skeyw);
foreach $kwrel (@resultline)
###########################################

    {
    $keycount = 0;
    $kwcntr = 1000;
    while ($keycount < $nrkeywords)
        {
        
          #### TITLE
          (@litems) = split(/\t/,$kwrel);
          if ($litems[1] =~ /$skeyw[$keycount]/i)
              {
              $kwcntr = $kwcntr - 5;
              }
        
          #### DESCRIPTION
          if ($litems[2] =~ /$skeyw[$keycount]/i)
              {
              $kwcntr = $kwcntr - 1;
              }

          #### KEYWORDS
          if ($litems[5] =~ /$skeyw[$keycount]/i)
              {
              $kwcntr = $kwcntr - 4;
              }
        
          ##### GENERAL
          if (($kwrel =~ /$skeyw[$keycount]/i) and ($skeyw[$keycount] 
+ne ""))
            {
            $kwcntr = $kwcntr - 1;
            }

          ##### PHRASE
          if (($kwrel =~ /$fields{'keywords'}/i) and ($nrkeywords > 1)
+)
            {
            $kwcntr = $kwcntr - 10;
            }
        
          ##### AND BOOLEAN MATCHING
          
            if ($nrkeywords == 2)
                {
                if (($kwrel =~ /$skeyw[0]/i) and ($kwrel =~ /$skeyw[1]
+/i))
                    {$kwcntr = $kwcntr - 20;}
                }
            elsif ($nrkeywords == 3)
                {
                if (($kwrel =~ /$skeyw[0]/i) and ($kwrel =~ /$skeyw[1]
+/i) and ($kwrel =~ /$skeyw[2]/i))
                    {$kwcntr = $kwcntr - 30;}
                }
            elsif ($nrkeywords == 4)
                {
                if (($kwrel =~ /$skeyw[0]/i) and ($kwrel =~ /$skeyw[1]
+/i) and ($kwrel =~ /$skeyw[2]/i) and ($kwrel =~ /$skeyw[3]/i))
                    {$kwcntr = $kwcntr - 40;}
                }

        $keycount++;
        }

    if (length($kwcntr) == 3){$kwcntr = "0" . $kwcntr;}
    if (length($kwcntr) == 2){$kwcntr = "00" . $kwcntr;}
    if (length($kwcntr) == 1){$kwcntr = "000" . $kwcntr;}

    $resultline[$rspcntr] = $kwcntr . "-_:" . $kwrel;
[download]

If not, is there a way to add multiple names (inferences) to the search input text field so I may separate 'keywords' from 'keywords2' to differentiate the %%searchresults%% placeholder from the %%premiumlistings%% placeholder doring the same search process?

Example: http://www.urhosted.com/mall/searchres.html - template before populated

then search for "store"

Comment on Separating multiple keyword search input Select or Download Code

Replies are listed 'Best First'.
Re: Separating multiple keyword search input by tkil (Monk) on Apr 30, 2004 at 05:50 UTC
Having seen a couple of your posts, I'd like to make a suggestion before I get to my main response: you will have a lot more luck getting help if you would take the time to reduce your problem to simplest possible terms. For most questions related to language usage, it is rare that you can't illustrate the problem in less than 20 lines. Think about it: the helpful monks have only so much time to offer. Given the choice between a small, succinct tidbit and a long, incoherent, out-of-context, "I'll just dump some code into a question and someone will be able to fully grasp it and tell me what is going wrong" question... which do you think they'll answer? There are resources that describe how to ask a question so that it will get answered; consider availing yourself of them. Anyway. Rant concluded. So far as I can infer from your other postings, you have two sets of data ("normal" and "premium"); you want to accept keywords off a web form, search through those sets for those keywords, then display the results (keeping the results from the two sets distinct). It further looks like that data might be large and stored in a MySQL database. The data is also apparently structured into tab-separated fields of: unknown, title, description, unknown, unknown, keywords. Without knowing your entire system (and I don't want to learn it -- I just want you to think about it), we can start biting off chunks. The two phases of processing you show above are interpreting form parameters, then searching through some result lines (which it's not clear where you got them from.) Finding matches from a file. # find_hits $file_name, @keywords; # # Returns a list of "score: line" strinsg. Example: # # If file "foo.txt" contains: # # Larry Wall, Programming Perl, Reference, x, y, perl # Peter Scott, Perl Medic, Legacy herding, z, a, perl # # A call to this function might look like this: # # my @hits = find_hits "foo.txt", "medic"; # # And it would return a one-element list: # # '32: Peter Scot...a, perl' sub find_hits( $ @ ) { my ( $file, @keywords ) = @_; # compose a regex for quick rejection of non-matching lines: my $any_keyword_re = join '\|', map quotemeta $_, @keywords; $any_keyword_re = qr/$any_keyword_re/i; # and a keyword for the whole phrase my $phrase_re = join '\W+', @keywords; $phrase_re = qr/$phrase_re/i; # open input file for read open my $in, $file or croak "opening $file for read: $!"; my @rv; # for accumulating return values while ( <$in> ) { # reject lines with no matches out of hand next unless m/$any_keyword_re/; # any match at all is one point. my $score = 1; # split into fields for further scoring. my ( undef, $title, $desc, undef, undef, $keys ) = split /\t/; # title matches are worth 5 points each while ( $title =~ m/$any_keyword_re/g ) { $score += 5 } # description matches are only 1 point while ( $desc =~ m/$any_keyword_re/g ) { $score += 1 } # keyword matches are 4 points while ( $keys =~ m/$any_keyword_re/g ) { $score += 4 } # phrase matches (against entire line) are 10 points while ( m/$phrase_re/g ) { $score += 10 } # multiple matches are worth 10x the number # of keywords that matched. my $n_matches = () = m/$any_keyword_re/g; # see perlfaq4 if ( $n_matches > 1 ) { $score += 10*$n_matches } # finally, format $score and save for returning # to the caller push @rv, sprintf "%03d: %s", $score, $_; } return @rv; } [download] Ok, so now we address your other main issue, that of doing a substitution in your template. Since we can have a variable number of responses to each, but you only have one template variable, I'll assume that we can wedge all our answers into that one spot. We'll do this by joining together all the hits we found. # validate our keywords. unless ( defined $fields{keywords} && fields{keywords} ne '' ) { # complain, bail out of this run. } my @keywords = split ' ', $fields{keywords}; # find actual matches in highest- to lowest-score order my @normal_hits = reverse sort find_hits "normal_list.txt", @keywords +; my @premium_hits = reverse sort find_hits "premium_list.txt", @keyword +s; # keep lists reasonable if ( @normal_hits > 100 ) { splice @normal_hits, 100 } if ( @premium_hits > 100 ) { splice @premium_hits, 100 } # now join them together for presentation: my $normal_hits = join '<br />', @normal_hits; my $premium_hits = join '<br />', @premium_hits; # finally, do the now-obvious substitution: $template =~ s/%%normalresults%%/$normal_hits/; $template =~ s/%%premiumresults%%/$premium_hits/; [download] Hopefully this gets you closer. There are all sorts of places that this code might need tweaking: if there are possibly a huge number of hits, you can't store them all in memory, so you'll need to keep track of just the top 100 (or whatever) the whole way. Handling special characters in the return strings for printing to HTML. How to handle more advanced CSV files (e.g. double quotes protecting a comma that is in a field.) None of these are insurmountable, but until and unless you develop the skill to break down problems into simpler, more digestable chunks (both for asking questions and for writing the solution in the first place), no amount of cookbookery will help you. Good luck.	[reply] [d/l] [select]
Re: Re: Separating multiple keyword search input by Dente (Acolyte) on May 03, 2004 at 23:44 UTC
This definetly gets me closer to the solution. I apologize for submitting code from mid point as I thought that is where my problem was initially, however, upon further disection, I feel the problem needs to be addressed early on. Here is code I thought would have worked for me, kepping in mind that I want a site search engine returning the result-like appearance of google. `open (SIDX, "$data_dir/search.idx"); open (SIDX2, "$data_dir/search2.idx"); my @sidx = <SIDX>; my @sidx2 = <SIDX2>; @skeyw = grep {$_ ne ""} @skeyw; @premiumkeyw = grep {$_ ne ""} @premiumkeyw; my $regexp = join('\|', @skeyw, @premiumkeyw); $regexp = qr/$regexp/; (@skeyw) = split(/ /,$fields{'keywords'}); # SPLITTING SIDX FILE (@premiumskeyw) = split(/ /,$fields{'keywords'}); ## SPLITTING SIDX2 F +ILE $nrkeywords = push(@skeyw); $premiumnrkeywords = push(@premiumskeyw);` [download]	[reply] [d/l]
Re^3: Separating multiple keyword search input by tkil (Monk) on May 13, 2004 at 07:10 UTC
First, apologies for taking so long to get back to you on this. (And it took even longer, as my browser decided to nose dive after I wrote up this response the first time. Sigh.) Looking at this code, though, I again have to say that it is almost a complete non-sequitor to me. I think I understand your overall goals, but your actual program construction seems to be very haphazard. I can't figure out your larger issues without grasping your entire system (which I'm not equipped to do!), but here are a few comments on your "programming in the small": `open (SIDX, "$data_dir/search.idx"); open (SIDX2, "$data_dir/search2.idx"); my @sidx = <SIDX>; my @sidx2 = <SIDX2>;` [download] Assuming that these are what you're actually searching through, you probably don't want to load them into memory up front like this. Build up your comparison / scoring function, then apply it to each line of input, keeping only those that match to a certain level. (This all depends on what portion of the index you expect to return; in a typical situation, I would assume that only a small part of the index is relevant to any given query, so I would want to have only those values in memory.) `(@skeyw) = split(/ /,$fields{'keywords'}); # SPLITTING SIDX FILE (@premiumskeyw) = split(/ /,$fields{'keywords'}); ## SPLITTING SIDX2 F +ILE` [download] At the very least, the comments are grossly inaccurate. Also, you're splitting the same value into two different arrays. Finally, you are using a pattern which can result in the null strings that you later have to filter out; a better split pattern can fix that. The most trivial change is that simple hash keys don't need to be in quotes. All together, we can just say: `my @keywords = split ' ', $fields{keywords};` [download] The use of `' '` has special meaning to `split`: break it up on one or more whitespace characters, and ignore leading whitespace as well. This should guarantee that you have no null keywords in the array, making your checks unnecessary. (Side note: don't skimp on variable names. Use an editor that does expansions if you have to, but I personally find `@keywords` more evocative and self-descriptive than `@skeyw`. Same comment applies to filehandle names, and pretty much every other name in your program...) `$nrkeywords = push(@skeyw); $premiumnrkeywords = push(@premiumskeyw);` [download] Assuming you just want to get the number of elements in those arrays (which, as pointed out above, are the same array, so you don't need to do it twice!), you can simply evaluate the array in scalar context. The slightly long way of saying that is: `my $n_keywords = scalar @keywords;` [download] I say this is the long way, because the `scalar` is unnecessary. You're asasigning something into a scalar, and that puts that thing into a scalar context. So, we could have written: `my $n_keywords = @keywords;` [download] Either way, you certainly don't need `push`. Finally, now that you know there is only one set of keywords, you can construct the regex out of them. One point to consider is whether thes keywords are themselves regexes, or if they are just plain text. If the latter, we want to make sure we neutralize any characters that are special to the regex engine. The built-in function `quotemeta` is just the ticket. Putting it all together, we have: `my $regex = join '\|', map { quotemeta $_ } @keywords; $regex = qr/$regex/;` [download] I addressed this in more detail in my original response, but now you can basically do: `my @std_hits = grep { m/$regex/ } <STD_IDX>; my @prem_hits = grep { m/$regex/ } <PREM_IDX>;` [download] Hopefully this has helped you further. I do worry that there are archetectural issues that I am unable to address, and untill you get the chance to rethink what you are doing (as opposed to just making random changes and hoping for the best), you're not going to have much luck. Regardless, I hope it works out for you.	[reply] [d/l] [select]
Re: Separating multiple keyword search input by davido (Cardinal) on Apr 30, 2004 at 03:00 UTC
The following code isn't intended to be a cut-n-paste solution, but rather a demonstration of a technique you can use to search for multiple keywords and do replacements based on those keywords. `my %replacements = qw/men women houses homes cats dogs venus mars carpet rug full empty/; my $string = "men cats houses venus carpet full"; for my $keyword ( keys %replacements ) { my $re = qr/\b$keyword\b/; $string =~ s/$re/$replacements{$keyword}/gi; } print "$string\n"; __OUTPUT__ women dogs homes mars rugs empty` [download] Hope this helps. There are other similar idioms, and other different idioms that accomplish the same thing, but this one seems to be more or less what you're asking for. Dave	[reply] [d/l]
Re: Separating multiple keyword search input by bart (Canon) on Apr 30, 2004 at 19:33 UTC
`$main_template =~ s/%%keywords%%/$fields{'keywords'}/g; $main_template =~ s/%%searcresults%%/$pitem/g; $main_template =~ s/%%keywords%%/$fields{'keywords'}/g; $main_template =~ s/%%premiumlistings%%/$premiumitem/g;` [download] Don't do several substitutions one after another, you could end up replacing what you replaced, again. For example: `$_ = 'taxial'; s/taxi/cab/g; s/cabal/gang/g; print;` [download] prints "gang". Better would be to replace both at once: `$_ = 'taxial'; my %subst = ( 'taxi' => 'cab', 'cabal' => 'gang' ); local $" = "\|"; my @keys = map quotemeta, sort { length $b <=> length $a } keys %sub +st; my $regexp = qr(@keys); s/($regexp)/$subst{$1}/g; print;` [download] which prints "cabal". Once the first substitution went past "taxi", replacing it with "cab", we've moved too far to replace it again in the second substitution. Usually, this is exactly what we want. I like Regex::PreSuf (abbreviated from prefix/suffix) for this kind of solution. It combines several "words" into one regexp, as alternatives, and takes care of quotemeta as well. The code is shorter, and may likely become faster, too. That's the idea, though it depends on the words in the list. `$_ = 'taxial'; my %subst = ( 'taxi' => 'cab', 'cabal' => 'gang' ); use Regex::PreSuf; my $regexp = presuf(keys %subst); s/($regexp)/$subst{$1}/g; print;` [download] Applied to your code, it becomes: `my %subst = ( %fields, searcresults => $pitem, premiumlistings => $premiumitem ); use Regex::PreSuf; my $regexp = presuf(keys %subst); $main_template =~ s/%%($regexp)%%/$subst{$1}/go;` [download] BTW Using the /o modifier, the regexp is compiled only once, the first time a match is tried. Even though you can later change the values in the substitution hash, adding more keys will not do any good.	[reply] [d/l] [select]


Perl Monk, Perl Meditation
	PerlMonks

Separating multiple keyword search input

Finding matches from a file.