The only modules are DBI, CGI, and Everything. Most of the actual code
is included below.
This is a pretty specialized situation. For example, we can't allow any
single query to run very long since MySQL was designed assuming some
aspects of the threading model (light-weight processes) that aren't
present on FreeBSD. So a single long-running query can nearly lock up
access to the database for everyone.
Another MySQL quirk is that even when you use DBI, once you submit a query
MySQL does the whole dang thing before it lets you get the first matching
record back. If it weren't for this quirk, I could work around the
previous problem (in many cases) by terminating the long-running query
after I'd fetched enough records (or taken too much time). But I can't,
so I have to ensure that each query will finish fairly quickly in all
cases.
And we tried using MySQL's "full-text search" features. I pretty much
hated the results (can't search for 3-letter words, for example). But
worse, it still had "worst case" situations where a single search could
end up locking up the whole site.
I remember my first run-ins with full-text search (in the '80s). It
sounds so nice but in practice I find that it rarely works very well. It
usually takes lots and lots of practice to learn how to do a decent
search that gives you back something you are looking for but not buried
amid 4000 things you don't want.
Well, I haven't noticed it get much better. Sure, there are some "words"
that are unique enough that searching on them works great. But way too
much of the time you don't get that lucky. I find that I almost always
prefer searching an index than trying my luck at "full-text search".
Except, of course, for Google! The miracle of Google is how
they sort the results. Trying to imitate Google's ingenious sorting
technique (what we know of it) seemed like too much work for me to pull
off in my spare time.
Now, we can use Google. But I've tried using Google against PerlMonks
(directly and through "thepen") and it just doesn't work that well.
And lots of times I don't want to search for "words". Even Google doesn't
do a very good job if you want to search for parts of "words". It does
pretty good at search for "phrases". But throw lots of punctuation in and
it and I often don't agree on what a "word" is. And Perl code certain
often has lots of punctuation in it.
So I realy liked the original PerlMonks simple and Super searches. They
were pretty much just substring searches that required "and" (match
all of the criteria listed). I found that I was successful with
them much more frequently than I am with full-text searches. But, they
are resource intensive.
So I've heard quite a few people's suggestions on ways to build a word
lists and lists of "stop" words that you aren't allowed to search for,
etc. to roll our own full-text search. Or some canned full-text search
to use. Well, I encourage anyone who wants to to pursue such. We've
even added XML tickers so external search engines can be built more
easily. But I doubt I will find them very useful. (BTW, my
quick glance at DBIx::FullTextSearch makes me think it wants to create
its own tables so you might want to try making an external search out
of it -- if it turns out nice, we can probably find a place to host it.)
Anyway, in trying to find out why the site was locking up, I made changes
to "mytop" (like "top" but for MySQL queries -- it shows the current queries
against database and how long they've been running) and did lots of watching
and fixing. And I've come to understand some of what makes a query slow
in MySQL.
I kind of like MySQL's query optimizer. It is quite simple (compared to
some of the ones I deal with) and so it is much easier to predict what it
is going to do. I spent years writing database code where we didn't use
SQL, we would seek to a point in an index and read forward or backward
from there. So I think of ways to get results efficiently at a low
level. But sometimes it is dang hard to write SQL that convinces
the optimizer to do what I thought up (so, somewhat paradoxically, a query
that I could perform very efficiently will be executed very inefficiently
by the SQL server).
So, the basic lesson in this case is that you want to make a query that
can find all of the records that it needs by searching a fairly small
range of one index and then set a reasonable LIMIT. Then you have to
tell SQL to order the records based on that index (that makes it much more
likely that the optimizer will actually choose to use the index that you
gave to it on a silver platter).
Of course, many queries can't be fullfilled that way so I end up findng
the matches by doing a sequence of such queries.
If you don't specify a small range, then you could spend too much time
searching a large number of records (and the optimizer also "understands"
this and so becomes more likely to ignore your preferred index) which
would lock up MySQL on FreeBSD (a joke user someone made for me when I
started blaming all site problems on this combination). And you have
to set a reasonable LIMIT or else you could spend too much time sending
the matching information across (with the same result). And since you
specified the sort order, you can also efficiently continue where you
left off (without having to use something like MySQL's "LIMIT 50,150" which
probably requires the first part of the search basically be repeated).
So that is what the newest Super Search does. And, just in case I don't
know the MySQL optimizer as well as I think I do, it asks the optimizer
to explain its plan before letting MySQL try to perform the query. If it
doesn't decide to use an index that requires only a fairly small number
of records to be read, then I won't run the query.
I've made some fairly minor changes to the code and removed some code
that is just for future features in hopes of it being slightly easier
to understand. Some changes are things I planned to do to clean up
the code but I just did them quickly here and haven't tested. So if
you see a syntax error, it is probably just a typo. /:
Also, writing code for Everything makes it hard to write utility
subroutines so this isn't really "factored" like I would normally
write code. Okay, enough excuses. ;)
Update: I forgot one excuse. (: The CGI parameters names were chosen to be very short (but still somewhat mnemonic) because I plan to add a feature where you can cut'n'paste a URL that performs the search that you have crafted for repeating it later or referring to it in a node, etc.
-
tye (but my friends call me "Tye")
[%
my $html= '';
my @errors;
my @types= q(
Wi perlquestion SoPW Seekers of Perl Wisdom
D monkdiscuss PMD PM Discussions
Ob obfuscated Obfu Obfuscation
CU CUFP CUFP Cool Uses For Perl
CC sourcecode Code Code Catacombs
CQ categorized_question CatQ Categorized Questions
CA categorized_answer CatA Categorized Answers
Hlp sitefaqlet Help Monk Help
Tu perltutorial Tut Tutorial
U user User
Po poem Poem
Cr perlcraft Craft
Sn snippet Snippet
N perlnews News
Q quest Quest
Pol poll Poll
M perlmeditation Med Meditations
SP scratchpad SPad Scratch Pad
MR modulereview ModRev Module Review
BR bookreview BkRev Book Review
pPd perlman perlman Perl Manpage
pFn perlfunc perlfunc Perl Function
pFq perlfaq_nodetype perlfaq Perl FAQ
) =~ /(\S.*\S)/g;
my( %abbr, %desc, %typeId, %link );
for( @types ) {
my( $abbr, $type, $link, $desc )= split " ", $_, 4;
$type =~ tr/_/ /;
my $id= getId( getType($type) );
$typeId{$type}= $id;
$abbr{$type}= $abbr;
$desc{$type}= $desc || $link;
$link{ $id }= $link;
$_= $type;
}
my %typeTable= qw(
snippet snippet
bookreview review
modulereview review
sourcecode sourcecode
poll polls
);
my %fieldOfTable= (
snippet => [qw( snippetdesc snippetcode )],
review => [qw( itemdescription usercomment doctext )],
sourcecode => [qw( codedescription doctext )],
polls => ['choices'],
#user => [qw( scratchpad )], # Needs to change
);
my @sects;
my $sects= do {
my $negSects= ( $q->param("xs") )[-1] ? 1 : 0;
my %checked;
@checked{keys %abbr}= map {
( ()= $q->param($abbr{$_}) ) ? 1 : 0;
} keys %abbr;
@sects= grep $negSects != $checked{$_}, keys %abbr;
@sects= @types if ! @sects;
join ", ", map $typeId{$_}, @sects;
};
my @criteria;
my @users= grep length, $q->param("a");
if( 1 == @users && $users[0] =~ m#^(\s*\[[^\]]+\])+\s*$#g ) {
@users= $users[0] =~ m#\[([^\]]+)\]#g;
}
for my $user ( @users ) {
my $type = "user";
my $reason = "does not exist";
my $U;
if( $user !~ m#^id://(\d+)$# ) {
$U = getNode( $user, "user" );
} else {
( $type, $user ) = ( "node ID", $1 );
$U = getNodeById( $user );
if( $U && "user" ne $U->{type}{title} ) {
undef $U;
$reason = "is not a user";
}
}
if( $U ) {
$user= getId($U);
} else {
$user= 0;
push @errors, qq[\u$type "] . $query->escapeHTML($user)
. qq[" $reason.<br />];
}
}
@users= grep $_, @users;
my $negAuthor= ( $q->param("xa") )[-1] ? 1 : 0;
$negAuthor= $negAuthor ? " NOT" : "";
if( @users ) {
push @criteria, "n.author_user$negAuthor IN ( "
. join( ", ", @users ) . " )";
}
my $replies= ( $q->param("re") )[-1];
$q->param( "re", $replies );
my $xRoots= ()= $q->param("xr");
my $note= getId( getType("note") );
push @criteria, do {
if( "N" eq $replies ) { # No replies:
push @errors,
"No root nodes and no replies means no search.<br />"
if $xRoots;
$xRoots
? "n.node_id = 0" # Find nothing!
: "n.type_nodetype IN ( $sects )"; # Just sel. roots
} elsif( "A" eq $replies # All replies (same as
|| @sects == @types ) { # re.s from all sect.s):
$xRoots
? "n.type_nodetype = $note" #Just all re.s
: "n.type_nodetype IN ( $note, $sects )";#^ + sel. roots
} else { # Replies from sel. sects:
$q->param( "re", undef );
my $c= "( n.type_nodetype = $note"
. " AND root.type_nodetype IN ( $sects ) )";
$xRoots
? $c # Sel. re.s
: "( n.type_nodetype IN ( $sects ) OR $c )"; # ^ + roots
}
};
# ( Head Body ) + ( Includes Excludes ) + ( Terms Seperator )
my $getTerms= sub {
my( $textParam, $sepParam )= @_;
my $str= $q->param( $textParam );
my $sep= $q->param( $sepParam );
$sep =~ s/^\s*//;
$sep =~ s/\s*$//;
$sep= " " if ! length $sep;
$q->param( $sepParam, $sep );
my @terms= grep length, split /\Q$sep/, $str;
$q->param( $textParam, join $sep, @terms );
return @terms;
};
my @headHas= $getTerms->( "HIT", "HIS" );
my @headLacks= $getTerms->( "HET", "HES" );
my @bodyHas= $getTerms->( "BIT", "BIS" );
my @bodyLacks= $getTerms->( "BET", "BES" );
my( @tables, @fields );
push @tables, 'note',
"left join node as root on root.node_id=root_node";
if( @bodyHas || @bodyLacks ) {
my( %tables, %fields );
push @sects, 'note'
unless 'N' eq $replies;
for my $type ( @sects ) {
if( $typeTable{$type} ) {
++$tables{ $typeTable{$type} };
++$fields{$_}
for @{ $fieldOfTable{ $typeTable{$type} } };
} else {
++$tables{document};
++$fields{doctext};
}
}
push @tables, keys %tables;
push @fields, keys %fields;
}
my $tables= "node as n";
for my $table ( @tables ) {
if( $table =~ / / ) {
$tables .= "\n$table";
} else {
$tables .= "\nleft join $table on ${table}_id=n.node_id";
}
}
if( @headHas ) {
push @criteria, map {
my $quoted= $_;
$quoted =~ s#\\#\\\\#g; # MySQL bug
$quoted =~ s#(['%_\\\[\]])#\\$1#g;
"n.title LIKE '%$quoted%'";
} @headHas;
}
if( @headLacks ) {
push @criteria, map {
my $quoted= $_;
$quoted =~ s#\\#\\\\#g; # MySQL bug
$quoted =~ s#(['%_\\\[\]])#\\$1#g;
"n.title NOT LIKE '%$quoted%'";
} @headLacks;
}
if( @bodyHas ) {
push @criteria, map {
my $quoted= $_;
$quoted =~ s#\\#\\\\#g; # MySQL bug
$quoted =~ s#(['%_\\\[\]])#\\$1#g;
"( " . join( " OR ", map {
"$_ LIKE '%$quoted%'";
} @fields ) . " )";
} @bodyHas;
}
if( @bodyLacks ) {
push @criteria, map {
my $quoted= $_;
$quoted =~ s#\\#\\\\#g; # MySQL bug
$quoted =~ s#(['%_\\\[\]])#\\$1#g;
map {
"$_ NOT LIKE '%$quoted%'";
} @fields;
} @bodyLacks;
}
my $oldFirst= ! ( $q->param("nf") )[-1];
my $n0= $q->param("n0");
my $doSearch= $n0 && ! @errors;
my $lastNode= $DB->sqlSelect( "max(node_id)", "node" );
$n0 ||= $oldFirst ? 1 : $DB->sqlSelect( "max(node_id)", "node" );
push @criteria, "n.node_id BETWEEN !TBD!";
my $limit= 50;
if( $doSearch ) {
require Time::HiRes;
my @matches;
my $start= $n0;
my $startTime= Time::HiRes::time();
while( 1 ) {
my( $min, $max );
if( $oldFirst ) {
( $min, $max )= ( $n0, $n0+10000 );
$max= 1000 * int( $max/1000 + 0.5 );
$max= $lastNode if $lastNode < $max;
} else {
( $min, $max )= ( $n0-10000, $n0 );
$min= 1000 * int( $min/1000 + 0.5 );
$min= 1 if $min < 1;
}
$criteria[-1]= "n.node_id BETWEEN $min AND $max";
my $explainTime= Time::HiRes::time();
my $query= qq[
SELECT n.node_id, n.title, n.type_nodetype,
n.author_user, n.createtime, root.type_nodetype
FROM $tables
WHERE ] . join( " AND ", @criteria ) . qq[
ORDER BY n.node_id
LIMIT ] . ( $limit - @matches );
my $explain= $DB->getDatabaseHandle()->prepare(
"EXPLAIN $query" );
$explain->execute();
my $rec= $explain->fetchrow_hashref();
$explain->finish();
my $key_used= $rec->{key};
my $key_rows= $rec->{rows};
my $comment= $rec->{Comment};
$explainTime= Time::HiRes::time() - $explainTime;
if( 3 < $explainTime ) {
push @errors, ( $start==$n0 ? "Q" : "Remainder of q" )
. qq[uery was not run; Server is too busy ]
. sprintf(
qq[("explain" took %.2f seconds)<br />],
$explainTime
);
last;
}
unless( "PRIMARY" eq $key_used
or "" ne $key_used && $key_rows < 10000
) {
push @errors, ( $start==$n0 ? "Q" : "Remainder of q" )
. "uery would not run quickly"
. ( $comment ? " ($comment)" : "" )
. ".<br />\n";
last;
}
my $cursor= $DB->sqlSelectMany(
"n.node_id as node_id, n.title as title,
n.type_nodetype as type_nodetype,
n.author_user as author_user, n.createtime as createt
+ime,
root.type_nodetype as root_nodetype",
$tables,
join( " AND ", @criteria ),
"ORDER BY n.node_id LIMIT " . ( $limit - @matches ),
);
my $rec;
while( $rec= $cursor->fetchrow_hashref() ) {
push @matches, $rec;
}
$cursor->finish();
if( @matches < $limit ) {
$n0= 1 + $max;
} else {
$n0= 1 + $matches[-1]{node_id};
last;
}
last if $lastNode < $n0;
my $runTime= Time::HiRes::time() - $startTime;
if( 10 < $runTime ) {
push @errors, ( $start==$n0 ? "Q" : "Remainder of q" )
. qq[uery was not run ]
. sprintf(
qq[(used %.2f seconds so far)<br />],
$runTime
);
last;
}
}
my $startDate= ( split " ", $DB->sqlSelect(
"createtime","node","node_id=$start") )[0];
my $endDate= ( split " ", $DB->sqlSelect(
"createtime","node","node_id=".($n0-1)) )[0];
my $matches= @matches;
$html .= qq[<p><hr />
<b>Found $matches node] . ( 1==$matches ? "" : "s" )
. qq[</b>between IDs $start ($startDate) and ] . ($n0-1)
. qq[($endDate)];
if( @bodyHas || @bodyLacks || @headHas || @headLacks ) {
$html .= qq[<br />where ] . join qq[<br />and ],
map {
my( $desc, @terms )= @$_;
if( ! @terms ) {
();
} else {
$desc . join( ", ", map {
'"<tt>' . $q->escapeHTML($_) . '</tt>"'
} @terms )
}
} ["any text contains all of ",@bodyHas],
["no text contains any of ",@bodyLacks],
["title contains all of ",@headHas],
["title doesn't contain any of ",@headLacks],
}
if( @users ) {
$html .= qq[<br />written by ]
. ( $negAuthor ? "anyone but " : "any of " )
. join ", ", map linkNode($_), @users;
}
$html .= qq[</p>\n];
my $linkType= sub {
my( $typeId )= @_;
return linkNode( $typeId, $link{$typeId} );
};
$html .= qq[<p><table width="100%">];
for my $rec ( @matches ) {
$html .= $q->Tr(
$q->td( ( split " ", $rec->{createtime} )[0] ),
$q->td( linkNode($rec->{author_user}) ),
$q->td( linkNode($rec->{node_id},$rec->{title}) ),
$q->td(
$note == $rec->{type_nodetype}
? "Re:" . $linkType->( $rec->{root_nodetype} )
: $linkType->( $rec->{type_nodetype} )
),
);
}
$html .= qq[</table></p>\n];
}
$q->param( "n0", $n0 );
if( $doSearch ) {
if( $oldFirst && $n0 < $lastNode
|| ! $oldFirst && 1 < $n0 ) {
my( $min, $max )=
$oldFirst ? ( $n0, $lastNode ) : ( 1, $n0 );
$html .= qq[<p>
Press a "Search" button (below) <b>to continue</b>
(IDs $min thru $max).
</p>\n];
}
$html .= "<hr />";
}
$html .= '<p>' . linkNode( $NODE, "Reset search form" ) . "</p>\n"
+;
$html .= $/ . htmlcode('openform') . $/;
$html .= qq[<p>
Match <b>text</b> containing ] . $q->textfield( "BIT", "", 60
+)
. qq[<br />(seperate strings with ]
. $q->textfield( "BIS", " ", 2 ) . qq[ -- default is spaces)
<br />] . $q->radio_group( "BH", [ "0", "1" ], "1", 0,
{ 0=>"Don't match -or-", 1=>"Also match" },
) . qq[ <b>titles</b> against above.</p>];
$html .= $/ . $q->submit("","Search") . qq[
Please be patient after submitting your search.\n];
$html .= qq[<p>
Match <b>titles</b> containing ]
. $q->textfield( "HIT", "", 60 )
. qq[<br />(separate strings with ]
. $q->textfield( "HIS", " ", 2 )
. qq[ -- default is spaces)</p>];
$html .= qq[<p>
] . $q->radio_group(
"xa", [ "0", "1" ], "0", 0, { 0=>"Match -or-", 1=>"Exclude
+" },
) . qq[ <b>authors</b> ] . $q->textfield( "a", "", 20 )
. qq[<br />
(use "[one] [two]" to list multiple authors)
<br />(Searching by author doesn't work for Categorized
Questions and Answers yet.)</p>];
$html .= qq[<p>
Search ] . $q->radio_group(
-name=>"nf", -values=>[ "1", "0" ], -default=>"0",
-labels=>{ 1=>"Newest first -or-", 0=>"Oldest first" },
-disabled=>"disabled",
) . qq[,<br />starting at node
] . $q->textfield( "n0", "0", 12 ) . qq[ (]
. ( split " ", $DB->sqlSelect(
"createtime","node","node_id=$n0") )[0]
. qq[).</p>];
$html .= qq[<!-- <p> Show {10|20|50} matches per page.</p> -->];
$html .= qq[<p>
Search ] . $q->radio_group(
"xs", [0,1], 0, 0, {0=>"only -or-",1=>"all but"},
) . qq[<br />the following <b>sections</b>:];
$html .= qq[<ul>] . $q->table(
map(
"\n "
. $q->Tr(
map "\n " . $q->td(
$q->checkbox(
-name => $abbr{$types[$_]},
-value => "",
-label => $desc{$types[$_]},
"scratchpad" eq $types[$_]
? ( -disabled => "disabled" )
: (),
)
), @$_
), map( [ $_, $_+8, $_+16 ], 0..6 ), [7,15]
), $/
) . qq[</ul>\n];
$html .= qq[<p>
<i>Skip</i> <b>text</b> containing ]
. $q->textfield( "BET", "", 60 )
. qq[<br />(seperate strings with ]
. $q->textfield( "BES", " ", 2 )
. qq[ -- default is spaces)<br />
(Does not exclude based on titles)</p>];
$html .= qq[<p>
<i>Skip</i> <b>titles</b> containing ]
. $q->textfield( "HET", "", 60 )
. qq[<br />(seperate strings with ]
. $q->textfield( "HES", " ", 2 )
. qq[ -- default is spaces)</p>];
$html .= qq[</p><p>\n] . $q->radio_group(
"xr", ["0","1"], "0", 1, {
0 => "Include <b>root</b> nodes from selected sections",
1 => "Don't include <b>root</b> nodes",
},
);
$html .= qq[</p><p>\n] . $q->radio_group(
"re", [qw( A S N )], "S", 1, {
A => "Include <b>replies</b> from <i>any</i> section",
S => "Include <b>replies</b> from <i>selected</i> sections
+",
N => "<i>Don't</i> include <b>replies</b>",
},
);
$html .= qq[\n<p>] . $q->submit("","Search") . qq[
Please be patient after submitting your search.</p>\n];
$html .= qq[</form>\n];
$html .= qq[\n<!-- CGI::VERSION=$CGI::VERSION -->\n];
return "<b>@errors</b>$html";
%]