Re: Database searches with persistent results via CGI

Good afternoon, MrCromeDome. It seems you are having the same sort of problem as do a lot of search engines... needing to save cycles on many repeat queries. There are a lot of approaches to it, and it really is going to depend on the result set size, what hardware you have it on, etc. I'm going to assume that you have mySql as a database engine, as these tips will hold true for many many DBMs out there.

First off, consider this, what is the upper limit size on your result set. Are you looking at 100s, 1000s, more? Because at a certain point, these caching mechanisms are not really going to help you. You'll have to do some performance tweaking to see what size category the bulk your results end up.
Secondly, consider how often your data changes? This is essential in determining when your cache should drop dead, and the next search should store new data. If you have rapidly changing data (a good amount entered new each day), consider a cached time of perhaps 4 hours, or do it on off cycles, such as overnight.

In mySQL, whenever you request a large result set that has an "ORDER BY" statement in it, mySQL has to at least analyze the entire result set, then apply the limiter to it, to get the correct results. This can be quite tedious on the server, as you often pass over tons of results that you never pass back to the app. One way of caching is to get and hold all of the results that mySQL will come back to you with, and store them someplace fast to look them up in, such as another table.

A proposed table would take the form:

result_id = primary key int not null
dropdead = datetime, a future time in which to kill the result
searchstring = char(255) of search results.
resultcache = blob / text (or largeblob, etc)

Basically, you'd store the "SMITH JOHN" in the search string, and then form your result set in the resultcache, in some way that

Storage caching mechanisms can take two forms.

Binary column type, Storable, and freeze / thaw This method compresses the best, uncompresses fast, but is binary, and may be harder to debug.
A text data type (see the blob link above), Data::Dumper, and Data::Dumper->Dump / eval to store and then revive the data. This approach will store fewer results, is not as fast other approaches, but is easier to debug.

If you take a result set, and store it as an array, you can thaw (or eval), the entire result set and jump to a certain spot in the set. To look at this in psuedoperl:


my @list_of_cached_results;

if($indatabase){
  foreach my $result in ($search_result)
  {
    push @list_of_cached_results, result;
  }
  
  my $binary_cache = nfreeze(\@list_of_cached_results);
  #store lists of binary results
}
else
{
  @list_of_cached_results = thaw($indatabase);
}
renderPage(@list_of_cached_results, $offset);
[download]

Basically, if you use the perl persistance modules, and a database (such as the one you're already reading from), you can pick up a good amount of speed from picking up a pre-packaged set of results (a match to your search string), and then display it to the page, saving you the hell of finding all the results again. As long as you can manage results and not have them go "stale", then you'll be all set.

Hope this made some sense to you. If I don't seem clear, I'd be happy to clarify anything in here.

--jb

Comment on Re: Database searches with persistent results via CGI Download Code

Replies are listed 'Best First'.

Re: Re: Database searches with persistent results via CGI
by AidanLee (Chaplain) on May 10, 2002 at 13:04 UTC

Something to consider if drive space is an issue, and the entries returned by the search form all have unique ids is to serialize (by either of the methods JayBonci suggests) just the ids, rather than the entire data structure. I couldn't tell you up front, but it may either be

Slower if it takes longer to deserlialize the ids and then call another select statement to get the actual records
or faster, since ultimately you'd be transferring less data. First you'd only be transferring just the ids to your script, and selecting the range of ids for the current page of results. then you only have to transfer the full data for those specific records instead of for all results in the search.

The other thing I'd suggest is keying the temporary table off an md5 hash or some such of the search parameters, rather than a session (for all the reasons perrin has mentioned). This way you'll still have unique ids but you won't have issues with multiple browsers (or even multiple computers...).

[reply]


good chemistry is complicated, and a little bit messy -LW
	PerlMonks