Seems almost like overkill when it can be done in just a few lines. The function below searches three million records very quickly.
# Usage: record_search($search_string, $filename, 11);
# Returns the record number if the record is found, undef otherwise.
# Searched file must be sorted.
sub record_search {
my ($key, $filename, $recordsize) = @_;
my $fh = IO::File->new($filename, 'r')
or croak("Could not open `$filename': $!");
my $hi = ($fh->stat)[7] / $recordsize;
my $lo = 0;
while ($lo < $hi) {
my $mid = int(($lo + $hi) / 2);
$fh->seek($mid * $recordsize, 0);
my $target = $fh->getline();
chomp($target);
if ($target < $key) {
$lo = $mid + 1;
} elsif ($target == $key) {
return $mid;
} else {
$hi = $mid;
}
}
}
Update Forgot to mention: even though this function opens the file every time, it can successfully search 3 million records about 2000 times per second on a 3Ghz Pentium 4. I can't see much justification for anything more complicated. Threads? Bit strings? Crazy! :) | [reply] [d/l] |
Sometimes being able to do it in "just a few lines" isn't the point (take Data::Page for example). It's having to always rewrite, correctly, those lines every time you want to solve this particular problem. In this case, the module makes for a more flexible solution, and it's a "few lines" (about 12 actually) that I don't have to rewrite. The downside is that it is about 20% slower, but that's a non-issue when response to any single query is imperceptible.
I can't see much justification for anything more complicated.
Optionally searching for strings instead of numbers?
Not having to have a fixed record size or not having to know what the recordsize is? (note: there is a set recordsize function in File::SortedSeek which claims to improve performance, but I could not see any difference in speed from using it)
Anyway, here's your function using F:SS instead:
use File::SortedSeek;
sub record_search {
my ($key, $filename, $recordsize) = @_;
my $fh = IO::File->new($filename, 'r')
or croak("Could not open $filename: $!");
# I don't see any difference in speed with the next line
# File::SortedSeek::set_line_length($recordsize);
numeric( $fh, $key);
return File::SortedSeek::was_exact();
}
Update: Looking at the answers below, I learned about Search::Dict, which would work just about as well, and has been in core perl since at least 5.6.1, but still supports my point about saving just a few lines. | [reply] [d/l] |