DigitalKitty has asked for the wisdom of the Perl Monks concerning the following question:
Hi all.
Two of the most common search algorithms are the:
linear search and the binary search
Each has it's advantages as well as it's 'faults'. In a linear search, each element of the data structure is examined. Needless to say, in a small array ( for example ) this isn't a problem since there are only a relatively small number of elements to analyze. If the array is large, a binary search would be most effective. In a binary search, the array needs to be sorted before a search is initiated. Once that has been accomplished, there exists a tremendous gain in performance ( please see below ). By establishing a 'mid-point' in the array, we are able to eliminate 1/2 of the array ( the lower or upper portion ) when comparing our key ( the value we want to find ) to it's position in the array and it's relation to the mid-point. With each successive comparison, we divide the array in 1/2 thereby establishing a new mid-point. Then it's a simple matter of establishing whether or not our key is higher or lower than it.
An array with 2_048 elements could be successfully searched
using ( at most ) only 11 comparisons ( 2**11 == 2048 ).
When compared to the time it would probably take to search the same array using a linear search, the benefits should be readily apparent. The formula for determining the maximum number of comparisons needed is the first power of 2 greater than the number of elements in the array.
#!/usr/bin/perl -w
use strict;
my @array;
my $low = 0;
my $mid;
my $found_key = 0;
my $num;
my $key;
my $index;
@array = ( 5, 10, 30, 4, -3, 18, 101, 2001, 46, 23 );
@array = sort { $a <=> $b } @array;
my $high = $#array;
print "Please enter key to search for: ";
chomp( $key = <STDIN> );
while( ( $low <= $high ) && !$found_key ) {
$mid = ( $low + $high ) / 2;
if( $key == $array[$mid] ) {
$found_key = 1;
$index = int( $mid );
}
elsif( $key < $array[$mid] ) {
$high = $mid - 1;
}
else {
$low = $mid + 1;
}
}
if( $found_key ) {
print "$key is at position: $index\n";
}
else {
print "Sorry. I could not find: $key";
}
This is my first attempt at implementing a binary search in perl. If you could help me improve the code, I would be most appreciative.
Thanks,
-Katie.
Re: Binary searching with Perl.
by gmax (Abbot) on May 24, 2003 at 09:14 UTC
|
$mid = ( $low + $high ) / 2;
should become
$mid = int(( $low + $high ) / 2);
See also Binary search algorithm for more examples and discussion about this matter
update Fixed my own mistake. The error was there, but in the wrong place. The original algorithm does not catch the highest index item.
update (2) Jon Bentley in Programming Pearls advised that binary search can be tricky, and even experienced programmers can fail to get it properly. He is right. I have corrected this script three times, every time in the wrong way. Finally, when I monitored the three variables, I got the (hopefully) right path. :)
_ _ _ _
(_|| | |(_|><
_|
| [reply] [d/l] [select] |
|
> even experienced programmers can fail to get it properly.
There's always the "CPAN module" option. ;-)
Search::Binary
Michele.
| [reply] |
Re: Binary searching with Perl.
by Dominus (Parson) on May 25, 2003 at 04:47 UTC
|
As more food for thought, I'll present an application of binary search
where it doesn't make sense to 'use a hash'.
Suppose you're looking up records in a large sorted file.
The following search subroutine will search the file efficiently,
using a binary search. It gets two arguments: $fh is a filehandle
open to the file you want to search, and $cmp is
a comparison function. search will call $cmp
repeatedly with various records from the file;
$cmp should return zero if its argument is the desired record,
negative if its argument is earlier than the desired record,
and positive if its argument is later than the desired record.
search then returns the first record in the file that is
not earlier than the desired record, and leaves the filehandle
positioned at that record. If there is no such record, it
returns undef.
sub search {
my ($fh, $cmp) = @_;
my ($lo, $hi) = (0, -s $fh);
while (1) {
my $mid = int(($lo + $hi)/2);
if ($mid) {
seek $fh, $mid-1, SEEK_SET;
my $junk = <$fh>;
} else { seek $fh , $mid, SEEK_SET; }
my $start = tell $fh;
my $rec = <$fh>;
return unless defined $rec;
chomp($rec);
if ($hi == $lo) {
seek $fh, $start, SEEK_SET;
return $rec
};
local $_ = $rec;
if ($cmp->($rec) < 0) { $lo = $mid+1 }
else { $hi = $mid }
}
}
For example, suppose you want to search the dictionary file
for words that begin with foo. You might:
open DICT, "<", "/usr/dict/Web2" or die...;
if (search(\*DICT, sub { lc $_ cmp "foo" })) {
# DICT is now positioned at the first word beginning with 'foo'
while (<DICT>) {
last unless /^foo/i;
print;
}
}
The search call moves the filehandle quickly to the
first record that begins with foo, and then the
while loop prints out the words, quitting after the last one.
Using search is quicker than scanning the file from the beginning
looking for the first foo word, unless the file
is very small.
The behavior is similar to that of the standard Search::Dict
module, but the code is much simpler, and I think my interface is more flexible.
I wrote this as an example for my new class
Strategies for Lightweight Databases.
Share and enjoy!
--
Mark Dominus
Perl Paraphernalia
| [reply] [d/l] [select] |
|
My inclination would still be to use DB_File or a relative to access a BTREE. One of the methods in the API is exactly equivalent to your search interface, but the implementation should be more performance-efficient. (And using their API takes less code.)
| [reply] |
|
| [reply] |
Re: Binary searching with Perl.
by TVSET (Chaplain) on May 24, 2003 at 12:49 UTC
|
| [reply] |
Re: Binary searching with Perl.
by Anonymous Monk on May 24, 2003 at 11:23 UTC
|
Noting that this is a learning exercise and "use a hash" advice is unhelpful.
A binary search only becomes more efficient than a linear search, if the data is already sorted, or if you can sort once and amortise the cost of that sort over many, many searches.
If the data being search is static within the context of the program, then you should ensure that the program receives that data pre-sorted. If the data is dynamically aquired (gets added to) during the life of the program, then you should maintain the data in it sorted state.
Maintaining dynamic datasets in an order state means using either a binary search and insert on a linked list or maintaining the data as a binary tree
| [reply] |
|
| [reply] |
|
| [reply] |
|
| [reply] |
A bit OT remark (was Re: Binary searching with Perl.)
by zby (Vicar) on May 24, 2003 at 09:30 UTC
|
If you are using the array just as a set of values, i.e. you need three operations: adding a value to the set, deleting a value and checking if the value is in the set, you can use a hash instead of a sorted array. En equivalent of the binary search algorithm is inside the implementation of hashes - so you can use it for free, with no effort. | [reply] |
|
| [reply] |
Re: Binary searching with Perl.
by aquarium (Curate) on May 24, 2003 at 14:29 UTC
|
your considerations are very hypothetical (linear vs binary) + there are modified linear and binary (+other) searches, depending on your need. be careful though of making such grand statements as saying binary is so much faster for large datasets. yes, binary is generally faster. one very serious consideration to take into account (considering the high execution speed of programs on most recent pcs) is data access times from media. this involves many variables including block factors and such. also once you get into "huge" datasets than it's a matter of course to use a decent database engine, which has indexes and optimizers etc. anyway, good luck with your binary search algorithm. btw, it's easier (and faster) to traverse a binary linked list. | [reply] |
Re: Binary searching with Perl.
by Anonymous Monk on May 24, 2003 at 15:40 UTC
|
Mind you, you could avoid eleven comparisons if you use the hash, Katie.
Make the data you want to search the key of a hash, and find things in constant time. Of course, it only works if you're looking for exact matches, not for aproximate matches, misspellings, or values which fall within a range.
| [reply] |
|
|