In order traversal of BTREE keys where not all keys have duplicate values (using DB

stajich has asked for the wisdom of the Perl Monks concerning the following question:

I'm binning data into segments and then want to retrive a subset of the data from a run of sequential bins. Data is going into the system fine, but not every bin has multiple values so when I iterate through the results after setting the cursor using get_dup, only keys with multiple values are returned by the cursor.

Below is a code which illustrates the problem. Is this a DB_File/BDB 1.x limitation/feature that is better addressed with BerkeleyDB interface and the more robust cursors?

$DB_BTREE->{'flags'} = R_DUP;
$DB_BTREE->{'compare'} = \&_compare;
my %btree;
my $bhandle = tie %btree, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $DB_
+HASH;

my $len = 26;
my @array =  ( 'a'..'z' );
foreach ( 1..$len ) {
    $btree{$_} = shift @array;
}
# add a second value to each so that each key has duplicate values
@array =  ( 'A'..'Z' );
foreach ( 1..$len ) {
    $btree{$_} = shift @array;
}

# test to see that each value is printed from 20 - end
my @v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}

# now associate a single value with a key

$btree{22.5} = 'HHI';

# test to see that each value is printed from 20 - end
my @v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}
# 22.5 does not show up

# add a second value for 22.5
$btree{22.5} = 'JKL';

# test to see that each value is printed from 20 - end
my @v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}

# now 22.5 is in the list
[download]

The best workaround I have thought of will be to dump all the keys, find the bin that is closest to where I want to start O(log(n) (since list will be sorted), and walk through the list until reaching end boundary condition, calling get_dup on each key in the subset (which still works if only one value is stored for the key).

Comment on In order traversal of BTREE keys where not all keys have duplicate values (using DB_File) Download Code

Replies are listed 'Best First'.
(podmaster) Re: In order traversal of BTREE keys where not all keys have duplicate values (using DB_File) by PodMaster (Abbot) on May 19, 2002 at 23:52 UTC
What is the definition of your sub _compare? (incomplete code cannot possibly demonstrate the problem) ~~Use the find_dup method to see if there are any duplicates.~~ (change the format of your values to help you out here, or just forget the whole R_DUP thing, and do something with the keys like some_key_part_1, some_key_part_2 ...) Here is an example adapted from the DB_File documentation to show you a neat little trick use strict ; use DB_File ; use vars qw( $x %h ) ; # Enable duplicate records $DB_BTREE->{'flags'} = R_DUP ; $x = tie %h, "DB_File", undef, O_RDWR\|O_CREAT, 0640, $DB_BTREE or die "Cannot open $filename: $!\n"; $h{'Wall'} = 'only once'; $h{'Smith'} = 'more than once'; $h{'Smith'} = 'more than twice'; $h{'Smith'} = 'three times'; my $cnt = $x->get_dup("Wall") ; print "Wall occurred $cnt times\n" ; my %hash = $x->get_dup("Wall", 1) ; print "Larry is there\n" if $hash{'Larry'} ; print "There are $hash{'Brick'} Brick Walls\n" ; my @list = sort $x->get_dup("Wall") ; print "Wall => [@list]\n" ; @list = $x->get_dup("Smith") ; print "Smith => [@list]\n" ; @list = $x->get_dup("Dog") ; print "Dog => [@list]\n" ; [download] particularly interesting is this tidbit `$count = $x->get_dup($key) ; @list = $x->get_dup($key) ; %list = $x->get_dup($key, 1) ;` [download] Look ma', I'm on CPAN. ** The Third rule of perl club is a statement of fact: pod is sexy.	[reply] [d/l] [select]
Re: (podmaster) Re: In order traversal of BTREE keys where not all keys have duplicate values (using DB_File) by stajich (Chaplain) on May 20, 2002 at 15:19 UTC
Oops missing the compare function. `sub _compare { $_[0] <=> $_[1] }` Keys are numeric bins of the form 10000.000178 - this adapted from a mysql strategy where we are testing for bins within the boundaries but I am interested in an in-memory adaptation. I want to iterate through a subset of the keys that are bounded by some values $begin, $end not the whole set of keys as is demonstrated in all the examples. I was using get_dup to initialize the cursor and then iterate through all the available keys, however the cursor will only point to keys which have >1 value. === Ahem. I was making it harder than it should be. `$bhandle->seq($start,$v,R_CURSOR); while( $bhandle->seq($k,$v,R_NEXT) == 0 ) { last if( $k> $end); ... }` [download] Gets me what I want. Calling get_dup to initialize the cursor was not a smart idea and was the root of the problem.	[reply] [d/l] [select]


go ahead... be a heretic
	PerlMonks

In order traversal of BTREE keys where not all keys have duplicate values (using DB_File)