
I'm binning data into segments and then want to retrieve a subset of the data from a run of sequential bins. Data goes into the system fine, but not every bin has multiple values. When I position the cursor with get_dup and then iterate through the results, the cursor returns only keys that have multiple values.

Below is code that illustrates the problem. Is this a DB_File/BDB 1.x limitation/feature that is better addressed with the BerkeleyDB interface and its more robust cursors?

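use DB_File;
use Fcntl;    # O_RDWR, O_CREAT

# Assumed numeric comparator; the post does not show the body of _compare.
sub _compare { my ($k1, $k2) = @_; return $k1 <=> $k2 }

$DB_BTREE->{'flags'} = R_DUP;
$DB_BTREE->{'compare'} = \&_compare;
my %btree;
# tie with $DB_BTREE so the R_DUP flag and compare callback above take effect
my $bhandle = tie %btree, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $DB_BTREE;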

my $len = 26;
my @array =  ( 'a'..'z' );
foreach ( 1..$len ) {
    $btree{$_} = shift @array;
}
# add a second value to each so that each key has duplicate values
@array =  ( 'A'..'Z' );
foreach ( 1..$len ) {
    $btree{$_} = shift @array;
}

# test to see that each value is printed from 20 - end
my @v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}

# now associate a single value with a key

$btree{22.5} = 'HHI';

# test to see that each value is printed from 20 - end
@v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}
# 22.5 does not show up

# add a second value for 22.5
$btree{22.5} = 'JKL';

# test to see that each value is printed from 20 - end
@v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}

# now 22.5 is in the list

The best workaround I have thought of is to dump all the keys, find the bin closest to where I want to start in O(log(n)) with a binary search (the list will be sorted), and walk through the list until reaching the end boundary condition, calling get_dup on each key in the subset (which still works if only one value is stored for the key).
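
A minimal sketch of that workaround might look like the following. It assumes numeric keys, and it uses a plain hash of array references (`%bins`, a hypothetical stand-in) instead of the tied DB_File handle so that it runs on its own; with the real database the sorted key list would come from the tied hash with repeated keys removed, and the values from `$bhandle->get_dup($key)`. The helper names `lower_bound` and `keys_in_range` are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tied BTREE: bin key => list of values.
# With DB_File these lists would come from $bhandle->get_dup($key).
my %bins = (
    20   => [ 't', 'T' ],
    21   => [ 'u', 'U' ],
    22   => [ 'v', 'V' ],
    22.5 => [ 'HHI' ],          # single-valued bin
    23   => [ 'w', 'W' ],
);

# Return the index of the first element >= $target in a sorted numeric
# list (binary search, O(log n)). Returns scalar(@$sorted) if none exists.
sub lower_bound {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] < $target) { $lo = $mid + 1 }
        else                           { $hi = $mid }
    }
    return $lo;
}

# Collect the keys in [$start, $end], walking forward from the lower bound
# and stopping at the first key past the end boundary.
sub keys_in_range {
    my ($sorted, $start, $end) = @_;
    my @found;
    for (my $i = lower_bound($sorted, $start); $i < @$sorted; $i++) {
        last if $sorted->[$i] > $end;
        push @found, $sorted->[$i];
    }
    return @found;
}

my @sorted = sort { $a <=> $b } keys %bins;
for my $key (keys_in_range(\@sorted, 21.5, 23)) {
    print "$key @{ $bins{$key} }\n";
}
```

Run as is, this prints the bins 22, 22.5 and 23, including the single-valued 22.5. The cost is holding the whole key list in memory, which is the trade-off of this approach compared with walking a cursor.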

In reply to In order traversal of BTREE keys where not all keys have duplicate values (using DB_File) by stajich

	PerlMonks