Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

In order traversal of BTREE keys where not all keys have duplicate values (using DB_File)

by stajich (Chaplain)
on May 19, 2002 at 18:37 UTC ( #167686=perlquestion: print w/replies, xml ) Need Help??

stajich has asked for the wisdom of the Perl Monks concerning the following question:

I'm binning data into segments and then want to retrive a subset of the data from a run of sequential bins. Data is going into the system fine, but not every bin has multiple values so when I iterate through the results after setting the cursor using get_dup, only keys with multiple values are returned by the cursor.

Below is a code which illustrates the problem. Is this a DB_File/BDB 1.x limitation/feature that is better addressed with BerkeleyDB interface and the more robust cursors?

$DB_BTREE->{'flags'} = R_DUP; $DB_BTREE->{'compare'} = \&_compare; my %btree; my $bhandle = tie %btree, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $DB_ +HASH; my $len = 26; my @array = ( 'a'..'z' ); foreach ( 1..$len ) { $btree{$_} = shift @array; } # add a second value to each so that each key has duplicate values @array = ( 'A'..'Z' ); foreach ( 1..$len ) { $btree{$_} = shift @array; } # test to see that each value is printed from 20 - end my @v = $bhandle->get_dup(20); print "v is @v 20\n"; while( $bhandle->seq($k,$v, R_NEXT) == 0 ) { my @v = $bhandle->get_dup($k); print "$k @v\n"; } # now associate a single value with a key $btree{22.5} = 'HHI'; # test to see that each value is printed from 20 - end my @v = $bhandle->get_dup(20); print "v is @v 20\n"; while( $bhandle->seq($k,$v, R_NEXT) == 0 ) { my @v = $bhandle->get_dup($k); print "$k @v\n"; } # 22.5 does not show up # add a second value for 22.5 $btree{22.5} = 'JKL'; # test to see that each value is printed from 20 - end my @v = $bhandle->get_dup(20); print "v is @v 20\n"; while( $bhandle->seq($k,$v, R_NEXT) == 0 ) { my @v = $bhandle->get_dup($k); print "$k @v\n"; } # now 22.5 is in the list
The best workaround I have thought of will be to dump all the keys, find the bin that is closest to where I want to start O(log(n) (since list will be sorted), and walk through the list until reaching end boundary condition, calling get_dup on each key in the subset (which still works if only one value is stored for the key).
  • Comment on In order traversal of BTREE keys where not all keys have duplicate values (using DB_File)
  • Download Code

Replies are listed 'Best First'.
(podmaster) Re: In order traversal of BTREE keys where not all keys have duplicate values (using DB_File)
by PodMaster (Abbot) on May 19, 2002 at 23:52 UTC
    What is the definition of your sub _compare?
    (incomplete code cannot possibly demonstrate the problem)

    Use the find_dup method to see if there are any duplicates.
    (change the format of your values to help you out here, or just forget the whole R_DUP thing, and do something with the keys like some_key_part_1, some_key_part_2 ...)

    Here is an example adapted from the DB_File documentation to show you a neat little trick

    use strict ; use DB_File ; use vars qw( $x %h ) ; # Enable duplicate records $DB_BTREE->{'flags'} = R_DUP ; $x = tie %h, "DB_File", undef, O_RDWR|O_CREAT, 0640, $DB_BTREE or die "Cannot open $filename: $!\n"; $h{'Wall'} = 'only once'; $h{'Smith'} = 'more than once'; $h{'Smith'} = 'more than twice'; $h{'Smith'} = 'three times'; my $cnt = $x->get_dup("Wall") ; print "Wall occurred $cnt times\n" ; my %hash = $x->get_dup("Wall", 1) ; print "Larry is there\n" if $hash{'Larry'} ; print "There are $hash{'Brick'} Brick Walls\n" ; my @list = sort $x->get_dup("Wall") ; print "Wall => [@list]\n" ; @list = $x->get_dup("Smith") ; print "Smith => [@list]\n" ; @list = $x->get_dup("Dog") ; print "Dog => [@list]\n" ;
    particularly interesting is this tidbit
    $count = $x->get_dup($key) ; @list = $x->get_dup($key) ; %list = $x->get_dup($key, 1) ;

     

    Look ma', I'm on CPAN.


    ** The Third rule of perl club is a statement of fact: pod is sexy.
      Oops missing the compare function.
       sub _compare { $_[0] <=> $_[1] }

      Keys are numeric bins of the form 10000.000178 - this adapted from a mysql strategy where we are testing for bins within the boundaries but I am interested in an in-memory adaptation.

      I want to iterate through a subset of the keys that are bounded by some values $begin, $end not the whole set of keys as is demonstrated in all the examples. I was using get_dup to initialize the cursor and then iterate through all the available keys, however the cursor will only point to keys which have >1 value.
      ===
      Ahem. I was making it harder than it should be.

      $bhandle->seq($start,$v,R_CURSOR); while( $bhandle->seq($k,$v,R_NEXT) == 0 ) { last if( $k> $end); ... }
      Gets me what I want. Calling get_dup to initialize the cursor was not a smart idea and was the root of the problem.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://167686]
Approved by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (4)
As of 2021-12-02 08:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    R or B?



    Results (18 votes). Check out past polls.

    Notices?