All:
Does anyone see a need for a module that will allow you to retrieve hash keys in a pre-defined sorted order? This functionality can be used with any user defined sort routine, can be changed on the fly, works on multi-dimensional hashes, and DWYM if you decide to add key/values later.
There are two modules already that do similar things:
Tie::IxHash
Tie::SortHash
The problem with Tie::IxHash is that it only preserves insertion order as well as provide rudimentary ability and getting keys/values in sorted order.
The problems with Tie::SortHash are:
It uses pseudo hashes
It uses eval string to accomplish the arbitrary sort
It loops (n2 + n) / 2 times through the hash for keys, n = # of keys
It uses its own form of garbage collection
The test suite is not exhaustive
The calling syntax is not flexible, making expansion extremely difficult
Originally, I thought I would just contact the author with suggestions on how to correct these deficiencies or offer to take over maintenance. Almost a month has gone by with no response. I originally posted this as a proposed inplace upgrade for the module. Upon revisiting it yesterday, I have decided that if I do anything, it will be to upload a brand new module. It is too hard to maintain backwards compatability and have the module robust.
The question I have is - why, what's the point? I am not aware that there is a big need for this module. I would certainly not want to put more code up on CPAN without a good reason.
The new module that I have worked on is below, though it still unpolished, requires the POD to be finished, and needs an exhaustive test suite.
#!/usr/bin/perl -w
package Tie::SortedHash;
use strict;
use Carp;
use constant HASH => 0;
use constant LOOKUP => 1;
use constant ARRAY => 2;
use constant SORT => 3;
use constant CHANGED => 4;
use constant OPT => 5;
our $VERSION = '1.00';
sub TIEHASH {
my $class = shift;
croak "Incorrect number of parameters" if @_ % 2;
my %options = @_;
my $self = bless [], $class;
$self->_Build(\%options);
return $self;
}
sub FETCH {
my($self, $key) = @_;
$self->[HASH]{$key};
}
sub STORE {
my($self, $key, $value) = @_;
$self->[HASH]{$key} = $value;
$self->[CHANGED] = 1;
}
sub EXISTS {
my($self, $key) = @_;
exists $self->[HASH]{$key};
}
sub DELETE {
my($self, $key) = @_;
delete $self->[HASH]{$key};
if ($self->[OPT] == 2 && exists $self->[LOOKUP]{$key}) {
splice(@{$self->[ARRAY]}, $self->[LOOKUP]{$key}, 1);
delete $self->[LOOKUP]{$key};
}
}
sub FIRSTKEY {
my $self = shift;
$self->_ReOrder if $self->[OPT] == 1 || ($self->[OPT] == 2 && $sel
+f->[CHANGED]);
$self->_Iterate;
}
sub NEXTKEY {
my ($self, $lastkey) = @_;
$self->_Iterate($lastkey);
}
sub CLEAR {
my $self = shift;
$self->[HASH] = {};
$self->[CHANGED] = 1;
}
sub DESTROY {
}
sub _Build {
my ($self, $opt) = @_;
my $sort = exists $opt->{SORT} ? $opt->{SORT} : sub {
my $hash = shift;
sort {$a cmp $b || $a <=> $b} keys %$hash;
};
$self->sortroutine($sort);
my $hash = exists $opt->{HASH} ? $opt->{HASH} : {};
croak "$hash is not a hash ref" if ref $hash ne 'HASH';
@{$self->[HASH]}{keys %$hash} = values %$hash;
if (exists $opt->{OPTLEVEL}) {
croak "$opt->{OPTLEVEL} is not valid optimization level" if $o
+pt->{OPTLEVEL} !~ /^[123]$/;
$self->[OPT] = $opt->{OPTLEVEL};
}
else {
$self->[OPT] = 1;
}
}
sub _ReOrder {
my $self = shift;
$self->[LOOKUP] = ();
$self->[ARRAY] = ();
my $index = 0;
for my $key ($self->[SORT]($self->[HASH])) {
$self->[LOOKUP]{$key} = $index;
$self->[ARRAY][$index] = $key;
$index++;
}
$self->[CHANGED] = 0;
}
sub _Iterate {
my ($self, $lastkey) = @_;
if ($self->[OPT] != 3) {
my $index = defined $lastkey ? $self->[LOOKUP]{$lastkey} : -1;
$index++;
return $self->[ARRAY][$index];
}
else {
my $match;
for my $key ($self->[SORT]($self->[HASH])) {
+
return $key if $match || ! defined $lastkey;
$match = 1 if $key eq $lastkey;
}
}
return undef;
}
sub sortroutine {
my($self, $sort) = @_;
croak "$sort is not a code ref" if ref $sort ne 'CODE';
$self->[SORT] = $sort;
$self->[CHANGED] = 1;
}
1;
__END__
=head1 NAME
Tie::HashSort - Perl module to get hash keys in a sorted order
=head1 SYNOPSIS
use Tie::HashSort;
my %hash = (
'John' => 33,
'Jacob' => 29,
'Jingle' => 15,
'Heimer' => 48,
'Smitz' => 12,
);
my $sort = sub {
my $hash = shift;
sort {$hash->{$b} <=> $hash->{$a}} keys %$hash;
};
tie my %sorted_hash, 'Tie::SortedHash', 'HASH' => \%hash, 'SORT' =
+> $sort, 'OPTLEVEL' => 2;
for my $name ( keys %sorted_hash ) {
print "$name is $hash{$name} ears old.\n";
}
### OUTPUT ###
Heimer is 48 ears old.
John is 33 ears old.
Jacob is 29 ears old.
Jingle is 15 ears old.
Smitz is 12 ears old.
=head1 DESCRIPTION
This module is a designed to retrieve hash keys in a pre-defined sorte
+d order.
It is often frustrating to have a hash return elements in a seemingly
+random order
when using C<keys()>, C<values()> or C<each()>.
=head2 Tie
In order to C<tie()> your hash to C<Tie::SortedHash>, use the followin
+g syntax:
tie HASH, 'Tie::SortedHash', OPTIONS;
=cut
I believe I have added functionality and avoided all the problems in
Tie::SortHash. Here is my philosophy on the 3 optimization levels:
Level 1: Trade lots of memory for speed. Create an array with the hash keys in sorted order each time FIRSTKEY is called. NEXTKEY is then a simple matter of getting the index of the lastkey and adding 1. This is accomplished by having a hash lookup table.
Level 2: Identical to level 1 only faster. The array with sorted keys and hash lookup table are only recreated when a new key is added, a value is changed, or the sort routine is changed. Deleting a key doesn't require a rebuild because the element is deleted from the array and the lookup hash. This optimization only works on 1 dimensional hashes since it is impossible to detect values being changed below the root level. Depending on the sort routine, this may affect the order.
Level 3: No optimization at all. It is slower, but consumes no additional memory - (n2 + n) /2.
So what do you think - should I work on polishing the code, finishing the POD, building the test suite, and uploading it to CPAN or not?
Cheers - L~R