Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

RFC - inplace upgrade for Tie::SortHash

by Limbic~Region (Chancellor)
on Aug 10, 2003 at 02:10 UTC ( [id://282534]=perlmeditation: print w/replies, xml ) Need Help??

All:
I was going to suggest Tie::SortHash as a possible solution to this node requesting help when sorting hashes, but decided not to. I looked under the hood and saw that it was using deprecated pseudo hashes. It has been nearly three years since this module was touched. I have contacted Casey, the author, to see if they plan on updating it. I would like to see one of three two things happen:
  • Casey update the module, possibly with input
  • I take over maintenance of the module
  • I upload a new module that does the same thing only more

    I have enclosed my proposed first draft of code as well as some comparisons with the original. Please provide your feedback on the code as well as any other advice that might be helpful.

    package Tie::SortHash; use strict; use constant HASH => 0; use constant LOOKUP => 1; use constant ARRAY => 2; use constant SORT => 3; use constant CHANGED => 4; use vars qw($VERSION); $VERSION = '1.02'; sub TIEHASH { my $class = shift; my $hash = shift || {}; my $sort = shift || q($a cmp $b || $a <=> $b); my $self = bless [], $class; $self->_Build($hash, $sort); return $self; } sub FETCH { my($self, $key) = @_; $self->[HASH]{$key}; } sub STORE { my($self, $key, $value) = @_; $self->[HASH]{$key} = $value; $self->[CHANGED] = 1; } sub EXISTS { my($self, $key) = @_; return exists $self->[HASH]{$key}; } sub DELETE { my($self, $key) = @_; delete $self->[HASH]{$key}; splice(@{$self->[ARRAY]}, $self->[LOOKUP]{$key}, 1); delete $self->[LOOKUP]{$key}; } sub FIRSTKEY { my $self = shift; $self->_ReOrder if $self->[CHANGED]; $self->_Iterate; } sub NEXTKEY { my ($self, $lastkey) = @_; $self->_ReOrder if $self->[CHANGED]; $self->_Iterate($self->[LOOKUP]{$lastkey}); } sub CLEAR { my $self = shift; $self->[HASH] = {}; $self->[CHANGED] = 1; } sub DESTROY { } sub _Build { my ($self, $hash, $sort) = @_; @{$self->[HASH]}{keys %$hash} = values %$hash; $self->sortblock($sort); $self->_ReOrder; } sub _ReOrder { my $self = shift; $self->[LOOKUP] = (); $self->[ARRAY] = (); my $index = 0; my $hash = $self->[HASH]; for my $key (eval $self->[SORT]) { $self->[LOOKUP]{$key} = $index; $self->[ARRAY][$index] = $key; $index++; } $self->[CHANGED] = 0; } sub _Iterate { my ($self, $index) = @_; $index = -1 unless defined $index; $index++; defined $self->[ARRAY][$index] ? $self->[ARRAY][$index] : undef; } sub sortblock { my($self, $sort) = @_; my $hash = $self->[HASH]; $sort =~ s/\$hash/\$hash->/g; $self->[SORT] = "sort { $sort } keys %\$hash"; eval $self->[SORT]; die $@ if $@; $self->[CHANGED] = 1; } 1;

    Comparisons
  • Original uses pseudo hashes, proposed replacement does not
  • Original sorts hash every time each/keys/values is called. Proposed only when needed.
  • Both modules use eval to allow arbitrary sort routines
  • Original module should be much slower
  • Original module should consume far less memory
  • Original uses a manual form of garbage collection for delete, proposed module does not.
  • Requested Feedback
  • Code critique
  • Provide a way to prefer saving memory over speed
  • Get rid of the eval - not sure if possible
  • If anyone is using the original, does the proposed module continue to work? If yes, is it faster?
  • A better test suite
  • With the choice of memory/speed - how can it be optimized to be most efficient for either one
  • Thanks in advance. I will by flying to Dallas tomorrow (10 Aug 03), so I apologize in advance for not being able to reply immediately.

    Cheers - L~R

    Replies are listed 'Best First'.
    Re: RFC - inplace upgrade for Tie::SortHash
    by belg4mit (Prior) on Aug 10, 2003 at 02:22 UTC
      Interesting use of constant, I kind of like it. I would make a wrapper for tie-ing so that you could provide a sub protoyped to accept an optional block so you can change sort/eval definition system. If no sort is specified it should do a default sort (not even the current "smart" default you have coded).

      Update, for how to use coderef en lieu of eval.

      sub sortblock { my($self, $sort) = @_; local *hash = %{$self->[HASH]}; sort $self->[SORT] keys %hash; $self->[CHANGED] = 1; }

      --
      I'm not belgian but I play one on TV.

        belg4mit,
        I tried for the better part of an hour trying to figure out how to do one of the following with no luck:
      • Get rid of the eval all together
      • Prefer not to use eval, but allow it for backwards compatability

        The issue is that you are creating a sort routine for a %hash that is defined in the module, not the calling script.
        I don't see how to get around this even with your suggestion and updated code.

        The smart sort came from the original module - I left it for backwards compatability.

        Cheers - L~R

        Update: I figured out how to do the second one. eval is now only used for backwards compatability if the calling script uses the old q($a cmp $b) syntax instead of a code ref:

    Re: RFC - inplace upgrade for Tie::SortHash
    by demerphq (Chancellor) on Aug 10, 2003 at 14:18 UTC

      I cant help but wonder if your use of autovivification to create the various elements of the object is dangerous. I wouldnt do it personally. I have a feeling at least some of your code will die if somebody calls the wrong function at the wrong time. For instance, when I turn warnings on (why arent you using them?!) the following snippet produces warnings and also doesnt work as expected:

      tie my %hash,'Tie::SortHash',{a=>1,b=>2}; delete $hash{'x'}; print join "\t",keys %hash;
      Use of uninitialized value in splice at D:\Temp\tie_sorthash.pl line 4 +3. b

      Where did the 'a' key go? Also delete() doesnt set the CHANGED flag, and will not play nicely with keys. (It should always be possible to delete the last visited key while iterating the hash without disrupting the hash.)

      sub _ReOrder { my $self = shift; $self->[LOOKUP] = (); $self->[ARRAY] = ();

      If you intend to set these two to undef then I can think of more obvious ways. Otherwise this is a bug.

      defined $self->[ARRAY][$index] ? $self->[ARRAY][$index] : undef;

      What exactly is the purpose of this line? Isnt the conditional totally redundant?

      I think you need to revisit the code and build up a big set of tests. I have a feeling there are other whammies in there as well. Another question I have is why the iterator requires a hash lookup each time?


      ---
      demerphq

      <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
        demerphq,
        I really appreciated your comments. I haven't touched this for almost a month. I was waiting on the module author to return my emails, which did not happen. Today I decided to start over. I would like to address the points you made on the previous code:

      • I cant help but wonder if your use of autovivification to create the various elements of the object is dangerous
        The code creates a new hash from an old hash. It is only dangerous to the extent that I trusted that what I was being passed was a hash reference without testing it.

      • Why arent you using warnings?!
        The warnings were not turned on because the original module author did not use them.

      • Where did the 'a' key go?
        Code I added to optimize the iteration created an array with the hash keys in order. I figured another optimization would be to delete the array element without going through the sort routine again. The problem was that I didn't first verify that the key being deleted actually existed. The 'a' disapeared because it happened to be the 0 index - the result of an undefined variable

      • Also delete() doesnt set the CHANGED flag, and will not play nicely with keys. (It should always be possible to delete the last visited key while iterating the hash without disrupting the hash.)
        Correct - it doesn't set the changed flag. It accomplishes the same thing by deleting the array element described above. I do not believe there would be any disruption with deleting a key while iterating other than your discovery of deleting a key that doesn't exist.

      • If you intend to set these two to undef then I can think of more obvious ways. Otherwise this is a bug.
        defined $self->[ARRAY][$index] ? $self->[ARRAY][$index] : undef; What exactly is the purpose of this line? Isnt the conditional totally + redundant?

        The code is correct as written. The conditional is designed to detect when you have fallen off the end of the array and return undef. I am not sure what is more obvious in undef'ing the arrays then what I used?

      • I think you need to revisit the code and build up a big set of tests. I have a feeling there are other whammies in there as well.
        I indeed did find lots of whammies today when I went back over the code. A few I introduced when I tried to "fix", in my opinion, very bad code. The rest were already there.

      • Another question I have is why the iterator requires a hash lookup each time?
        I use an array with the hash keys in order. The problem is that I am only aware of the last key. I use the hash lookup to get the index of that element in the array. Getting the next key is then a simple matter of adding 1.

        Later, when I have time to build up a test suite and finish the POD, I will re-post an RFC that outlines my questions and concerns. In the interim, here is the new non-backwards compatible code.

        Cheers - L~R

          Today I decided to start over. I would like to address the points you made on the previous code:

          Hmm, im not up for responding completely yet, so you can expect a /msg that this reply has been updated. However, there is one point id like to address:

          The code is correct as written. ...

          defined $self->[ARRAY][$index] ? $self->[ARRAY][$index] : undef;

          Except that its misleading as hell. It is _exactly_ equivelent to

          $self->[ARRAY][$index]

          Except that it makes a reader wonder what magic specialness is going on, and is marginally slower. If nothing special is happening then it should be changed to the latter IMO


          ---
          demerphq

          <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
    Re: RFC - inplace upgrade for Tie::SortHash
    by perrin (Chancellor) on Aug 10, 2003 at 02:33 UTC
      Hmmm. Why not Tie::IxHash? That's what I've always used.
        perrin,
        Because it just preserves the insertion order and allows you to do a simple sort by value or keys. It doesn't allow an arbitrary sort of a complex hash. It also doesn't allow re-ordering on the fly.

        Cheers - L~R

    Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Domain Nodelet?
    Node Status?
    node history
    Node Type: perlmeditation [id://282534]
    Front-paged by diotalevi
    help
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this?Last hourOther CB clients
    Other Users?
    Others having a coffee break in the Monastery: (2)
    As of 2024-04-25 23:00 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      No recent polls found