demerphq has asked for the wisdom of the Perl Monks concerning the following question:

Hello All,

Recently I have been working on a set of perl implementations of common data structures/algorithms (2-3 Trees, Treaps, Binary Tree's etc). I started this mostly in order to bone up my own understanding of these interesting algorithms (some of them I encountered earlier during school, etc) however as I proceded I decided that it might be useful to produce tutorials based on my work and to publish them to CPAN for use by others. However I have now reached a point of quandry. How to define/design the interface so that it is

  1. Intuitive
  2. Easy to use
  3. Extensible
  4. Powerful
  5. Transparent
  6. Efficient
  7. Understandable
A data structure/algorithm normally (well, in OO terms anyway) has two distinct components. That of the primary object, and that of "nodes" or "elements" that exist within the primary object. The primary object defines the ways to manipulate the data structure and the nodes define how the end user puts information into the data structure.

In my case I am looking at defining an interface for search trees. Defining the primary objects interface is relatively straight forward at first glance, there are a set of common methods that need to be provided
Find an element in the strucutre and return it
Add an element to the structure
Delete an element from the structure
Join two data structures together
Split a data stucture into two parts, usually by some key
Traverse the data structure, passing over each element it contains, perhaps with some form of callback for each element
Such algorithms/data structures use a "key" which determines where in the data structure an element goes. As well, normally some method is required to determine the ordering relationship between keys. When the data structure is hard coded to deal with numbers or alphanumerics the implementation is trivial. However things get more interesting when trying to provide a general interface. If the user wants to store complex items in the data structure (a not unreasable desire) there arise the problem of how do you determine the ordering relationship? How do you provide a means for a given nodes contents in the structure to change over time? (this is particularly important in Heaps for example).

The Heap:: heirarchy uses inheiretance to do this. Every item that is added to the data structure needs to inheiret from Heap::Element and must implement two methods, one for the heaps own use, and one a comparison function. This approach while satisfying 3 and 4 of my requirements (it is powerful and extensible) doesnt seem IMO to satisfy the rest. In my experience it is annoying and time consuming to implement quickly, it isnt very intuitive (many a time I have had to OO fairly simple constructs that I normally wouldnt have except to use the modules) and it really isnt that efficient.

My partial solution to this has been to use Perls functional capabilites to implement the comparison requirement. Thus when creating a new object the user supplies a CODE ref responsible for the comparison. However what do when you need to support altering a node efficiently? Should the insert routines return a reference to a node? Should I just use inheiretance as the Heap modules do? Is there any way that I could provide multiple options, such as functional or OO approaches?

Please accept my apologies if my question is not super well defined. Perls TMTOWTDI has left me with so many implementation routes that I really cant decide what the various monks think would be a useful interface.

I really would appreciate virtually any thoughts that you all might have on this matter.

Yves / DeMerphq
Writing a good benchmark isnt as easy as it might look.
  • Comment on Standardized Interface Design for Search tree

Replies are listed 'Best First'.
Re: Standardized Interface Design for Search tree
by ariels (Curate) on Jun 25, 2002 at 11:49 UTC
    One Way To Do It1 is to steal ideas off of C++ and the STL. STL is based on concepts. A concept is a requirement of a thing, that is a combination of syntactic requirements (say, "two things must be comparable by saying $a->le($b)") and semantic requirements (say, "and this relationship must be a pre-order: if $a->le($b) && $b->le($c) then $a->le($c), and $a->le($a)"). In C++, operator overloading is prevalent, so these requirements are typically phrased in terms of existing operators; in Perl, overloading is probably too slow to force on the user, so we should avoid it.

    The C++ STL then uses these concepts as bases for generic programming using templates. The C++ vector<T> isn't a vector<Object>! It doesn't require that all Ts inherit from a common ancestor (compare the aforementioned Hash::Element). This would not be type-safe (a vector of cars is NOT a vector of vehicles; the former cannot accept a submarine, but the latter can!). Instead, when you sort a vector (you really sort ranges in C++, based on the interval between 2 iterators, but let's keep things simpler than they really are...), the compiler generates code that assumes it can meaningfully compare T t1, t2; by checking whether t1 <= t2.

    Perl is not C++. It isn't type-safe: variables don't really have types (you can say my Foo $foo, but the compiler cannot enforce this type safety). And it doesn't have templates; you can see these as only necessary for a type-safe language (so Perl doesn't need them), or as allowing for much better compilation, paying the price of type safety for vastly faster execution. But this should not prevent us from stealing!

    So here's what I'd propose:

    • A search tree organises things of one or more types.
    • For any $a,$b in the search tree, $a->le($b) must be defined (i.e. "$a->can('le')").
    • This relation must be a pre-order: if $a->le($b) and $b->le($c) then $a->le($c), and always $a->le($a).
    • No inheritance of any kind is required to be defined.
    • (Optionally:) you may provide a coderef for comparisons; if you do not, the following will be used:
      sub { $_[0]->le($_[1]) }
    Given these rules, you can define your data structures.

    But how can we implement them cleanly? Well, some things (e.g. Math::BigInts) are compared using <=. For such MyObjs, we wrap as follows:

    package Numeric; sub le { $_[0] <= $_[1] } package MyInt; our @ISA = qw(Math::BigInt Numeric);
    (This pattern is usually called a "mixin": Numeric is a base class which only makes sense as part of multiple inheritance; you mix it in to some other class). Now MyInts are Math::BigInts which can be put into a Demerphq::SearchTree! If you have plain numbers, you can just bless them into Numeric, and they're ready for use in your search tree.

    Of course, a similar strategy would give you Lexical, if needed. And a Hash::Element can be similarly wrapped by translating the method cmp into le.

    Since a common base class for contained things buys us nothing in Perl (and almost nothing in other languages), we don't bother to make us of it.

    If you really wanted to be creative, you could use TheDamian's multimethods to give a more symmetric definition. I wouldn't bother, though.

    The most important differences in what I propose from C++ are that only objects can go into a search tree (C++ can put anything in any container, as long as it matches the concepts -- and these never require any methods of the "thing"), that there is no type safety, and that you lose out on C++'s optimization opportunities.

    But then, if you want C++, you know where to get it...


    1. "Way To Do It" is (I hope!) an unregistered trademark of Larry Wall.


    I forgot. We have no SPL.
      I couldn't let this untruth stand unchallenged:
      Perl is not C++. It isn't type-safe
      While the first part is certainly true, the second part is absolutely not. Perl is compile-time type-safe in that a scalar cannot be accessed as a hash or array, nor can an array be accessed as a scalar or hash, etc etc. Perl is also run-time type-safe in that an object method cannot be called on an object that does not understand it without triggering an exception.

      Just because Perl is "smalltalk-like" in its type safety doesn't mean you have to "dis" it if you've only seen Java and C++ type-safety.

      And for the record, I prefer Perl's type safety to "always and only compile-time" type-safety in those other languages. For every hour you spend "working around" those, you could have been coding another useful hour instead.

      Compile-time type safety is neither necessary nor sufficient.

      -- Randal L. Schwartz, Perl hacker

Re: Standardized Interface Design for Search tree
by broquaint (Abbot) on Jun 25, 2002 at 11:22 UTC
    Why not mix and match the OO and the functional? Maybe you have a standard class for performing generic operations on trees, but certain tree-types require particular ops so you can just field those out to coderefs e.g
    package Tree::Ops; sub delete { ... } sub join { ... } 1; # ... later sub search { ... } my $tree = Tree::Struct->new( someops => 'Tree::Ops', searchop => \&search );
    This has the benefits of re-usability, flexibility and all those funky OO features. Or you could declare the functions straight into the symbol table of said package e.g
    sub Tree::Ops::avl_search { ... }
    Gaining all the benefits of being in a class while still giving the control to the user[1].


    [1] I'm sure I should've used a bunch of OO buzzwords there ... ah well ;-)

Re: Standardized Interface Design for Search tree
by educated_foo (Vicar) on Jun 25, 2002 at 15:28 UTC
    I ran across some of these same issues (and some others) trying to create a Perl wrapper for STL sets and priority queues. My solution was incomplete, and the module never quite made it to CPAN, but maybe the experience could be useful. If anyone wants to take a look at the code, email me (educated underscore foo at yahoo) and I'll be happy to tar it up and send it.

    First, implementing the data structure manipulation in C is a big win, as is using C for the comparison function. So when creating an instance of the data structure, you can either pass in a code ref or a special value for the standard number and string comparisons ({NUM,STR}_{LT,GT}). With one of these special values, the C code does the appropriate comparison directly.

    Second, I just punted on modifying elements that had already been inserted. This is the wrong decision for heaps, but it makes the interface much simpler, and I had recently been traumatized by Heap::Element. For ordered trees, it's not as bad -- the implementation has to remove the old value and re-insert the new one anyways, so you can just do that yourself (or create a Perl member function to do it). I like your idea of returning a reference, though I probably wouldn't use a real Perl reference. For one thing, it needs to keep some sort of index into your data structure. For another, a tied scalar with a STORE method will look nicer.

    Third, arrays are handled naturally in Perl, and RAM is cheap, so I avoided iteration with callbacks. Instead, I provided a range() function to take slices of ordered sets:

    $x->range($a, $b) Return all elements $i in $x such that $a <= $i < $b. If $a is undef, treat it as less than all elements in $x. If $b is unde +f, treat it as greater than all elements in $x.
    So you do things like this:
    @foo = grep { something } $x->range($a, $b) $y->insert($x->range($a, $b))
    This leaves efficient joins and splits unaccounted for, of course. I was planning on having a second function just like range() to grab opaque slice objects, which could then be put into existing trees (join) or used to create new ones (split). Then I ran into some wierd XS/Inline::CPP problems and moved on before figuring out what was going on.

    Finally, it's nice to be able to insert arrays of things in a single call, rather than looping over them.

    Good luck,