Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: Partial Order

by blokhead (Monsignor)
on Jul 25, 2007 at 19:23 UTC ( [id://628771]=note: print w/replies, xml ) Need Help??


in reply to Partial Order

Since you say that the preprocessing step does not need to be super-fast, then I suggest storing it as a graph. Once you add the "base" comparisons as directed edges, you can compute the transitive closure of the graph. To check a comparison query, you just have to see whether the edge exists in the transitive closure, which is a very fast operation if the graph is pre-computed.

The only thing to watch out for is the equalities in the data. As long as we're using graphs, you can use an undirected graph to keep track of which items are equivalent, so you can map the members of each equivalence class to a unique representative. Also, if something comes in the data that says xx>yy and later yy>xx, should I infer that xx=yy, or is that an error? My sample code below treats it as an error (the resulting graph is not a DAG), but I suppose you could do either.

Here is a rough proof-of-concept that I hope doesn't have too many bugs.

use strict; use Graph::Undirected; use Graph::Directed; my @data = qw[ xx<yy yy<zz zz>aa bb<yy yy=aa aa>cc ]; ## take all equalities in the data, map each item to ## a canonical (equivalent) representative. my $eq = Graph::Undirected->new; $eq->add_edge( split /=/, $_ ) for grep /=/, @data; my %canonical; for ($eq->connected_components) { my $representative = $_->[0]; for (@$_) { $canonical{$_} = $representative; } } sub canonical { exists $canonical{$_[0]} ? $canonical{$_[0]} : $_[0]; } ## take all inequalities in the data, make a dag from them ## and take the transitive closure. my $g = Graph::Directed->new; for (grep !/=/, @data) { my ($x, $y) = split /<|>/, $_; ($x,$y) = ($y,$x) if /</; $g->add_edge( canonical($x), canonical($y) ); } die "Not a DAG!" unless $g->is_dag; $g = $g->transitive_closure; sub compare { my ($x,$y) = map canonical($_), @_; return 0 if $x eq $y; return 1 if $g->has_edge($x,$y); return -1 if $g->has_edge($y,$x); return 0; } my @items = qw[ aa bb cc xx yy zz ]; for my $x (@items) { for my $y (@items) { print "$x, $y: " . compare($x,$y). $/; } }

blokhead

Replies are listed 'Best First'.
Re^2: Partial Order
by Limbic~Region (Chancellor) on Jul 25, 2007 at 20:01 UTC
    blokhead,
    To check a comparison query, you just have to see whether the edge exists in the transitive closure, which is a very fast operation if the graph is pre-computed.

    No offense, but your compare() sub looks like it would be much slower than the OP's HoH solution. Since this is the sub that will be called millions of times, have you Benchmarked your assertion? Especially if the OP's compare() became return $lookup{$x}{$y}; since undef is implied.

    Cheers - L~R

      I admit I didn't study it the OP's code in great detail. Now I see that it probably does an equivalent thing to what mine does: infer all transitive relations from the data and store them in an object that supports individual queries. The OP code infers the transitive relations in some ad-hoc way, storing in a hash; mine uses a graph module and stores the results in a graph. Of course, you can probably guess that I prefer the elegance of using the high-level abstraction of graph theory ;) (probably because I don't want to re-implement transitive closure logic) So perhaps my previous post should be interpreted as advice on computing the transitive closure using pre-existing hammers instead of an ad-hoc method.

      As for efficiency of the compare() sub.. I doubt mine is insanely slow. At worst, the calls to $g->has_edge would cost a few hash lookups in the internals of Graph. If optimization really is a concern at such a low level, you're right -- nothing could be faster than a simple hash lookup. Of course, you could preemptively convert the whole graph's adjacency information into a HoH after doing the transitive closure. For only 50 items, it should be no problem.

      Here is an interesting tradeoff that the OP might be willing to consider (described in the language of graphs, but could be also be coded using the HoH of adjacencies). At the moment, both my code and the OP's essentially precompute answers to all possible queries, which on some level might feel inelegant. A more interesting tradeoff is to put all the relations into the graph, but don't compute the transitive closure right away. Instead, whenever a query comes, do a reachability search in the graph (say, breadth-first search from one of the points). For every path that you happen to inspect at along the way, add in all the edges which are inferred by the transitivity property. The next time you query the graph, there will be new "shortcuts" that you can exploit. Eventually, with enough queries, the graph slowly approaches the transitive closure of the original graph.

      blokhead

        blokhead,
        Of course, you can probably guess that I prefer the elegance of using the high-level abstraction of graph theory ;)

        Yes, I know how much you like theory and abstractions, but I also know how slow they can be.

        As for efficiency of the compare() sub.. I doubt mine is insanely slow. At worst, the calls to $g->has_edge would cost a few hash lookups in the internals of Graph. If optimization really is a concern at such a low level...

        From the original post, I had focused on "I need to implement a fast compare..." and, more importantly, "I need to do millions of comparisons..." With function calls being notriously slow in perl, adding 1 or 2 more to compare() is bound to be slower not faster which is what I believe the OP is after.

        I follow everything you write very closely and in private conversations I have expressed my interest in you writing a tutorial to graph theory as a means to problem solving. I think your abstraction could still be very useful as a means for making the code more elegant provided there is a straight forward way to convert the resulting graph back into the HoH (for fast lookups).

        Cheers - L~R

        Thanks. The Graph was what I was looking for in terms of elegance. After the graph computation, one could precompute an HoH or a simple hash with combined strings as suggested before.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://628771]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2024-04-19 01:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found