in reply to Evolving a faster filter?

Here is a "best ordering" optimizer that is just my previous trivial code with two short-circuits added to it.

First, if we've managed to come up with an ordering that costs 18.1 units already (our best ordering so far), and our current calculations only get through the first 9 filters and have already added up 18.2 units of cost, then there's no point permuting any of the remaining filters (nor measuring any further using that starting arrangement).
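That first short-circuit is just branch-and-bound pruning of the running cost total. A minimal sketch of the idea (the helper name and the list-of-pairs representation are mine, not the code below):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expected cost of applying filters in the given order.
# Each filter is [ $cost, $trim ]: it charges $cost per item it
# sees and keeps the fraction $trim of them.  If a best-so-far
# cost is passed in, give up (return undef) as soon as the
# running total exceeds it -- no tail of filters can lower it.
sub ordering_cost {
    my( $order, $best ) = @_;
    my $size = 1;   # fraction of items still alive
    my $cost = 0;
    for my $f ( @$order ) {
        $cost += $size * $f->[0];
        return undef if defined $best && $best < $cost;
        $size *= $f->[1];
    }
    return $cost;
}
```

For example, for filters (cost 1, trim .5) and (cost 10, trim .5), running the cheap one first costs 1 + .5*10 = 6 units per item, while the reverse order costs 10 + .5*1 = 10.5.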

Second, if my next permutation is going to place a filter with a cost of 5 and a selectivity of .9 in front of a filter with a cost of 4 and a selectivity of .8, then doing anything with that permutation would just waste time. So this code does some efficient checks that prevent at least a lot of the cases of putting one filter in front of another filter that it is completely worse than (neither cost nor selectivity improved). Well, assuming the filters start out sorted reasonably, anyway.
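The "completely worse than" test itself is cheap. A sketch with a hypothetical helper (in the code below the equivalent check lives inline in NextPermuteNum):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Filter $x is completely worse than filter $y when it is at least
# as expensive AND keeps at least as many items; $x can then never
# profitably sit immediately in front of $y, so permutations that
# put it there can be skipped.
sub completely_worse {
    my( $x, $y ) = @_;   # each is [ $cost, $trim ]
    return $x->[0] >= $y->[0] && $x->[1] >= $y->[1];
}
```

Note that when one filter is cheaper but less selective, neither dominates the other and both relative orders still have to be considered.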

This code starts out by naively sorting the filters by the product of cost and selectivity. That gets you pretty close to the ideal solution in the few cases I dreamed up so far. It also ensures that no filter starts out in front of a filter that is completely better than it, which is required for the second short-circuit above to be very effective.

I don't have time right now to explain this code or even test it more than the little bit that I have. NextPermuteNum() in this code is just a modification of the one from Algorithm::Loops.

The usage is to give a list of filter performance values on the command line. Each performance value is an integer cost followed by a decimal selectivity. So "4.9" indicates a filter that eliminates 10% of items (keeps 0.9) and costs 4 units (of CPU).
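That split is just a regex capture, the same one the script below uses:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a performance value like "4.9" into its integer cost and
# decimal selectivity, exactly as the script's own regex does.
my( $cost, $trim ) = "4.9" =~ /^(\d+)(\.\d+)$/
    or die "Invalid cost.trim\n";
print "$cost $trim\n";   # cost 4, keeps .9 of the items
```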

So a sample run might look like:

% filterOrder 1.9 1.8 2.9 2.8 3.9 3.8 4.9 4.8 30.1 20.2 10.3
 1 .8, 1 .9, 2 .8, 2 .9, 3 .8, 3 .9, 30 .1, 10 .3, 4 .8, 4 .9, 20 .2,
0.002 19.00391 ( 0 1 2 3 4 5 6 7 8 9 10 )
0.100 12.88756 ( 0 1 2 7 4 8 3 10 5 6 9 ) 9
 1 .8, 1 .9, 2 .8, 10 .3, 3 .8, 4 .8, 2 .9, 20 .2, 3 .9, 30 .1, 4 .9,
3.750 seconds

Which shows the first stab at a solution and the final solution were:

1 .8, 1 .9, 2 .8, 2 .9, 3 .8, 3 .9, 30 .1, 10 .3, 4 .8, 4 .9, 20 .2
1 .8, 1 .9, 2 .8, 10 .3, 3 .8, 4 .8, 2 .9, 20 .2, 3 .9, 30 .1, 4 .9

(between those two rows the original post drew arrows showing how many positions each filter moved)

And it took only 0.1 seconds to find the optimal solution (but then took 3.75 seconds to verify that there was nothing better). An exhaustive search would have taken minutes without those short-circuits.

It still isn't practical for optimally ordering 30 or 40 filters; the run time would depend greatly on how many filters can be declared "completely better than" how many others but could easily run into the "millennia" range for such large lists.

It can probably be most useful for improving the weighting formula used for the initial sort once you have some typical numbers for your situation. $cost*$selectivity isn't a bad starting point, but I'm sure, especially given a limited pattern of values, one can quickly come up with something better with a bit of trial and error.

If the sort gets the first several filters in the right order, then an optimal solution can be had very quickly. A good ordering can be had immediately and better orderings can pop out as you let the thing run.

It may also be useful as something to further modify to get to something that can get even closer to the optimal total cost rather quickly. For example, if you identify a subset of the filters that are likely to be the best ones, then you can separately optimize just that subset first to get a better first part of the ordering.

Anyway, here is the code (which is not pretty nor modular but appears to work reasonably well in the few test cases I played with).

#!/usr/bin/perl -w
use strict;

use Time::HiRes qw< time >;

$| = 1;

my( @Costs, @Trims );
for( @ARGV ) {
    my( $cost, $trim ) = /^(\d+)(\.\d+)$/
        or die "Invalid cost.trim: $_\n";
    push @Costs, $cost;
    push @Trims, $trim;
}

my $Start = time();

my @order = sort {
    $Trims[$a]*$Costs[$a] <=> $Trims[$b]*$Costs[$b]
} 0..$#Costs;
@Trims = @Trims[@order];
@Costs = @Costs[@order];
print " $Costs[$_] $Trims[$_]," for 0..$#Costs;
print $/;

my @Win;
cost( \@Costs, \@Trims );
print $/;
print " $Costs[$_] $Trims[$_]," for @Win;
printf "\n%.3f seconds\n", time() - $Start;

sub NextPermuteNum(\@) {
    my( $vals )= @_;
    my $last= $#{$vals};
    return !1 if $last < 1;
    while( 1 ) {
        # Find last item not in reverse-sorted order:
        my $i= $last-1;
        $i-- while 0 <= $i && $vals->[$i+1] <= $vals->[$i];
        # If complete reverse sort, we are done!
        if( -1 == $i ) {
            # Reset to starting/sorted order:
            @$vals= reverse @$vals;
            return !1;
        }
        # Re-sort the reversely-sorted tail of the list:
        @{$vals}[$i+1..$last]= reverse @{$vals}[$i+1..$last]
            if $vals->[$last] < $vals->[$i+1];
        # Find next item that will make us "greater":
        my $j= $i+1;
        $j++ while $vals->[$j] <= $vals->[$i];
        do {
            # Swap:
            @{$vals}[$i,$j]= @{$vals}[$j,$i];
            # If "greater" item isn't strictly worse, then try it:
            return 1 if $Costs[$vals->[$i]] < $Costs[$vals->[$j]]
                     || $Trims[$vals->[$i]] < $Trims[$vals->[$j]];
            # Else, we picked a strictly worse one, so pick again:
            $j++;
        } while( $j <= $last );
        # Force us to move back to at least $i-1:
        @{$vals}[$i..$last] = sort { $b <=> $a } @{$vals}[$i..$last];
    }
}

sub cost {
    my( $costs, $trims ) = @_;
    my @order = 0 .. $#$costs;
    my $best = 0;
    my $max = 0;
    do {
        my $size = 1;
        my $cost = 0;
        my $skip = @order;
        for my $i ( @order ) {
            $skip--;
            last if $best && $best < $cost;
            $cost += $size * $costs->[$i];
            $size *= $trims->[$i];
        }
        if( ! $best || $cost <= $best ) {
            printf "\r%.3f %.5f ( %s )", time() - $Start, $cost, "@order";
            print $/ if ! $best;
            @Win = @order;
            $best = $cost;
            $max = 0;
        } elsif( 1 < $skip ) {
            if( $max < $skip ) {
                printf "%4d\b\b\b\b", $skip;
                $max = $skip;
            }
            @order[$#order-$skip+1..$#order] =
                sort { $b <=> $a } @order[$#order-$skip+1..$#order];
        }
    } while( NextPermuteNum(@order) );
}

- tye        

Replies are listed 'Best First'.
Re^2: Evolving a faster filter? (sort)
by tye (Sage) on Jan 04, 2013 at 21:10 UTC

    Heh, I just had what might be a great idea on how to do the first sort.

    It is trivial to do a two-item optimization in your sort comparison routine. This is "cheating" because you can end up telling sort that you want $a < $b and $b < $c but $c < $a. On old versions of Perl, that might even cause a core dump. I think it is actually documented as safe on modern Perl, however.

    my @order = sort {
        $Costs[$a]+$Trims[$a]*$Costs[$b] <=> $Costs[$b]+$Trims[$b]*$Costs[$a]
    } 0..$#Costs;
    @Trims = @Trims[@order];
    @Costs = @Costs[@order];

    It is that simple. It worked reasonably well in a couple of test cases. It even found an optimal solution on the first try for my prior example above (but then took too long to declare success, which probably indicates a bug).

    - tye        

      my @order = sort {
          $Costs[$a]+$Trims[$a]*$Costs[$b] <=> $Costs[$b]+$Trims[$b]*$Costs[$a]
      } 0..$#Costs;

      I'm pretty sure that this is the optimal solution, but I'm too lazy to write down the complete mathematical proof.

      The basic idea is that the terms represent the general cost difference of swapping the first and the second filter¹!

      So by sorting you get a solution which can't be improved by a pairwise swapping and any permutation can be represented by a sequence of pairwise swaps.
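      Writing the swap argument out (my notation: $C$ for cost, $T$ for trim, $s$ for the fraction of items surviving the prefix; assume $T < 1$), with everything before and after the pair held fixed, putting $a$ directly before $b$ beats the reverse exactly when

```latex
s\,C_a + s\,T_a C_b \le s\,C_b + s\,T_b C_a
\;\Longleftrightarrow\; C_a (1 - T_b) \le C_b (1 - T_a)
\;\Longleftrightarrow\; \frac{C_a}{1 - T_a} \le \frac{C_b}{1 - T_b}
```

      which is the same ordering as the $C/(1-T)$ ratio sort below, so away from the $T = 1$ edge case the comparator is in fact transitive.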

      Cheers Rolf

      ¹) see the restructured formula for c in Re^2: Evolving a faster filter? (combinatorics) and you will notice that these are the first elements and the rest is fixed.


      basic transformations lead to a simpler solution:

      my @order = sort { $Costs[$a]/(1-$Trims[$a]) <=> $Costs[$b]/(1-$Trims[$b]) } 0..$#Costs;

      which can be precalculated to save calculation time

      my @order = sort { $weight[$a] <=> $weight[$b] } 0..$#Costs;

      (the edge case of a division by zero must be handled as practically infinity)
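      A minimal sketch of that precalculation, including the division-by-zero edge case (the sample values are mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @Costs = ( 2, 1, 30 );
my @Trims = ( .9, .8, 1 );   # a trim of 1 keeps every item

# Precompute each filter's weight once; a filter that removes
# nothing (trim == 1) gets an infinite weight so it sorts last.
my @weight = map {
    1 == $Trims[$_] ? 9**9**9 : $Costs[$_] / ( 1 - $Trims[$_] )
} 0 .. $#Costs;

my @order = sort { $weight[$a] <=> $weight[$b] } 0 .. $#Costs;
print "@order\n";   # prints "1 0 2"
```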

        That's not enough to prove it. You certainly get a solution that can't be improved by a single swapping of adjacent filters. And if your sort order using that algorithm is well-defined, then you will get an optimal solution. That is, if you never get the $a < $b < $c < $a case. Update: No, maybe not even that (unless someone can convince me that "of adjacent filters" is not needed above). Update: Ah, ruling out adjacent swaps also rules out any swaps. Update: Silly me. Ruling out swapping of elements that start out adjacent doesn't rule out swapping of elements that didn't start out adjacent (but became adjacent from swapping elements that did start out adjacent).

        I thought I had already found a case where it wasn't optimal. But the total cost reported was identical to the first cost despite a different ordering being chosen. So it might be hard to find a case where it isn't optimal.

        I bet it makes doing anything more than that not worth the effort for Ovid's usage, at least. :)

        - tye        

Re^2: Evolving a faster filter? (code)
by Ovid (Cardinal) on Jan 07, 2013 at 11:32 UTC

      Perhaps you missed the significance of the timings:

      C:\test>1011850 -O=100000
      Dict read at C:\test\ line 96.
      objects loaded at C:\test\ line 106.
      App created at C:\test\ line 112.
      Ovid: (optimal ordering) returned 9193 objs in 4.447958 seconds
      Ovid: (pessimal ordering) returned 9193 objs in 7.236690 seconds
      RichardK: (optimal ordering) returned 9193 objs in 4.065038 seconds
      RichardK: (pessimal ordering) returned 9193 objs in 6.219226 seconds
      BUK: returned 9193 objs in 0.760572 seconds
      Filters run at C:\test\ line 138.

      Running your OP algorithm against a set of 100 filters, each carefully arranged to filter an increasing number of objects and ordered best to worst, and running two tests -- optimal and pessimal -- the difference between them is less than 50%. That is the best you can hope to achieve with your order-the-filters mechanism.

      However, by altering the way the filters operate upon the objects, you can achieve an order of magnitude saving.

      Or perhaps you've just decided how you want to tackle the problem. Regardless...
