http://qs321.pair.com?node_id=1011650

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

The Code

I have some code which resembles the following:

sub remove_unwanted {
    my ( $self, @objects ) = @_;

    foreach my $filter ( $self->filters ) {
        @objects = $self->$filter(@objects);
        last unless @objects;
    }
    return @objects;
}

There might be thousands of @objects and tens of filters. The filter will return zero or more objects. The number of filters is static (for a given software release), but the number of objects will always change (duh).

The Problem

This is some of the hottest code on a high-performance system spread across 75 servers. It needs to run fast, fast, fast. The order in which the filters run can impact this greatly. I've seen anywhere between 7% and 41% performance speedups by choosing the order carefully. This means that the slowest filters should usually run later and the filters which remove the most @objects should usually run sooner. However, what if we have a slow filter which filters out many objects? Or a fast filter which filters out virtually nothing? Further, the criteria on which things get filtered are very complex, change rapidly, and require that we use real-world data to determine what's really faster.
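To make the trade-off concrete, here is a toy calculation with made-up numbers (not our real filters): a slow filter that removes 90% of the objects can still beat a fast filter that removes only 10%, once you account for how many objects survive to reach the second filter.

# Toy numbers only: filter A costs 5 units/object and keeps 10% of objects;
# filter B costs 1 unit/object and keeps 90%.
my %A = ( cost => 5, keep => 0.10 );
my %B = ( cost => 1, keep => 0.90 );

my $a_first = $A{cost} + $A{keep} * $B{cost};    # 5 + 0.1 * 1 = 5.1 units/object
my $b_first = $B{cost} + $B{keep} * $A{cost};    # 1 + 0.9 * 5 = 5.5 units/object
print "A first: $a_first  B first: $b_first\n";  # the slow-but-selective filter wins here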

What Solution?

At first glance, this really looked like a candidate for genetic programming. One or two of our servers could run this code, instrumented to keep mutating the order of the filters until we find a "fast enough" solution.

I don't want to go down that route if someone can suggest a much easier way. On the other hand, if you think a genetic algorithm is the way to go, I wouldn't mind seeing how you would approach it.

Replies are listed 'Best First'.
Re: Evolving a faster filter? (math)
by tye (Sage) on Jan 04, 2013 at 15:33 UTC

    If you can measure the speed and selectivity of each filter, then you don't need to run the filter to pick the fastest ordering. You can just compute the average speed of each ordering.

    Sure, use real runs to measure the cost and selectivity. But why slow down your determination of the fastest ordering by only trying different orderings in Production?

    As the measured average speed and average selectivity of each filter are updated from logs of Production work, do the simple calculations to pick the likely fastest ordering on average and push that ordering to Production.

    If there is large variability in the speed or selectivity of a single filter, then you might want to compute based on more than just a single average of each. Of course, that gets significantly more complicated and probably would be quite difficult or just unlikely to give you consistently better results.

    Update: The computation code is pretty simple (this is an untested version that shows the predicted cost for the "average" case as it finds cheaper orderings, since I thought that would be interesting to watch as I fed it patterns of slow-but-selective and fast-but-unselective filter measurements):

    use Algorithm::Loops qw< NextPermuteNum >;

    sub cost {
        my( $costs, $trims ) = @_;
        my @order = 0 .. $#$costs;
        my $best = 0;
        do {
            my $size = 1;
            my $cost = 0;
            for my $i ( @order ) {
                $cost += $size * $costs->[$i];
                $size *= $trims->[$i];
            }
            if( ! $best || $cost <= $best ) {
                print "$cost: @order\n";
                $best = $cost;
            }
        } while( NextPermuteNum(@order) );
    }
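    A hypothetical call, just to show the expected input shape (two parallel arrays of per-filter cost and keep-ratio; the numbers are made up, not measurements):

    # Three made-up filters: costs 5, 1, 3 units; they keep 90%, 50%, 20% of the objects.
    cost( [ 5, 1, 3 ], [ 0.9, 0.5, 0.2 ] );
    # Prints every ordering that improves on the best so far; for these numbers the
    # last line printed is the cheapest ordering: "3: 1 2 0".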

    - tye        

Re: Evolving a faster filter?
by RichardK (Parson) on Jan 04, 2013 at 15:30 UTC

    It seems like you are traversing the list of thousands of objects many times, and maybe even copying lots of objects too!

    So would it be faster to traverse the objects only once? Something like this:

    NEXT_OBJ: foreach my $obj (@objects) {
        foreach my $filter (@filters) {
            next NEXT_OBJ unless $filter->($obj);
        }
        push @results, $obj;
    }

    I think there's a smart way to do that with iterators/closures too, but I don't know how fast it would be.
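    For what it's worth, a rough sketch of the closure idea (my illustration; it assumes the filters are per-object predicate coderefs in @filters rather than the list-to-list methods of the original post):

    # Compose all predicates into one coderef so the per-object loop stays tight.
    sub compose_filters {
        my @filters = @_;
        return sub {
            my ($obj) = @_;
            for my $f (@filters) {
                return 0 unless $f->($obj);
            }
            return 1;
        };
    }

    my $keep    = compose_filters(@filters);
    my @results = grep { $keep->($_) } @objects;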

Re: Evolving a faster filter? (code)
by tye (Sage) on Jan 04, 2013 at 20:41 UTC

    Here is a "best ordering" optimizer that is just my previous trivial code with two short-circuits added to it.

    First, if we've managed to come up with an ordering that costs 18.1 units already (our best ordering so far), and our current calculations only get through the first 9 filters and have already added up 18.2 units of cost, then there's no point permuting any of the remaining filters (nor measuring any further using that starting arrangement).

    Second, if my next permutation is going to place a filter with a cost of 5 and a selectivity of .9 in front of a filter with a cost of 4 and a selectivity of .8, then I would just be wasting time to do anything with that permutation. So this code does some efficient checks to prevent at least a lot of cases of putting one filter in front of another filter that it is completely worse than (neither cost nor selectivity improved). Well, assuming the filters start out sorted reasonably, anyway.

    This code starts out by naively sorting the filters by the product of cost and selectivity. That gets you pretty close to the ideal solution in the few cases I dreamed up so far. It also ensures that no filter starts out in front of a filter that is completely better than it, which is required for the second short-circuit above to be very effective.

    I don't have time right now to explain this code or even test it more than the little bit that I have. NextPermuteNum() in this code is just a modification of the one from Algorithm::Loops.

    The usage is to give a list of filter performance values on the command line. Each performance value is an integer cost followed by a decimal selectivity. So "4.9" indicates a filter that eliminates 10% of items (keeps 0.9) and costs 4 units (of CPU).

    So a sample run might look like:

    % filterOrder 1.9 1.8 2.9 2.8 3.9 3.8 4.9 4.8 30.1 20.2 10.3
     1 .8, 1 .9, 2 .8, 2 .9, 3 .8, 3 .9, 30 .1, 10 .3, 4 .8, 4 .9, 20 .2,
    0.002 19.00391 ( 0 1 2 3 4 5 6 7 8 9 10 )
    0.100 12.88756 ( 0 1 2 7 4 8 3 10 5 6 9 )   9
     1 .8, 1 .9, 2 .8, 10 .3, 3 .8, 4 .8, 2 .9, 20 .2, 3 .9, 30 .1, 4 .9,
    3.750 seconds

    Which shows the first stab at a solution and the final solution were:

     1 .8, 1 .9, 2 .8,  2 .9, 3 .8, 3 .9, 30 .1, 10 .3, 4 .8,  4 .9, 20 .2
     1 .8, 1 .9, 2 .8, 10 .3, 3 .8, 4 .8,  2 .9, 20 .2, 3 .9, 30 .1,  4 .9

    And it took only 0.1 seconds to find the optimal solution (but then took 3.75 seconds to verify that there was nothing better). An exhaustive search would have taken minutes without those short-circuits.

    It still isn't practical for optimally ordering 30 or 40 filters; the run time would depend greatly on how many filters can be declared "completely better than" how many others but could easily run into the "millennia" range for such large lists.

    It can probably be most useful for improving the weighting formula used for the initial sort once you have some typical numbers for your situation. $cost*$selectivity isn't a bad starting point, but I'm sure, especially given a limited pattern of values, one can quickly come up with something better with a bit of trial and error.

    If the sort gets the first several filters in the right order, then an optimal solution can be had very quickly. A good ordering can be had immediately and better orderings can pop out as you let the thing run.

    It may also be useful as something to further modify to get to something that can get even closer to the optimal total cost rather quickly. For example, if you identify a subset of the filters that are likely to be the best ones, then you can separately optimize just that subset first to get a better first part of the ordering.

    Anyway, here is the code (which is neither pretty nor modular but appears to work reasonably well in the few test cases I played with).

    #!/usr/bin/perl -w
    use strict;
    use Time::HiRes qw< time >;

    $| = 1;
    my( @Costs, @Trims );
    for( @ARGV ) {
        my( $cost, $trim ) = /^(\d+)(\.\d+)$/
            or die "Invalid cost.trim: $_\n";
        push @Costs, $cost;
        push @Trims, $trim;
    }

    my $Start = time();
    my @order = sort {
        $Trims[$a]*$Costs[$a] <=> $Trims[$b]*$Costs[$b]
    } 0..$#Costs;
    @Trims = @Trims[@order];
    @Costs = @Costs[@order];
    print " $Costs[$_] $Trims[$_]," for 0..$#Costs;
    print $/;

    my @Win;
    cost( \@Costs, \@Trims );
    print $/;
    print " $Costs[$_] $Trims[$_]," for @Win;
    printf "\n%.3f seconds\n", time() - $Start;

    sub NextPermuteNum(\@) {
        my( $vals )= @_;
        my $last= $#{$vals};
        return !1 if $last < 1;
        while( 1 ) {
            # Find last item not in reverse-sorted order:
            my $i= $last-1;
            $i-- while 0 <= $i && $vals->[$i+1] <= $vals->[$i];
            # If complete reverse sort, we are done!
            if( -1 == $i ) {
                # Reset to starting/sorted order:
                @$vals= reverse @$vals;
                return !1;
            }
            # Re-sort the reversely-sorted tail of the list:
            @{$vals}[$i+1..$last]= reverse @{$vals}[$i+1..$last]
                if $vals->[$last] < $vals->[$i+1];
            # Find next item that will make us "greater":
            my $j= $i+1;
            $j++ while $vals->[$j] <= $vals->[$i];
            do {
                # Swap:
                @{$vals}[$i,$j]= @{$vals}[$j,$i];
                # If "greater" item isn't strictly worse, then try it:
                return 1
                    if $Costs[$vals->[$i]] < $Costs[$vals->[$j]]
                    || $Trims[$vals->[$i]] < $Trims[$vals->[$j]];
                # Else, we picked a strictly worse one, so pick again
                $j++;
            } while( $j <= $last );
            # Force us to move back to at least $i-1:
            @{$vals}[$i..$last] = sort { $b <=> $a } @{$vals}[$i..$last];
        }
    }

    sub cost {
        my( $costs, $trims ) = @_;
        my @order = 0 .. $#$costs;
        my $best = 0;
        my $max = 0;
        do {
            my $size = 1;
            my $cost = 0;
            my $skip = @order;
            for my $i ( @order ) {
                $skip--;
                last if $best && $best < $cost;
                $cost += $size * $costs->[$i];
                $size *= $trims->[$i];
            }
            if( ! $best || $cost <= $best ) {
                printf "\r%.3f %.5f ( %s )", time() - $Start, $cost, "@order";
                print $/ if ! $best;
                @Win = @order;
                $best = $cost;
                $max = 0;
            } elsif( 1 < $skip ) {
                if( $max < $skip ) {
                    printf "%4d\b\b\b\b", $skip;
                    $max = $skip;
                }
                @order[$#order-$skip+1..$#order] =
                    sort { $b <=> $a } @order[$#order-$skip+1..$#order];
            }
        } while( NextPermuteNum(@order) );
    }

    - tye        

      Heh, I just had what might be a great idea on how to do the first sort.

      It is trivial to do a two-item optimization in your sort comparison routine. This is "cheating" because you can end up telling sort that you want $a < $b and $b < $c but $c < $a. On old versions of Perl, that might even cause a core dump. I think it is actually documented as safe on modern Perl, however.

      my @order = sort {
          $Costs[$a]+$Trims[$a]*$Costs[$b] <=> $Costs[$b]+$Trims[$b]*$Costs[$a]
      } 0..$#Costs;
      @Trims = @Trims[@order];
      @Costs = @Costs[@order];

      It is that simple. It worked reasonably well in a couple of test cases. It even found an optimal solution on the first try for my prior example above (but took too long to declare success, probably indicating a bug).

      - tye        

        my @order = sort {
            $Costs[$a]+$Trims[$a]*$Costs[$b] <=> $Costs[$b]+$Trims[$b]*$Costs[$a]
        } 0..$#Costs;

        I'm pretty sure that this is the optimal solution, but I'm too lazy to write down the complete mathematical proof.

        The basic idea is that the terms represent the general cost difference of swapping the first and the second filter¹!

        So by sorting you get a solution which can't be improved by pairwise swapping, and any permutation can be represented as a sequence of pairwise swaps.

        Cheers Rolf

        ¹) see the restructured formula for c in Re^2: Evolving a faster filter? (combinatorics) and you will notice that these are the leading terms and the rest is fixed.

        UPDATE:

        basic transformations lead to a simpler solution:

        my @order = sort { $Costs[$a]/(1-$Trims[$a]) <=> $Costs[$b]/(1-$Trims[$b]) } 0..$#Costs;

        which can be precalculated to save calculation time

        my @order = sort { $weight[$a] <=> $weight[$b] } 0..$#Costs;

        (the edge case of a division by zero must be handled as practically infinity)
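        For anyone following along, the equivalence of the two comparators is a couple of lines of algebra (valid whenever each filter removes something, i.e. the keep-ratios are below 1):

            c[a] + r[a]*c[b] < c[b] + r[b]*c[a]
            c[a] - r[b]*c[a] < c[b] - r[a]*c[b]
            c[a]*(1 - r[b]) < c[b]*(1 - r[a])
            c[a]/(1 - r[a]) < c[b]/(1 - r[b])    # divide by (1 - r[a])*(1 - r[b]) > 0

        and the precalculation might look like this (a minimal sketch assuming the @Costs/@Trims arrays from tye's code above):

        # Weight each filter by cost per fraction of objects removed; a filter that
        # removes nothing gets an effectively infinite weight so it sorts last.
        my @weight = map {
            my $removed = 1 - $Trims[$_];
            $removed > 0 ? $Costs[$_] / $removed : 9**9**9;
        } 0 .. $#Costs;

        my @order = sort { $weight[$a] <=> $weight[$b] } 0 .. $#Costs;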

        Perhaps you missed the significance of the timings:

        C:\test>1011850 -O=100000
        Dict read at C:\test\1011850.pl line 96.
        objects loaded at C:\test\1011850.pl line 106.
        App created at C:\test\1011850.pl line 112.
        Ovid: (optimal ordering) returned 9193 objs in 4.447958 seconds
        Ovid: (pessimal ordering) returned 9193 objs in 7.236690 seconds
        RichardK: (optimal ordering) returned 9193 objs in 4.065038 seconds
        RichardK: (pessimal ordering) returned 9193 objs in 6.219226 seconds
        BUK: returned 9193 objs in 0.760572 seconds
        Filters run at C:\test\1011850.pl line 138.

        Running your OP algorithm against a set of 100 filters, each carefully arranged to filter an increasing number of objects and ordered best to worst, and running two tests -- optimal and pessimal -- the difference between them is less than 50%. That is the best you can hope to achieve with your order-the-filters mechanism.

        However, by altering the way the filters operate upon the objects, you can achieve an order of magnitude saving.

        Or perhaps you've just decided how you want to tackle the problem. Regardless...


Re: Evolving a faster filter?
by LanX (Saint) on Jan 04, 2013 at 15:23 UTC
    Hi Curtis,

    I suppose you are not interested in some technical tips to speed up remove_unwanted because most time is consumed within the ->$filter()s?

    Like avoiding array-copies and faster method-calls?

    Well, it's hard to tell with the info given, but the filters could dynamically record their performance in previous runs and you could sort them accordingly.

    Cheers Rolf

      > record their performance in previous runs and you sort them accordingly.

      OK, some theoretical thoughts on how to sort them:

      let c[i] be the average cost of running filter[i] on one object

      let r[i] be the average ratio of remaining/checked objects after applying filter[i]

      let o be the number of objects and c the cost accumulated so far

      so after applying a filter i

      o *= r[i]   and   c += o * c[i]

      so with a chosen ordering 0, 1, 2, ... we get

      c = o*c[0]+ o*r[0]*c[1] + o*r[0]*r[1]*c[2] + ...

      after restructuring we get

      c = o * ( c[0] + r[0] * ( c[1] + r[1] * ( ... ) ) )

      I don't know if there is a direct solution for this optimization problem (I could ask some colleagues from uni, I'm pretty sure they would know) but this equation is a good basis for a "branch-and-bound"¹ graph-search algorithm:

      the minimal cost after choosing filter[0] first is > c[0] + r[0] * c[min], with c[min] being the minimal cost after excluding filter[0].

      These are pretty good branch-and-bound criteria, which should lead very quickly to an optimal solution without trying all factorial(@filters) permutations like tye did here.

      That means even if your recorded performance for each filter is fuzzy and unstable you can always adapt and recalculate on the fly, even if you have more complicated models with ranges of values for c and r.
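      A rough recursive sketch of such a branch-and-bound search over this cost model (my own illustration, not production code; the filter numbers in the example are hypothetical):

      use strict;
      use warnings;
      use List::Util qw< min >;

      sub best_order {
          my ( $cost, $ratio ) = @_;    # per-filter cost c[i] and keep-ratio r[i]
          my $best = 9**9**9;
          my @best_order;

          my $search;
          $search = sub {
              my ( $prefix, $remaining, $cost_so_far, $size ) = @_;
              if ( !@$remaining ) {
                  if ( $cost_so_far < $best ) {
                      $best       = $cost_so_far;
                      @best_order = @$prefix;
                  }
                  return;
              }
              # Bound: even the cheapest remaining filter, applied to the objects still
              # alive, would push this branch past the best complete ordering so far.
              my $cheapest = min map { $cost->[$_] } @$remaining;
              return if $cost_so_far + $size * $cheapest >= $best;

              for my $i ( 0 .. $#$remaining ) {
                  my $f    = $remaining->[$i];
                  my @rest = @$remaining;
                  splice @rest, $i, 1;
                  $search->(
                      [ @$prefix, $f ],
                      \@rest,
                      $cost_so_far + $size * $cost->[$f],
                      $size * $ratio->[$f],
                  );
              }
          };
          $search->( [], [ 0 .. $#$cost ], 0, 1 );
          return ( $best, @best_order );
      }

      # Example with three hypothetical filters (costs 5, 1, 3; keep-ratios 0.9, 0.5, 0.2):
      my ( $total, @order ) = best_order( [ 5, 1, 3 ], [ 0.9, 0.5, 0.2 ] );
      print "best cost $total with order @order\n";    # best cost 3 with order 1 2 0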

      Cheers Rolf

      UPDATES:

      ¹) for examples of B&B applications see Possible pairings for first knockout round of UEFA champions league or Puzzle Time

Re: Evolving a faster filter?
by BrowserUk (Patriarch) on Jan 05, 2013 at 08:31 UTC
    It needs to run fast, fast, fast.

    Hm. Not a lot to go on. A sanitised peek at one or two of your filters would be nice.

    How does reducing 100,000 objects to 9193 by filtering them on each of 100 fields in 3/4 of a second stack up?

    C:\test>1011850 -O=100000
    Dict read at C:\test\1011850.pl line 96.
    objects loaded at C:\test\1011850.pl line 106.
    App created at C:\test\1011850.pl line 112.
    Ovid: (optimal ordering) returned 9193 objs in 4.447958 seconds
    Ovid: (pessimal ordering) returned 9193 objs in 7.236690 seconds
    RichardK: (optimal ordering) returned 9193 objs in 4.065038 seconds
    RichardK: (pessimal ordering) returned 9193 objs in 6.219226 seconds
    BUK: returned 9193 objs in 0.760572 seconds
    Filters run at C:\test\1011850.pl line 138.

Re: Evolving a faster filter?
by roboticus (Chancellor) on Jan 05, 2013 at 15:31 UTC

    Ovid:

    In the past, I've used a modified priority queue to hold function pointers (code was in C). As the code would run, it would keep the last "successful" function at the first entry in the queue, the rest of the queue sorted by the number of successes so far. So the functions were run in order of their success rate. It worked well, as frequently the last successful function would be successful on the next item in the list. In my case, all the functions had a similar runtime, so I didn't worry about figuring time into the equation. Also, my items to process tended to have similar items grouped together, so one function might be successful several tens of items in a row, so updating the queue wasn't particularly frequent.

    Since your functions have significantly different runtimes, I think I'd try a similar procedure, but rather than ordering by success rate alone, I might try success rate divided by average runtime. If the average runtime is known ahead of time, then the code is pretty straightforward. If your runtime changes, you might try rebalancing the queue every X operations, and use the time spent on those X operations to modify the weights of the first few entries.
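    A small sketch of how that might look in Perl (my illustration, not the original C; it assumes @filters holds per-object predicate coderefs, and the rebalance interval is left to the caller):

    use Time::HiRes qw< time >;

    # One record per filter: the predicate, how many objects it has rejected so far,
    # and its accumulated runtime (seeded with a tiny value to avoid division by zero).
    my @stats = map { { code => $_, rejects => 0, runtime => 1e-9 } } @filters;

    sub keep_object {
        my ($obj) = @_;
        for my $f (@stats) {
            my $t0 = time();
            my $ok = $f->{code}->($obj);
            $f->{runtime} += time() - $t0;
            unless ($ok) {
                $f->{rejects}++;
                return 0;    # rejected; no need to run the remaining filters
            }
        }
        return 1;
    }

    # Every X objects, re-order so the filters that reject the most objects per
    # second of runtime spent are tried first.
    sub rebalance {
        @stats = sort {
            $b->{rejects} / $b->{runtime} <=> $a->{rejects} / $a->{runtime}
        } @stats;
    }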

    Finally, have you tried swapping your filters vs objects loops? If you have few filters and many objects, I'd guess that it might be faster to do it like:

    sub remove_unwanted {
        my ( $self, @objects ) = @_;
        my @rv;

        OUTER: for my $curObj (@objects) {
            for my $predicate ($self->predicates) {
                next OUTER unless $self->$predicate($curObj);
            }
            push @rv, $curObj;
        }
        return @rv;
    }

    Of course, you'd have to rearrange a good deal of code if you were going to try it this way. I recognize your handle, so I suspect you've already evaluated the tradeoff between filtering a list of objects vs. evaluating single objects. I'm mostly mentioning this trick for others with similar problems to solve.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Evolving a faster filter?
by BrowserUk (Patriarch) on Jan 05, 2013 at 17:25 UTC

    As an aside, thousands of objects do not normally come into being instantaneously or en masse, but rather come into being over time.

    As your filters are static for any given run of the program, why not pass them through the filters at instantiation time and set a flag? (And perhaps re-run it any time one of the filtered attributes changes.)

    Then when you need the filtered subset, you only need a single boolean test for each object.
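    A minimal sketch of that idea (hypothetical constructor and a hypothetical passes_all_filters method; not the OP's actual classes):

    sub new {
        my ( $class, %args ) = @_;
        my $self = bless { %args }, $class;

        # Run the filters once, up front, and remember the verdict.
        $self->{wanted} = $args{app}->passes_all_filters($self) ? 1 : 0;
        return $self;
    }

    # Later, at the time of need, filtering collapses to one boolean test per object:
    my @filtered = grep { $_->{wanted} } @objects;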



      That sounds like a nice idea, but though the filters are roughly static, their behavior changes quite a bit with the data. For one run, I might want to exclude all objects with property X, with a value of Y below a particular threshold, and which weren't previously selected in the past hour. For the next run, X might be irrelevant (though the filter still gets run because we don't know that until the filter is run), Y might have a different threshold, and we might not care about when the object was selected. So multiple filter runs with the same set of objects have, literally, millions of possible different outcomes.

        For one run, I might want to exclude all objects of property X, with a value of Y below a particular threshold and which wasn't previously selected in the past hour. ... So multiple filter runs with the same set of objects have, literally, millions of possible different outcomes.

        Hm. Sounds like one of these 'community driven' "We also recommend..." things that the world+dog have added to their sites recently.

        But still it makes me wonder whether you cannot distribute the load somehow.

        That is, is it really necessary to make the entire decision process actively every time, by running all the filters exactly at the instant of need?

        Or could you re-run each of the filters (say) once per hour with the then-current dataset, and only amalgamate the results and make your final selection at the point of need?

        You might (for example), run each filter individually and store its result in the form of a bitstring where each position in the bitstring represents a single object in the set. Then, at the time-of-need, you combine (bitwise-AND) the latest individual bitstrings from all the filters to produce the final selection.
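        A hypothetical sketch of how one filter's periodic run could record its verdicts as such a bitstring (it assumes the filter is available as a per-object predicate coderef):

        # One bit per object, in object-index order.
        my $bits = "\0" x int( ( @objects + 7 ) / 8 );
        for my $i ( 0 .. $#objects ) {
            vec( $bits, $i, 1 ) = $filter->( $objects[$i] ) ? 1 : 0;
        }
        # Store $bits somewhere shared; at the time of need, bitwise-AND the latest
        # bitstring from every filter and keep the objects whose bit survived.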

        With 100,000 objects, a single filter is represented by a 25k scalar. Times (say) 100 filters and it requires 2.5MB to store the current and ongoing filter set.

        Combining those 100x 100,000 filter sets is very fast:

        use Math::Random::MT qw[ rand ];;

        $filters[ $_ ] = pack 'Q*', map int( 2**64 * rand() ), 0 .. 1562 for 0 .. 99;;

        say time(); $mask = chr(0xff) x 12504; $mask &= $filters[ $_ ] for 0 .. 99; say time();;
        1357485694.21419
        1357485694.21907

        Less than 5 milliseconds!

        Assuming your application could live with the filters being run once every time period (say, once per hour or half an hour or whatever works for your workloads), rather than in full for every time-of-need?

        (NOTE: this is not the method used in my other post which does the full 100,000 objects through 100 filters in 0.76 seconds, but without a feel for how long your current runs take, there is no way to assess how realistic that would be?)



        So there's no way to tell how many objects a filter will take out until the filter is run?

Re: Evolving a faster filter?
by sundialsvc4 (Abbot) on Jan 04, 2013 at 15:58 UTC

    What is the cost of each filter? In other words, specifically, how much input/output does a filter have to do, in order to make its determination? And, how much information-in-common do the various related filters share? And how many other servers, etc., are involved in the process of satisfying each request?

    I am taking for granted that each server has enough RAM to allow all n of the objects being considered to be resident in memory at the same time such that no page-fault I/O will be required during the process of handling them. But I/O is and always will be the primary determinant of execution speed apart from raw CPU horsepower (and of course locking, of which there ought to be no requirement), and I/O can sometimes be well hidden. Especially significant would be if two filters inadvertently perform the same I/O that a predecessor has already done: everything should be explicitly cached.

    I am also presuming that each one of the servers makes its own decision and does not share information with the others. But, having said that, I know that it probably is not so: there probably is some kind of sharing and/or locking and/or distribution of results among the parties, and that necessarily means some amount of waiting, one for the other. (And the more parties there are to wait, the greater is the chance that any one of them will wait.)

    Setting the matter of contention and sharing aside now ... If the criteria required to predict the performance of a particular filter change over time, then perhaps you could maintain some kind of dynamic scoring of each filter's past performance over the last few minutes, say in terms of the number of objects eliminated. The hypothesis is that past performance is the best predictor of future results. Apply the filters in descending order of score, such that the filters that have lately shown themselves to be most effective are tried first. If you retain the latest x most-recent score values in a queue and rank the filter by the current average of those scores, you will make it trend.

    A final idle thought is then to apply a random-shuffle algorithm to the sorted list, doing not a complete shuffle but rather randomly exchanging the positions of some y elements in the ranking each time. This arbitrarily puts some players on the bench and takes some players off the bench, randomly accepting the chance of a penalty in exchange for the chance that prevailing data-conditions have subtly and recently evolved, such that one player who might otherwise be overlooked just might be the one to score the next touchdown.
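    A tiny sketch of that partial shuffle (my illustration; $y, the number of random swaps per rebalance, is a tuning knob):

    # Swap $y random pairs in the ranked filter list so a lately-unlucky filter
    # occasionally gets another chance nearer the front.
    for ( 1 .. $y ) {
        my ( $i, $j ) = ( int rand @filters, int rand @filters );
        @filters[ $i, $j ] = @filters[ $j, $i ];
    }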

    I am certain that this thread will attract the attention of BrowserUK, who is well-known about these parts to be especially expert in high-performance algorithms in high-volume situations. I especially look forward to his insights on this subject.

Re: Evolving a faster filter?
by sundialsvc4 (Abbot) on Jan 06, 2013 at 14:26 UTC

    In the “further food for thought” department, it occurs to me that this is not entirely a pure-combinatorics problem. One filter might appear to be “better” in one situation than it would have been in another, i.e. because another filter did precede it, and that filter did remove elements perhaps more efficiently than it would have done. A key factor of consideration might be ... and only you could have the answer to this ... is the most important goal that the result-set be reduced to its minimum practicable size, or that the filter-set should reduce the result-set as far as it can manage to do within a strictly limited amount of CPU time? How predictable and consistent are these “costs,” and upon what do they depend?

    If you can by any means see your way to making this a true optimization problem then by all means do that ... but I guess that you can’t because, if you could, you probably would not have a choice of more than one filter. It is quite an interesting sort of problem and my guess is that the solution of choice will be a heuristic rather than an algorithm, much less a proof. You’ll come up with a self-adapting method that produces good-enough results and does so consistently, perhaps never knowing whether a few milliseconds more could have been shaved at any particular instant.