For one run, I might want to exclude all objects with property X, with a value of Y below a particular threshold, and which weren't selected in the past hour. ... So multiple filter runs with the same set of objects have, literally, millions of possible different outcomes.
Hm. Sounds like one of these 'community driven' "We also recommend..." things that the world+dog have added to their sites recently.
But still it makes me wonder whether you cannot distribute the load somehow.
That is, is it really necessary to make the entire decision process actively every time, by running all the filters exactly at the instant of need?
Or could you re-run each of the filters (say) once per hour against the then-current dataset, and only amalgamate the results and make your final selection at the point of need?
You might (for example), run each filter individually and store its result in the form of a bitstring where each position in the bitstring represents a single object in the set. Then, at the time-of-need, you combine (bitwise-AND) the latest individual bitstrings from all the filters to produce the final selection.
With 100,000 objects, a single filter is represented by a 12.5k scalar (one bit per object). Times (say) 100 filters, and it requires 1.25MB to store the current and ongoing filter set.
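As a rough sketch of building those per-filter bitstrings (the object count, predicates, and sub names here are invented for illustration), each filter run sets one bit per passing object with vec, and the time-of-need step is just a string-wise AND:

    use strict;
    use warnings;

    # Hypothetical setup: 100,000 objects, one predicate per filter.
    my $N = 100_000;

    # Build a bitstring for one filter: bit $i is 1 if object $i passes.
    sub filter_bits {
        my ( $objects, $predicate ) = @_;
        my $bits = "\0" x int( ( $N + 7 ) / 8 );   # 12,500 bytes for 100k objects
        for my $i ( 0 .. $#$objects ) {
            vec( $bits, $i, 1 ) = 1 if $predicate->( $objects->[ $i ] );
        }
        return $bits;
    }

    # Toy example: objects are just integers; two invented filters.
    my @objects = ( 0 .. $N - 1 );
    my $even = filter_bits( \@objects, sub { $_[0] % 2 == 0 } );
    my $big  = filter_bits( \@objects, sub { $_[0] >= 50_000 } );

    # Time-of-need: bitwise-AND the stored bitstrings.
    my $both = $even & $big;
    print unpack( '%32b*', $both ), "\n";   # population count: objects passing both

Each hourly run only needs to refresh its own bitstring; the AND at the point of need never touches the objects themselves.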
Combining 100 of those 100,000-bit bitstrings is very fast:
    use Math::Random::MT qw[ rand ];
    $filters[ $_ ] = pack 'Q*', map int( 2**64 * rand() ), 0 .. 1562 for 0 .. 99;

    say time();
    $mask = chr(0xff) x 12504;
    $mask &= $filters[ $_ ] for 0 .. 99;
    say time();

    1357485694.21419
    1357485694.21907
Less than 5 milliseconds!
That assumes your application can live with the filters being refreshed once every time period (say, once per hour or half hour, whatever works for your workloads), rather than being run in full at every time-of-need.
(NOTE: this is not the method used in my other post, which pushes the full 100,000 objects through 100 filters in 0.76 seconds; but without a feel for how long your current runs take, there is no way to assess how realistic that would be.)