comment on

Note: Significantly updated, 2008-02-01, 12:30am ish

Mathematically this is a bit different from a packing problem. A packing problem is a least upper/greatest lower bound problem. This problem centers on distributions and deviations. It might be better to think of it as a matching problem.

To keep the bucket closest to the mean, whenever you give a bucket something with the price above the mean you also have to give it something below the mean. Originally, I thought this would be best accomplished with a two phase algorithm: first get rid of the items that let every bucket have the same average as the overall sample mean. Next get rid of the oddballs. That algorithm is described here:

I think if I were solving this problem, I might try something like this, for a start:

Phase I - for all items such that quantity > number of buckets

pick the item at the mean (if it exists)
allocate the same number of that item to each bucket and put aside the remainder
pick the next pair above and below the mean. Calculate the average deviation of the values. If it exceeds the threshold above the mean, pick the next value below the mean. If it exceeds the threshold below the mean, pick the next value above the mean. Repeat until you find a combination of values above and below the mean that fall within the threshold.
If your goal is minimizing distance from the mean rather than keeping that distance within a threshold, continue until you run out of values and pick the combination that had a minimal deviation.
allocate the same number of each combination to each bucket and put aside any oddballs.
repeat until you run out of things to allocate evenly to the buckets

At this point all should have exactly the same average and exactly the same deviation from the mean. If there were no left overs, then this would represent the best possible allocation. But of course, there will be left overs, so the next step is to get rid of the oddballs.

Phase II - get rid of odd balls

This is identical to phase I except that instead of allocating each combination to all buckets at once, each sub-threshold combination is given to only one bucket. So if choosing one value above and below the mean is below the threshold, then two items are allocated to bucket X and only bucket X. Then the whole process of finding a subthreshold combination is repeated for bucket Y, and so on until each bucket has been given a combination.

On the off chance that there is some cyclic pattern to the distribution of deviations, it is probably a good idea to use a random order when filling the buckets. So for the first round one might fill buckets 3-1-2. Next 2-1-3. Next 1-2-3. etc.

The other major risk in this algorithm is the "odd man out problem" - it may be that the deviations are distributed in such a way that the last bucket in the last round ends up with an average deviation that is really off the charts.

There are probably ways around this that involve calculating deviations of deviations or surveying the distribution of deviations - this kind of problem is more likely to happen in a badly skewed distribution. But this is beginning to spin my brain, so I'll stop.

Best, beth

Update: Whoops! Forget the second part of the problem: A gets 19, B gets 30 and so on. For that, you could simply start skipping a bucket when it reaches its max number of items. So if you start with 5 buckets and one gets filled, then from then on you work only with the four remaining buckets.

Note: the order in which people drop out can't be randomized so the person with the highest number of items is also at the greatest risk of ending up with the "odd man out" deviations. If, by any chance, you are allocating shares bought at different times for a group of investors this may or may not be "ok".

Translated into financial terms, he who has the greatest chance of being the holder of the "odd man out" has the greatest risk - i.e. the greatest liklihood that he or she has some significant upward or downward unpredictability in his/her outcome. Risk can have a monetary value and/or contractual constraints. If that is the case, you may need to figure out a way to determine the probability that someone will be the odd man out or else eliminate the issue by randomizing who is the odd man out, but as I said, that part of the problem is making my head spin...

Update again: reworded above paragraph.

I've since reconsidered my algorithm, see note below. Best, beth

In reply to Re: Average Price Algorithm by ELISHEVA
in thread Average Price Algorithm by camelcom

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Welcome to the Monastery
	PerlMonks