The greedy change-making problem using regexes

Here's another problem related to money systems and making change that I've found very interesting.

In the American system of coinage (1c, 5c, 10c, 25c), we can make change for any amount of money with the fewest number of coins possible by using a greedy algorithm: starting with the largest coin amount, take as many as you can without exceeding the amount, then move to the next largest amount, and so on. In fact, proving the optimality of this algorithm (with these coin amounts) is often a first example or exercise in introductory algorithms classes.

However, the fact that this algorithm is optimal is a property of the coin amounts, not of the algorithm in general. To see a case where the greedy algorithm is suboptimal, consider the slight modification to the American system where we replace the 5c piece with a 6c piece. Now to make change 12c, the greedy algorithm uses three coins (10+1+1), but 12c can be made using only two coins (6+6).

This suggests the obvious algorithmic decision problem:

Given a set of coin denominations, does the greedy change-making algorithm use the fewest possible number of coins on all amounts?

This is known in the algorithms community as the (greedy) change-making problem. We can assume that a 1-unit denomination is included, otherwise we can't represent all amounts. We can also assume that at least 3 denominations are included, otherwise the greedy algorithm is always optimal. Other than these simple cases, this is a lot harder than it might first appear. In fact, this problem as stated above is NP-hard.

About all that is known about the mathematical structure of this problem is that if c₁ < c₂ < ... < c_m are the coin denominations, and the greedy algorithm is not optimal, then there is a counterexample (an amount for which the greedy algorithm gives more coins than are necessary) between (c₃+2) and (c_m+c_m-1-1), inclusive. If you don't believe me, you can read all about it in this paper.

So given a set of coins, all we have to do to see if it is optimal is see if any amount in that range is a counterexample. How on earth do we do that? There are a lot of ways to make change for a given amount, how can we be sure what is the fewest number of coins necessary? As with my previous post about change-making, many kinds of problems on money systems can be formulated quite naturally in terms of unary finite automata, making regexes a good way to check solutions. So I went about building some regex machinery for this problem. What follows is an account of my journey into backtracking and exploitation of the regex engine...

First here's the code. Keep in mind that this is purely regex fun for the sake of regex fun. This solution is very slow, and a solution dealing natively with integers (i.e, manipulating the integer N instead of a string of N characters) will most certainly be faster. However, exploiting the backtracking mechanism allows us to express this in a more logic/rules-based manner. For hard brute-search problems, this is a nice convenience. Additionally, I am well aware that there is still room for improvement in the regex that is generated..

use re 'eval';
use Test::More 'no_plan';

sub greedy_is_optimal {
    my @coins     = @_;
    my $greedy    = join "", map { "( 1{$_} (?{ \$^R+1 }) )* (?!1{$_})
+" }
                             reverse @coins;
    my $one_coin  = join "|", map { "1{$_}" } @coins;

    for ( $coins[2]+2 .. $coins[-1]+$coins[-2]-1 ) {
        local $x;
        (1 x $_) =~ /  ^ $greedy (?{ $x = $^R }) x
                     | ^ ( ($one_coin) (?{ $^R+1 }) )*
                         (?(?{ $^R < $x }) $ | x )    /x
            and return 0;
    }
    return 1;
}

is greedy_is_optimal(1,5,10,25), 1;
is greedy_is_optimal(1,6,10,25), 0;
is greedy_is_optimal(1,5,7),     0;
[download]

Let's look at what the heck is going on with this insane regex. Here's the (reformatted) regex when the coins are (1,5,7). Remember we are trying to match a string of all 1's.

    ^ ( 1{7} (?{ $^R+1 }) )* (?!1{7})
      ( 1{5} (?{ $^R+1 }) )* (?!1{5})
      ( 1{1} (?{ $^R+1 }) )* (?!1{1})
      (?{ $x = $^R }) x
  | ^ ( (1{1}|1{5}|1{7}) (?{ $^R+1 }) )*
      (?(?{ $^R < $x }) $ | x )
[download]

Notice that the whole regex is one large alternation. Let's look at the first alternation first, since that's what the regex engine will do: The first three lines accomplish a greedy change-making procedure -- here's how. Ignoring the (?{code}) blocks, we simply have

 ^ (1{7})* (?!1{7}) (1{5})* (?!1{5}) (1{1})* (?!1{1})
[download]

This says, match characters 7 at a time, as many as possible (i.e, until there aren't 7 more characters to be had). Then match them 5 at a time until there aren't 5 left, etc. That's precisely the greedy algorithm. Notice that this pattern can only match (any string of 1s) in one way.

Now, what were the (?{ $^R+1 }) things doing? Well, within a regex, $^R returns the value of the last-executed (?{code}) block. It is also automatically localized properly so that backtracking "rolls back" the values. So a bunch of code blocks that return $^R+1 simply accomplish setting $^R to the number of code-blocks that were executed (along this matching path). Since we have such a (?{code})-block each time we use a coin, when we reach the end of these three lines, the value of $^R must be the number of coins used by the greedy algorithm.

Finally, the fourth line of the regex says

(?{ $x = $^R }) x
[download]

This says to set $x to $^R (the number of coins used by the greedy algorithm), then try to match a literal "x" character .. There are no x's in this string, so it must fail. But we did not localize $x, so even when we backtrack past the code-block, the value of $x remains. As we mentioned before, these first lines can only match in one way, so we don't have to worry about getting possibly multiple values for $x. So now, these entire 4 lines must fail (with the side-effect of setting $x), so the regex will try to match the second alternation. It looks like this:

 ^ ( (1{1}|1{5}|1{7}) (?{ $^R+1 }) )* 
   (?(?{ $^R < $x }) $ | x )
[download]

The first line repeatedly chooses a coin (either 1, 5, or 7) and uses the $^R trick to count how many are used (remember that we have backtracked all the way to the beginning to try to match this alternation, so $^R restarts at zero). The second line is a regex conditional. It says if this value of $^R is less than the value we stored in $x, match the end of the string, otherwise try to match a literal "x" character. Matching a literal "x" isn't possible, so we will backtrack and try to match again, with a different combination of "coins."

All in all, if there is a way to match all the characters in the string using less than $x coins, then the conditional $^R < $x can eventually succeed, and we will match the end of the string and the entire regex will succeed. Otherwise, every time we get to the conditional at the end of the input, we can never get $^R < $x and the regex fails. This means that the greedy algorithm for making change gave us the fewest number of coins possible.

Final Thoughts: I think this solution is very slick, but slow. How would you have approached the problem? Also, how would a solution that just used the backtracking of the regex engine without string-matching (i.e, just localized integer manipulation) stack up?

blokhead

Back to Meditations