Pure regex Hamiltonian Circuit solution

I'd seen Abigail do nasty things with regexes before, but never really understood them at all. However, in the recent N-Queens solver, it was a pure regex (backrefs only), there were no (?{}) constructs, and I was finally able to understand how Abigail does it..

Then, on a regex kick, I discovered Abigail's pure regex 3-SAT reduction. If you haven't already seen it, it's the coolest couple of lines on the planet. With all this regex excitement, I decided to try my hand at solving some NP-complete problem(s) with pure regexes.

I tried a handful of different NP-complete problems, but after a few emails with MJD, I understood why these attempts went wrong. I kept getting hung up on the string and regex size being exponential on the size of input (not meaning my solution was incorrect, but invalidating its use as an NP-completeness reduction proof). Finally, I think I may have gotten it right with Hamiltonian Circuits. Abigail has done this before using extended regexes, but claimed to be stuck on a pure regex solution that wasn't exponential.

So without further ado, here's my solution. Given a (simple, undirected) graph with E edges and V vertices, it finds a Hamiltonian Circuit (a cycle that visits every vertex exactly once, starting and ending in the same place). The size of the string it creates is bounded by O(V^4), and the size of the regex it creates is bounded by O(V^2).

my @E = ([1,3],[1,5],[2,3],[2,4],[2,5],[4,5]);
my $V = 5;
my $verbose = 1;

my @all_edges = map { my $x = $_; map { [$x, $_] } $x+1 .. $V } 1 .. $
+V-1;

my $string = (join(' ', 1 .. $V) . "\n") x $V
           . "\n"
           . (join(' ', map { join "-", @$_ } @all_edges ) . "\n")
               x @all_edges
           . "\n"
           . (join(' ', map { join "-", @$_ } @E ) . "\n") x $V;

my $regex = "^ "
          . ".* \\b (\\d+) \\b .* \\n\n" x $V
          . "\\n\n"
          . join("", map { my ($x, $y) = @$_;
                           ".* \\b (?: \\$x-\\$y | \\$y-\\$x ) \\b .* 
+\\n\n"
                         } @all_edges)
          . "\\n\n"
          . join("", map { my ($x, $y) = ($_, $_+1);
                           ".* \\b (?: \\$x-\\$y | \\$y-\\$x ) \\b .* 
+\\n\n"
                         } 1 .. ($V-1))
          . ".* \\b (?: \\$V-\\1 | \\1-\\$V ) \\b .* \\n \$\n";

print "'$string' =~ /\n$regex\n/x\n" if $verbose;

if (my @c = $string =~ /$regex/x) {
  local $" = " -> ";
  print "Hamiltonian circuit: [ @c -> $1 ]\n";
} else {
  print "No Hamiltonian circuit\n";
}
[download]

Now the dirty details.. Here's the value of $string and $regex for the example graph included:

$string = q[
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5

1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
];

$regex = q[^
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
\n
.* \b (?: \1-\2 | \2-\1 ) \b .* \n
.* \b (?: \1-\3 | \3-\1 ) \b .* \n
.* \b (?: \1-\4 | \4-\1 ) \b .* \n
.* \b (?: \1-\5 | \5-\1 ) \b .* \n
.* \b (?: \2-\3 | \3-\2 ) \b .* \n
.* \b (?: \2-\4 | \4-\2 ) \b .* \n
.* \b (?: \2-\5 | \5-\2 ) \b .* \n
.* \b (?: \3-\4 | \4-\3 ) \b .* \n
.* \b (?: \3-\5 | \5-\3 ) \b .* \n
.* \b (?: \4-\5 | \5-\4 ) \b .* \n
\n
.* \b (?: \1-\2 | \2-\1 ) \b .* \n
.* \b (?: \2-\3 | \3-\2 ) \b .* \n
.* \b (?: \3-\4 | \4-\3 ) \b .* \n
.* \b (?: \4-\5 | \5-\4 ) \b .* \n
.* \b (?: \5-\1 | \1-\5 ) \b .* \n $
];

__OUTPUT__
Hamiltonian circuit: [ 5 -> 4 -> 2 -> 3 -> 1 -> 5 ]
[download]

Here's how it works:

In the first "paragraph", $1 through $5 each get a vertex assigned to them. This paragraph of $string is bounded by O(V^2), and $regex by O(V).
The second "paragraph" is the ugly bit. We have to make sure all the vertices chosen are different. This could have been done by originally choosing $1 through $5 to be any permutation from an exhaustive list of permutations, but such a list would have been exponential in size.

The way I checked is by picking any 5 vertices, then ensuring that they are all pairwise distinct. In $string, we have a repeated list of all "valid" (distinct) vertex pairs. In $regex, we make sure that every pair of two vertices in {$1,..,$5} shows up in that list.
This isn't exponential, because there are V(V-1)/2 "valid" (distinct) pairs, and V(V-1)/2 pairs to verify. So this paragraph of $string is bounded by O(V^4), and $regex by O(V^2).
The third "paragraph" is where we verify that our distinct vertex set has edges from $1 to $2 ... to $5 and then back to $1. This paragraph of $string is bounded by O(E*V) = O(V^3), and $regex by O(V).

If all these conditions matched, then the regex matches, and $1 through $5 must be the vertices of a Hamiltonian Circuit.

There you have it. Your feedback is welcome. I really hope I haven't made any mistakes in my analysis. This code isn't one-tenth as beautiful and elegant as Abigail's 3-SAT reduction, and perhaps it could be improved upon. But hey, we all have to start somewhere ;)

Update: modified code so that $string is a little clearer to read. Instead of comma-separating the edge and vertex lists, they are separated by spaces. $regex is modified accordingly, matching on \b where appropriate, instead of a comma.

blokhead

Comment on Pure regex Hamiltonian Circuit solution Select or Download Code

Replies are listed 'Best First'.
Re: Pure regex Hamiltonian Circuit solution by Abigail-II (Bishop) on Oct 11, 2003 at 02:05 UTC
Nice. You beat me to it. I was already pretty sure earlier this week that Hamiltonian Circuits/Paths could be done with pure regexes as well, and just tonight I was making some notes on how to do it. It basically came down to the same principles you are using - although I was thinking of mixing the picking of the vertices with the testing for unique picks and valid paths (just to gain speed by rejecting earlier). Abigail	[reply]
Re: Pure regex Hamiltonian Circuit solution by BrowserUk (Patriarch) on Oct 11, 2003 at 02:26 UTC
++blokhead. My hat is off to you. I'd really like to see two things. The same algorithm implemented using normal perl. Iterative, recursive, a mixture of the two. Whatever, so long as it solved the same problem (and arrived at the same answer:). A benchmark of a few, well chosen, 'typical cases' using your pure-regex, Abigail's extended regex, and the pure perl solutions. Throw in a CPAN Perl-over-'generic paths'-C-library, solution and we'd have a very interesting shoot-out:) Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail	[reply]
Re: Pure regex Hamiltonian Circuit solution by blokhead (Monsignor) on Oct 13, 2003 at 05:50 UTC
Here's an updated version using lookaheads to greatly shorten things. Matching time shouldn't change much. my @E = ([1,3],[1,5],[2,3],[2,4],[2,5],[4,5]); my $V = 5; my $verbose = 1; my @all_edges = map { my $x = $_; map { [$x, $_] } $x+1 .. $V } 1 .. $ +V-1; my $string = (join(' ', 1 .. $V) . "\n") x $V . join(' ', map { join "-", @$_ } @all_edges ) . "\n" . join(' ', map { join "-", @$_ } @E ); my $regex = "^\n" . ".* \\b (\\d+) \\b .* \\n\n" x $V . join("", map { my ($x, $y) = @$_; "(?= .* \\b (?: \\$x-\\$y \| \\$y-\\$x ) \\b + ) \n" } @all_edges) . ".\\n\n" . join("", map { my ($x, $y) = ($_, $_+1); "(?= . \\b (?: \\$x-\\$y \| \\$y-\\$x ) \\b + )\n" } 1 .. ($V-1)) . "(?= .* \\b (?: \\$V-\\1 \| \\1-\\$V ) \\b )\n"; print "'$string' =~ /\n$regex\n/x\n" if $verbose; if (my @c = $string =~ /$regex/x) { local $" = " -> "; print "Hamiltonian circuit: [ @c -> $1 ]\n"; } else { print "No Hamiltonian circuit\n"; } __END__ $string = q[ 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5 1-3 1-5 2-3 2-4 2-5 4-5 ]; $regex = q[ ^ .* \b (\d+) \b .* \n .* \b (\d+) \b .* \n .* \b (\d+) \b .* \n .* \b (\d+) \b .* \n .* \b (\d+) \b .* \n (?= .* \b (?: \1-\2 \| \2-\1 ) \b ) (?= .* \b (?: \1-\3 \| \3-\1 ) \b ) (?= .* \b (?: \1-\4 \| \4-\1 ) \b ) (?= .* \b (?: \1-\5 \| \5-\1 ) \b ) (?= .* \b (?: \2-\3 \| \3-\2 ) \b ) (?= .* \b (?: \2-\4 \| \4-\2 ) \b ) (?= .* \b (?: \2-\5 \| \5-\2 ) \b ) (?= .* \b (?: \3-\4 \| \4-\3 ) \b ) (?= .* \b (?: \3-\5 \| \5-\3 ) \b ) (?= .* \b (?: \4-\5 \| \5-\4 ) \b ) .\n (?= . \b (?: \1-\2 \| \2-\1 ) \b ) (?= .* \b (?: \2-\3 \| \3-\2 ) \b ) (?= .* \b (?: \3-\4 \| \4-\3 ) \b ) (?= .* \b (?: \4-\5 \| \5-\4 ) \b ) (?= .* \b (?: \5-\1 \| \1-\5 ) \b ) ]; [download] There's no more need for repeating the listing of `@all_edges` so many times. With lookaheads, it only needs to be there once. Same with the listing of `@E`. This reduces `$string` to O(V^2). Because backtracking doesn't happen within lookaheads, I couldn't use lookaheads to select the captured vertices. So the listing of vertices 1 to `$V` still has to be there `$V` times. blokhead	[reply] [d/l]

Back to Meditations