I'd seen Abigail do nasty things with regexes before, but never really understood them at all. However, in the recent NQueens solver, it was a pure regex (backrefs only), there were no (?{}) constructs, and I was finally able to understand how Abigail does it..
Then, on a regex kick, I discovered Abigail's pure regex 3SAT reduction. If you haven't already seen it, it's the coolest couple of lines on the planet. With all this regex excitement, I decided to try my hand at solving some NPcomplete problem(s) with pure regexes.
I tried a handful of different NPcomplete problems, but after a few emails with MJD, I understood why these attempts went wrong. I kept getting hung up on the string and regex size being exponential on the size of input (not meaning my solution was incorrect, but invalidating its use as an NPcompleteness reduction proof). Finally, I think I may have gotten it right with Hamiltonian Circuits. Abigail has done this before using extended regexes, but claimed to be stuck on a pure regex solution that wasn't exponential.
So without further ado, here's my solution. Given a (simple, undirected) graph with E edges and V vertices, it finds a Hamiltonian Circuit (a cycle that visits every vertex exactly once, starting and ending in the same place). The size of the string it creates is bounded by O(V^4), and the size of the regex it creates is bounded by O(V^2).
my @E = ([1,3],[1,5],[2,3],[2,4],[2,5],[4,5]);
my $V = 5;
my $verbose = 1;
my @all_edges = map { my $x = $_; map { [$x, $_] } $x+1 .. $V } 1 .. $
+V1;
my $string = (join(' ', 1 .. $V) . "\n") x $V
. "\n"
. (join(' ', map { join "", @$_ } @all_edges ) . "\n")
x @all_edges
. "\n"
. (join(' ', map { join "", @$_ } @E ) . "\n") x $V;
my $regex = "^ "
. ".* \\b (\\d+) \\b .* \\n\n" x $V
. "\\n\n"
. join("", map { my ($x, $y) = @$_;
".* \\b (?: \\$x\\$y  \\$y\\$x ) \\b .*
+\\n\n"
} @all_edges)
. "\\n\n"
. join("", map { my ($x, $y) = ($_, $_+1);
".* \\b (?: \\$x\\$y  \\$y\\$x ) \\b .*
+\\n\n"
} 1 .. ($V1))
. ".* \\b (?: \\$V\\1  \\1\\$V ) \\b .* \\n \$\n";
print "'$string' =~ /\n$regex\n/x\n" if $verbose;
if (my @c = $string =~ /$regex/x) {
local $" = " > ";
print "Hamiltonian circuit: [ @c > $1 ]\n";
} else {
print "No Hamiltonian circuit\n";
}
Now the dirty details.. Here's the value of $string and $regex for the example graph included:
$string = q[
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
12 13 14 15 23 24 25 34 35 45
13 15 23 24 25 45
13 15 23 24 25 45
13 15 23 24 25 45
13 15 23 24 25 45
13 15 23 24 25 45
];
$regex = q[^
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
\n
.* \b (?: \1\2  \2\1 ) \b .* \n
.* \b (?: \1\3  \3\1 ) \b .* \n
.* \b (?: \1\4  \4\1 ) \b .* \n
.* \b (?: \1\5  \5\1 ) \b .* \n
.* \b (?: \2\3  \3\2 ) \b .* \n
.* \b (?: \2\4  \4\2 ) \b .* \n
.* \b (?: \2\5  \5\2 ) \b .* \n
.* \b (?: \3\4  \4\3 ) \b .* \n
.* \b (?: \3\5  \5\3 ) \b .* \n
.* \b (?: \4\5  \5\4 ) \b .* \n
\n
.* \b (?: \1\2  \2\1 ) \b .* \n
.* \b (?: \2\3  \3\2 ) \b .* \n
.* \b (?: \3\4  \4\3 ) \b .* \n
.* \b (?: \4\5  \5\4 ) \b .* \n
.* \b (?: \5\1  \1\5 ) \b .* \n $
];
__OUTPUT__
Hamiltonian circuit: [ 5 > 4 > 2 > 3 > 1 > 5 ]
Here's how it works:
 In the first "paragraph", $1 through $5 each get a vertex assigned to them. This paragraph of $string is bounded by O(V^2), and $regex by O(V).
 The second "paragraph" is the ugly bit. We have to make sure all the vertices chosen are different. This could have been done by originally choosing $1 through $5 to be any permutation from an exhaustive list of permutations, but such a list would have been exponential in size.
The way I checked is by picking any 5 vertices, then ensuring that they are all pairwise distinct. In $string, we have a repeated list of all "valid" (distinct) vertex pairs. In $regex, we make sure that every pair of two vertices in {$1,..,$5} shows up in that list.
This isn't exponential, because there are V(V1)/2 "valid" (distinct) pairs, and V(V1)/2 pairs to verify. So this paragraph of $string is bounded by O(V^4), and $regex by O(V^2).
 The third "paragraph" is where we verify that our distinct vertex set has edges from $1 to $2 ... to $5 and then back to $1. This paragraph of $string is bounded by O(E*V) = O(V^3), and $regex by O(V).
If all these conditions matched, then the regex matches, and $1 through $5 must be the vertices of a Hamiltonian Circuit.
There you have it. Your feedback is welcome. I really hope I haven't made any mistakes in my analysis. This code isn't onetenth as beautiful and elegant as Abigail's 3SAT reduction, and perhaps it could be improved upon. But hey, we all have to start somewhere ;)
Update: modified code so that $string is a little clearer to read. Instead of commaseparating the edge and vertex lists, they are separated by spaces. $regex is modified accordingly, matching on \b where appropriate, instead of a comma.
Re: Pure regex Hamiltonian Circuit solution
by AbigailII (Bishop) on Oct 11, 2003 at 02:05 UTC

Nice. You beat me to it. I was already pretty sure earlier
this week that Hamiltonian Circuits/Paths could be done with
pure regexes as well, and just tonight I was making some notes
on how to do it. It basically came down to the same principles
you are using  although I was thinking of mixing the picking
of the vertices with the testing for unique picks and valid paths (just to gain speed by rejecting earlier).
Abigail  [reply] 
Re: Pure regex Hamiltonian Circuit solution
by BrowserUk (Pope) on Oct 11, 2003 at 02:26 UTC

++blokhead. My hat is off to you.
I'd really like to see two things.
 The same algorithm implemented using normal perl. Iterative, recursive, a mixture of the two. Whatever, so long as it solved the same problem (and arrived at the same answer:).
 A benchmark of a few, well chosen, 'typical cases' using your pureregex, Abigail's extended regex, and the pure perl solutions.
Throw in a CPAN Perlover'generic paths'Clibrary, solution and we'd have a very interesting shootout:)
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." David Dunham
"Think for yourself!"  Abigail
 [reply] 
Re: Pure regex Hamiltonian Circuit solution
by blokhead (Monsignor) on Oct 13, 2003 at 05:50 UTC

Here's an updated version using lookaheads to greatly shorten things. Matching time shouldn't change much.
my @E = ([1,3],[1,5],[2,3],[2,4],[2,5],[4,5]);
my $V = 5;
my $verbose = 1;
my @all_edges = map { my $x = $_; map { [$x, $_] } $x+1 .. $V } 1 .. $
+V1;
my $string = (join(' ', 1 .. $V) . "\n") x $V
. join(' ', map { join "", @$_ } @all_edges ) . "\n"
. join(' ', map { join "", @$_ } @E );
my $regex = "^\n"
. ".* \\b (\\d+) \\b .* \\n\n" x $V
. join("", map { my ($x, $y) = @$_;
"(?= .* \\b (?: \\$x\\$y  \\$y\\$x ) \\b
+ ) \n"
} @all_edges) . ".*\\n\n"
. join("", map { my ($x, $y) = ($_, $_+1);
"(?= .* \\b (?: \\$x\\$y  \\$y\\$x ) \\b
+ )\n"
} 1 .. ($V1))
. "(?= .* \\b (?: \\$V\\1  \\1\\$V ) \\b )\n";
print "'$string' =~ /\n$regex\n/x\n" if $verbose;
if (my @c = $string =~ /$regex/x) {
local $" = " > ";
print "Hamiltonian circuit: [ @c > $1 ]\n";
} else {
print "No Hamiltonian circuit\n";
}
__END__
$string = q[
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
12 13 14 15 23 24 25 34 35 45
13 15 23 24 25 45
];
$regex = q[
^ .* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
(?= .* \b (?: \1\2  \2\1 ) \b )
(?= .* \b (?: \1\3  \3\1 ) \b )
(?= .* \b (?: \1\4  \4\1 ) \b )
(?= .* \b (?: \1\5  \5\1 ) \b )
(?= .* \b (?: \2\3  \3\2 ) \b )
(?= .* \b (?: \2\4  \4\2 ) \b )
(?= .* \b (?: \2\5  \5\2 ) \b )
(?= .* \b (?: \3\4  \4\3 ) \b )
(?= .* \b (?: \3\5  \5\3 ) \b )
(?= .* \b (?: \4\5  \5\4 ) \b )
.*\n
(?= .* \b (?: \1\2  \2\1 ) \b )
(?= .* \b (?: \2\3  \3\2 ) \b )
(?= .* \b (?: \3\4  \4\3 ) \b )
(?= .* \b (?: \4\5  \5\4 ) \b )
(?= .* \b (?: \5\1  \1\5 ) \b )
];
There's no more need for repeating the listing of @all_edges so many times. With lookaheads, it only needs to be there once. Same with the listing of @E. This reduces $string to O(V^2).
Because backtracking doesn't happen within lookaheads, I couldn't use lookaheads to select the captured vertices. So the listing of vertices 1 to $V still has to be there $V times.
 [reply] [d/l] 

