Not Quite Longest Path Problem

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
I assure you, this is a perl question but I have written it in story form.

You awake with a throbbing headache and blurred vision. As you come to, you realize you are not in your room and the sudden adrenaline rush gets you fully alert. The room is such a stark white that it is hard to tell where the edges and corners are. As you get up to go out one of the many open doors in the room, one of the walls turns into a display monitor. On the monitor is a bunch of green blobs glowing with different intensities and lines connecting blobs to each other. You almost forget what is going on as you watch a blip zipping around from blob to blob. As you realize there might be some significance to the one red blob, a voice greets you.

The first thing the voice tells you is that in 24 hours, you will live or die depending on your intellect or your luck. You try to keep a level head as you realize that this must be a recording. The voice is not responding to your questions and your pleas for an explanation are futile. You must pay attention because you do not know if this information will be repeated.

....is how much you have to collect in order to save your life. At the center of each room is a token worth different amounts. In 24 hours from now, if you have not collected enough to pay the ransom, your life is forfeit. If you are able to actually survive *stifled laughter*, any amount above the ransom will be deposited into your bank account. I hope you have been paying attention to the map on the screen. The blip is tracing the path to your salvation. The screen blanks out and returns once again to the stark white wall. Oh, one last thing the voice says, the rules about which doors stay open and close change as you go.

Silence. No, not silence - you can hear the pounding of your heart as you try to think. The map looked like an undirected graph. The red blob must represent the room I am in. The blip. The blip is important - where was it going - grrrrrr I should have paid more attention. Rules, what rules. Ok. Calm down and think. Assuming I wasn't lied to, this can't be the travelling salesman problem nor the hamiltonian path problem because I should be able to collect enough for the ransom and still have money left over. It seems like it is the Longest path problem which is NP complete - damn. Well, maybe not. Obviously a short path could still win provided the tokens collected were sufficient. Also, there is a time limit which implies not all possible paths can be searched - damn again - the voice did say luck could be a factor. Ok, so I need to find a path that allows me to collect - what was the ransom - I can't remember - was I listening - it doesn't matter, just collect as much as I can as fast as I can. Ok, so what is my strategy. I vaguely remember something about not being able to return to a room without dropping the token from the current room and - grrr, I can't remember. I guess it doesn't matter, apparently the rules change anyway. I am wasting time.

As you look up to leave, you notice displays have appeared above each door. Each display reads "Token: $amount Open Doors: $num". You head down the hallways with the door with the largest token amount and when enter, it is just an empty room. So I can't trust the displays. You notice the token in the center of the room and when you pick it up to examine it, the door you came through immediately turns into a wall and doors open up with display. You put the token back and the room returns to its original state. You continue this process (choosing the door with the highest token amount) until you enter the 10th room, and then everything changes.

The number of doors that should have opened didn't. You realize the rules are changing but you can't seem to figure out how. You continue your current strategy until you enter a room where no doors open. Do I have enough? Should I back track? How much time has gone by? Frustrated you backtrack and realize that every 10 tokens you collect, the rules change some how.

Ok, I hope you enjoyed my story introduction to the problem. Here is an explanation of the real problem.

I am working on is a strategy to play a word chain game. Each 4 letter word (approximately 4,000) represents each blob in the story's map. The blobs with connecting paths represent other 4 letter words that have an edit distance of 1. Points are awarded using letter frequency (rare letters are worth more points). These are the token values/blob intensity from the story.

Unlike the story version, figuring out how the rules of the game change is possible. Each level consists of 10 words and the next level replaces the current "rule" with a new rule. On level 1, the rule is "no restrictions". On level 2, the rule is "which ever letter you used to form the first word in level 2 can't be used in any word for the remainder of the level. On level 3, the rule is "You can't use the same position to form new words in back to back words". In other words, if you changed "mark" to "lark", you couldn't change it to "bark" but you could change it to "lurk" because it uses a different position. You can then go back to the first position if you want. I have only played a few times so I haven't figured out level 4 but you get the idea.

Now, I have already created a data structure where every word has a list of neighbors and their point value. This is why in the story version, you know how much each of your neighbors are worth and how many doors they have. Unfortunately, the number of doors is before applying rules and taking into consideration previously visited rooms. Keeping everything up to date is possible but time/resource consuming.

Ok, but how does the time limit come into play? In two ways. First, each time you make a choice of your next word there is a time limit to choose the next word. Fortunately, before you make your first move you are presented with your starting word so you can take as much time as you want to plan your path. I doubt I would let the computer think about it for 24 hours but if you precalculate your path then the time limit to choose the next word should be a moot point.

I hope I have explained everything adequately, but if I have not please feel free to reply or /msg me. If you are interested in playing the game yourself and are on FaceBook it is called "Word Chain Plus". The question I am asking is - given this situation, what strategy would you use to maximize your score. My current strategy (unimplemented):

Perform a DFS using "most neighbors" for X minutes
Perform a DFS using "most points" for X minutes
See which provides the highest score and use it

Note: I will be calculating "most neighbors" at each step since neighbors will become unavailable based on the current rule as well as which words have already been used (can't use the same word twice).

Update: The ever changing rules get in the way long before "longest path" becomes an issue. See FaceBook's Word Chain Plus

Cheers - L~R

Comment on Not Quite Longest Path Problem

Replies are listed 'Best First'.
Re: Not Quite Longest Path Problem by Bloodnok (Vicar) on Oct 23, 2009 at 17:11 UTC
Sounds like an application of a connected directed weighted graph - one, or more, of the various graph theory related modules may be your friend ... along with a judicious coating of combinatorics/graph theory (allowing you to discern the best module(s) for your use). A user level that continues to overstate my experience :-))	[reply]
Re^2: Not Quite Longest Path Problem by Limbic~Region (Chancellor) on Oct 23, 2009 at 18:50 UTC
Bloodnok, Actually, I started with the premise that it was a weighted undirected graph and tried to find a heuristic algorithm for the longest path problem. Unfortunately, this doesn't work because the graph itself changes every time you make a decision. In other words, what is connected to what (starting at level 2) changes each step you take. Perhaps this works to my advantage (no longer NP complete) but I haven't figured it out - which is why I asked for suggestions. Cheers - L~R	[reply]
Re: Not Quite Longest Path Problem by JavaFan (Canon) on Oct 23, 2009 at 16:14 UTC
Each 4 letter word (approximately 4,000) That sounds like the Enable word list. `% grep -c '^....$' enable.lst 3919` [download]	[reply] [d/l]
Re^2: Not Quite Longest Path Problem by Limbic~Region (Chancellor) on Oct 23, 2009 at 18:46 UTC
JavaFan, Actually, I took the Scrabble Tournament Word List and pulled out all 4 letter words. It was a little over (not under) 4000. Some of the games on FaceBook indicate what list they use (this one doesn't) but I have found it is the safest list to use. Cheers - L~R	[reply]
Re: Not Quite Longest Path Problem by jethro (Monsignor) on Oct 23, 2009 at 22:07 UTC
On level 2, the rule is "which ever letter you used to form the first word in level 2 can't be used in any word for the remainder of the level Clarification please. Do you really mean that only the letter used in the first word can't be used thereafter? Or do you mean "which ever letter you used in any previous step can't be used from then on for the remainder of the level" ? In the first case the graph wouldn't change after the first step, which would really trivially divide the problem into less than 26 unchanging subproblems. Only in the second case there is constant change in the graph One refinement for level 2 in the second case would be to pick random pairs of expensive words (without any letter common to the first word of level 2) and look for an expensive path of 4 words between them. For every row of 6 words found this way check if there is a path from the first word of level 2 to either end of this row (since you can turn around the row without invalidating it). Example: First word in level 2 is "bean". Now randomly find two words that have neither 'b','e','a' and 'n' in them and no common letter between them. For example "womb" and "pixy". Now look for a string of 4 words between "womb" and "pixy" using no letter from "bean" inbetween. Since 4 of the 5 letters needed to change from womb to pixy are already known ('p','i','x' and 'y'), an exhaustive search of routes between the two words should be very fast. If you find such a route (which should be very expensive since you could freely select 8 out of the 9 letters), you now would look for a route from 'bean' to either 'womb' or 'pixy' with 3 words inbetween, also rather trivial as the letters to use are all fixed.	[reply]
Re^2: Not Quite Longest Path Problem by Limbic~Region (Chancellor) on Oct 23, 2009 at 23:34 UTC
jethro, Rather than fail with words, I will try not to fail with an example: `Level 1 9: moon 10: noon Level 2 11: loon (Can not use 'l' for the rest of level 2)` [download] Actually, I think it is the letter to make the last word from level 1 that becomes unavailable for level 2 but it really doesn't change the problem. Perhaps I haven't worked with graphs and graph theory enough to understand how continuing to think about this as a graph helps. According to Wikipedia, this is an NP complete problem. Since I indicated I am fine with a suboptimal solution I want to stick to something simple and fast. Do you think this approach would produce solutions consistently better than my drop dead simple approach outline in the root node with a run-time under an hour? Cheers - L~R	[reply] [d/l]
Re^3: Not Quite Longest Path Problem by jethro (Monsignor) on Oct 24, 2009 at 00:57 UTC
Well, this doesn't really answer my questions, since the two cases start to differ from 12 on. In the first case: 12: loop (Can not use 'l' for the rest of level 2) In the second case: 12: loop (Can not use 'l' and 'p' for the rest of level 2) Anyway: I assume you meant the first case. Then level 2 is just as easy (or difficult) as level 1, the only difference is that after step 11 (or 10) one character is blacklisted Whether the algorithm I proposed would give better solutions probably depends on the density of words to non-words. In our case that would be 4000/ 26^4 = 1/114. I.e. only one in 114 four-letter combinations is a real word. To find a path from the start word to one of the two words of the 6-word-chain one of the 48 combinations of intermediate words would have to be made out of real words, this happens once every approximately 114^3 /48 = 31000 times. The ratio is even worse because you are looking for expensive words that are probably not as well connected as the average word. To be somewhat sure that you get at least one solution the program should at least be able to test 20 times as many word chains in the alloted run-time, lets say 31000 times 20 = 600.000 test per hour or 170 tests a second. So you have to find 170 valid chains of 6 words a second, rather a tall order. I think in this scenario the density is maybe too low for my algorithm to have much success.	[reply]
Re^4: Not Quite Longest Path Problem by Limbic~Region (Chancellor) on Oct 24, 2009 at 01:11 UTC
Re^5: Not Quite Longest Path Problem by jethro (Monsignor) on Oct 24, 2009 at 12:01 UTC
Re^5: Not Quite Longest Path Problem by JavaFan (Canon) on Oct 24, 2009 at 14:07 UTC
Some notes below your chosen depth have not been shown here
Re: Not Quite Longest Path Problem by SuicideJunkie (Vicar) on Oct 23, 2009 at 20:23 UTC
Level 2 sounds like it could be represented by 26 copies of the original graph, with a subset of the nodes (those containing the letter you chose to use first) removed from each copy. You essentially must choose which of the 26 variant subgraphs you want to play in, and then throw the others away. Level 3 could be four copies of the graph, but for each link, you alter it to point to the corresponding node in Copy #N if that link changes letter #N. Then clean up by deleting any links that go from Copy N to Copy M where N==M. The graph will have to become directed, since you can go from moon(1) to mood(4) or mood(1) to moon(4) but not mood(4) to moon(1)	[reply]
Re^2: Not Quite Longest Path Problem by Limbic~Region (Chancellor) on Oct 23, 2009 at 23:23 UTC
SuicideJunkie, Yes, I am sure one could continue to think in terms of graphs but do you think it will be of any use. In other words, I think it becomes more complex not less. At least, more complex than I am willing to invest to wow and amaze friends. Cheers - L~R	[reply]
Re: Not Quite Longest Path Problem by JavaFan (Canon) on Oct 24, 2009 at 14:56 UTC
I've made a program that just randomly picks a word, taking all rules into account. It does this 1000 times, remembering the list that gave it the best score. On many words, it finds list longer than 400 words. Probably far from optimal, but it should provide you an opportunity to find out whether there are additional rules `;-)`. The program takes a start word as argument, and optionally, the number of times it should try. (Without arguments, it picks a random word). #!/usr/bin/perl use 5.010; use strict; use warnings; use autodie; my $dict = "word_list"; my $size = 4; open my $fh, "<", $dict; my @words = <$fh>; close $fh; chomp @words; my $start = @ARGV ? shift : $words[rand @words]; my $runs = @ARGV ? shift : 1000; my %words = map {$_ => 1} @words; my %graph; my %freq; my %score; foreach my $word (@words) { foreach my $index (0 .. $size - 1) { $graph{$word}[$index] = {}; foreach my $c ('a' .. 'z') { my $w1 = $word; substr $w1, $index, 1, $c; next if $w1 eq $word; next unless $words{$w1}; $graph{$word}[$index]{$c} = $w1; } } $freq{$_}++ for split //, $word; } my $tot_letters = $size * @words; foreach (keys %freq) { $freq{$_} = 1 - $freq{$_} / $tot_letters; } foreach my $word (@words) { $score{$word} += $freq{$_} for split //, $word; } my $best_score = 0; my @best; for (1 .. $runs) { my @list; my %seen; my $forbidden = " "; my $changed = $size; push @list, $start; $seen{$start} = 1; while (1) { my $turn = @list + 1; my $current = $list[-1]; # # Get all possibilities. # my @nbs; for (my $i = 0; $i < @{$graph{$current}}; $i++) { next if $i == $changed && $turn >= 21; my $l = $graph{$current}[$i]; next unless $l; my @letters = grep {$_ ne $forbidden} keys %$l; push @nbs, $$l{$_} for @letters; } # # Filter out things we've seen. # @nbs = grep {!$seen{$_}} @nbs; last unless @nbs; # End of the line. my $pick = $nbs[rand @nbs]; push @list, $pick; $seen{$pick}++; foreach my $i (0 .. $size - 1) { next if substr($list[-2], $i, 1) eq substr($list[-1], $i, +1); $changed = $i; last; } if ($turn == 11) { $forbidden = substr $list[-1], $changed, 1; } elsif ($turn == 21) { $forbidden = " "; } } my $score = 0; $score += $score{$_} for @list; if ($score > $best_score) { @best = @list; $best_score = $score; } } for (my $i = 0; $i < @best; $i++) { printf "%3d %s\n", $i + 1, $best[$i]; } say "$start: list of ", scalar @best, " words. Score: $best_score"; __END__ [download]	[reply] [d/l]
Re^2: Not Quite Longest Path Problem by Limbic~Region (Chancellor) on Oct 24, 2009 at 16:03 UTC
JavaFan, I will clean my code up and post what I have later. It doesn't do a DFS as I have found I almost never need to backtrack. I have only made it to level 13 but that was enough to become the all-time high score for all of FaceBook. Cheers - L~R	[reply]