Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Update: answered my own question, see below.

Can you explain a little more how it works? I can see how it selects items based on value, and then enforces that there is enough weight for the combined items selected. But what I'm not seeing is how it only selects the highest-weight solution.

Here is what I thought would happen with the following instance:

my $max_weight = 10; my %items = ( # value, weight: positive integers 'a' => { v => 1, w => 2 }, 'b' => { v => 11, w => 9 }, );
The regex engine would try the following:
first half second half $1 $2 attempt 1: v vvvvvvvvvvv --> ww + wwwwwwwww = 11, not enough w's! backtrack attempt 2: v <empty> --> ww = 2, enough w's for this, so accept +ed
In other words, I would have thought that ($1, $2, $3, $4, ...), each of which has two possible assignments, would get tried out in a canonical ordering like 11111, 11110, 11101, ...

But that doesn't seem to be the case. I modified the code as follows, to print the value of $1, $2, etc, upon each backtracking branch:

# used to be: # my $re = sprintf "%s0\n(?=\n%s)\n", $left, $right; use Data::Dumper; my $re = sprintf qq[%s0\n(?{ print "trying " . Dumper \\%%+})\n(?=\n%s +)\n], $left, $right; print $re;
Then with the example above, I always get the following output, independent of whether item 'a' or 'b' comes first in the matching:
STRING: vvvvvvvvvvvv0wwwwwwwwww REGEX: (?<b>(?:vvvvvvvvvvv)?) (?<a>(?:v)?) 0 (?{ print "trying " . Dumper \%+}) (?= (?(?{ $1 })wwwwwwwww|) (?(?{ $2 })ww|) ) OUTPUT: trying $VAR1 = { 'a' => 'v', 'b' => 'vvvvvvvvvvv' }; trying $VAR1 = { 'a' => '', 'b' => 'vvvvvvvvvvv' };
STRING: vvvvvvvvvvvv0wwwwwwwwww REGEX: (?<a>(?:v)?) (?<b>(?:vvvvvvvvvvv)?) 0 (?{ print "trying " . Dumper \%+}) (?= (?(?{ $1 })ww|) (?(?{ $2 })wwwwwwwww|) ) OUTPUT: trying $VAR1 = { 'a' => 'v', 'b' => 'vvvvvvvvvvv' }; trying $VAR1 = { 'a' => '', 'b' => 'vvvvvvvvvvv' };
In both cases, its first backtracking change is to change the choice of 'a', no matter whether the 'a' alternative or 'b' alternative is last!

Can you help demystify me? Somehow you are getting the regex engine to go through possible matches in the v-string not in some canonical ordering of alternatives (an easy local way to backtrack), but in order of their longest combined length (a global property of the set of alternatives)? This is the only interpretation under which the regex would actually be guaranteed to output the correct answer!

Update: I think I have demystified myself. It has to do with the string being anchored at the central '0':

  • Regex engine backtracks through a ton of choices, trying to match many combinations of v's starting at pos=0, followed immediately by a '0' character. Suppose there are N v's, then if no combination of alternatives sums up to N, then this can't match. So...
  • Regex backtracks tries to match the regex from pos=1, thus searching for a combination of alternatives that sums up to N-1. If that doesn't work,
  • Backtrack to match from pos=2, ...
Thus, thanks to this anchoring at the '0' character, the regex engine does indeed backtrack through combinations of v's in descending order of their sum. As I mentioned above, this must happen if you have any chance of outputting the optimal solution.

That is clever!

My confusion above would have been appropriate for a regex that matched like this:

^ (?<a>(?:v)?) (?<b>(?:vvvvvvvvvvv)?) v* 0
but not the one used in the OP, where the choice of v's must be snug up against the '0' character:
(?<a>(?:v)?) (?<b>(?:vvvvvvvvvvv)?) 0

blokhead


In reply to Re: knapsack problem solved by regex by blokhead
in thread knapsack problem solved by regex by rubasov

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (7)
As of 2024-04-19 09:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found