PerlMonks  


This looks to me like an X-Y problem: you are trying to do X (which I presume is to capture the values defined in a string), have decided that the way to do this is Y (pull the string apart and rebuild it as you go until you hit some sentinel value), and are having trouble doing Y.

To answer the question you asked: if you want part of a regex to apply only under certain circumstances, put it at the point in the regular expression where those circumstances hold. Your negative lookahead is anchored to the start of the string by '^', so it is tested only there. As for why the whole regular expression fails to match, try adding use re 'debug'; to your script, which prints a somewhat cryptic trace of what the regex engine attempts.
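To illustrate the point about placement, here is a minimal sketch. The sample string and both patterns are made up for the illustration and are not taken from the original post. The first pattern tests the lookahead only at the start of the string, so "batch" is still captured further along. The second tests it immediately before each candidate name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# use re 'debug';   # uncomment to have the regex engine print its trace to STDERR

# Hypothetical sample input, not the original poster's data.
my $text = 'alpha = 1, batch = 2, gamma = 3';

# Lookahead anchored by ^: tested once, at position 0. The lazy .*? then
# walks forward freely, so "batch" can still be captured later in the string.
my ($leaky) = $text =~ / ^ (?!batch) .*? (\w+) \s* = \s* 2 /x;

# Lookahead placed directly before the name: tested at every candidate name.
my @names = $text =~ / \b (?!batch\b) (\w+) \s* = /xg;

print "anchored lookahead captured: $leaky\n";
print "per-name lookahead captured: @names\n";
```

The first line printed reports batch, and the second reports alpha gamma.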

But if my X-Y assumption is correct, I would implement it differently, with a much simpler and more comprehensible regular expression.

#!/usr/bin/env perl

use 5.010;	# For branch reset
use strict;
use warnings;

my $SYSPBUFF = <<'EOD';
		run_type = dev,
		max_monitor_time = 0.25
		verbosity_level = 2

		batch =

			(
				source = sample_document_collection_1
				files = Confucius.docx
				dest = Enterprise:Department
			)
EOD

while ( $SYSPBUFF =~ m/
    (?|		# Branch reset
	\s* ( batch ) \s* = \s* ( .* ) |	# Match batch = ...
	\s* ( [^\s=]+ ) \s* = \s* ( [^\s,]+ )	# Match A = B
    )
    /smxg
) {
    print "Captured $1 = $2\n";
}

produces

Captured run_type = dev
Captured max_monitor_time = 0.25
Captured verbosity_level = 2
Captured batch = (
				source = sample_document_collection_1
				files = Confucius.docx
				dest = Enterprise:Department
			)

Is this the sort of thing you are after?

The branch reset ((?| ... | ... | ... )) restarts capture group numbering in each branch of the alternation, so all branches share the same group numbers. In this specific regular expression I used it so that the name of the value would always appear in $1 and the value itself in $2, no matter which branch of the alternation matched.
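A small side-by-side comparison may make the numbering difference concrete. The input string and the two toy patterns are assumptions chosen for the illustration.

```perl
#!/usr/bin/env perl
use 5.010;    # Branch reset, say and // need Perl 5.10 or later
use strict;
use warnings;

my $input = 'b=2';    # hypothetical input that only the second branch matches

# Ordinary alternation: groups are numbered left to right across branches,
# so the second branch fills $3 and $4 and leaves $1 and $2 undefined.
my @plain = $input =~ / (?: (a) = (\d) | (b) = (\d) ) /x;

# Branch reset: numbering restarts in each branch, so whichever branch
# matches fills $1 and $2.
my @reset = $input =~ / (?| (a) = (\d) | (b) = (\d) ) /x;

say 'plain: ', join ', ', map { $_ // 'undef' } @plain;
say 'reset: ', join ', ', map { $_ // 'undef' } @reset;
```

With ordinary alternation the match returns four groups, the first two undefined. With branch reset it returns just two, b and 2.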

Note that there is no explicit termination logic for the loop: it ends when m//g finds no further match, which happens right after the batch branch has consumed the rest of the input. I understood your problem statement to imply that batch = is the last thing in the input, so that once you see it you want the entire remainder of the input. If this is wrong, the regex will need to match a parenthesis-delimited string instead. If the parentheses can be nested, that is more complicated; the Regexp::Common module can be helpful here.
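If you would rather avoid a non-core module, one alternative to Regexp::Common is a recursive pattern, available since Perl 5.10. The following is only a sketch under assumptions: the input string and the group name paren are invented for the example, and it takes no account of parentheses inside quoted values.

```perl
#!/usr/bin/env perl
use 5.010;    # Named groups and recursion need Perl 5.10 or later
use strict;
use warnings;

# Hypothetical input: the batch block is nested and is not the last item.
my $input = 'batch = ( a = 1 ( nested = 2 ) ) trailing = 3';

my ($block) = $input =~ m/
    batch \s* = \s*
    (?<paren>                # a balanced parenthesised group:
        \(                   #   opening parenthesis
        (?: [^()]++          #   a run of non-parenthesis characters
          | (?&paren)        #   or a nested balanced group, by recursion
        )*
        \)                   #   matching closing parenthesis
    )
/x;

say "batch = $block";
```

Here $block ends at the parenthesis that closes the batch block, so trailing = 3 is left for a later match.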


In reply to Re: Unable to constrain the effect of a negative lookahead by Anonymous Monk
in thread Unable to constrain the effect of a negative lookahead by fireblood
