PerlMonks  

Re: regex is not working as I intended

by GrandFather (Saint)
on Jan 18, 2018 at 23:11 UTC ( [id://1207489]=note )


in reply to regex is not working as I intended

I'm not sure why, but the last alternate ((.*)) seems to win in all cases when the other alternates use a lookbehind. However, things are much easier to understand if you look around less:

    use strict;
    use warnings;

    my $regex = qr/(' ([^']*) ' | " ([^"]*) " | (.*))/x;

    do_test (qq~No quote~);
    do_test (qq~'Single quote'~);
    do_test (qq~"Double quote"~);

    sub do_test {
        my ($line) = @_;

        print "\n";

        if ($line =~ $regex) {
            print "\$1 is $1.\n" if defined $1;
            print "\$2 is $2.\n" if defined $2;
            print "\$3 is $3.\n" if defined $3;
            print "\$4 is $4.\n" if defined $4;
        } else {
            print "No match.\n";
        }
    }

Prints:

    $1 is No quote.
    $4 is No quote.

    $1 is 'Single quote'.
    $2 is Single quote.

    $1 is "Double quote".
    $3 is Double quote.

Note too the various other tidy-ups in the code, especially avoiding calling subs with & (which doesn't do what you think) and avoiding excessive use of \.
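The post does not say which surprise of the & call syntax it has in mind. One well-known one is that `&name;` with no parentheses hands the caller's current @_ to the called sub. A minimal sketch of that behaviour, where the sub names are hypothetical and not taken from the original code:

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub count_args { return scalar @_ }

# "&count_args;" with no parentheses shares the caller's @_ with count_args.
sub call_with_ampersand { return &count_args; }

# A plain call with parentheses passes exactly what is listed: nothing.
sub call_with_parens { return count_args(); }

print call_with_ampersand(1, 2, 3), "\n";   # prints 3
print call_with_parens(1, 2, 3), "\n";      # prints 0
```

Calling with & also bypasses any prototype declared on the sub, which is another reason the plain `name(...)` form is usually preferred.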

Premature optimization is the root of all job security

Replies are listed 'Best First'.
Re^2: regex is not working as I intended
by ikegami (Patriarch) on Jan 19, 2018 at 05:20 UTC

    I'm not sure why,

    At position 0,

    1. (?<=\')(.*?)(?>\') can't possibly match (there can't be a ' before the first character),
    2. (?<=\")(.*?)(?>\") can't possibly match (there can't be a " before the first character), but
    3. (.*) always matches.

    Since it matched at position 0, it doesn't try to match at position 1 (where one of the first two alternates has a chance of matching).
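The leftmost-match behaviour described above can be checked with a short sketch. The full original pattern is not shown in this thread, so `$original` below is an assumed reconstruction that simply joins the three quoted alternates with |, and `where_matched` is a hypothetical helper:

```perl
use strict;
use warnings;

# Assumed reconstruction: the three alternates quoted above, joined with |.
my $original    = qr/(?<=\')(.*?)(?>\')|(?<=\")(.*?)(?>\")|(.*)/;

# The same pattern without the catch-all (.*) alternate.
my $no_fallback = qr/(?<=\')(.*?)(?>\')|(?<=\")(.*?)(?>\")/;

# Hypothetical helper: return the match start, the number of the capture
# group that participated, and the text that group captured.
sub where_matched {
    my ($regex, $line) = @_;
    return unless $line =~ $regex;
    my ($group) = grep { defined $-[$_] } 1 .. $#-;
    return ($-[0], $group, substr($line, $-[$group], $+[$group] - $-[$group]));
}

my $line = q{'Single quote'};

# With (.*) present the match starts at position 0 via group 3.
printf "start %d, group %d, text <%s>\n", where_matched($original, $line);
# prints: start 0, group 3, text <'Single quote'>

# Without (.*) the engine moves on to position 1, where the lookbehind succeeds.
printf "start %d, group %d, text <%s>\n", where_matched($no_fallback, $line);
# prints: start 1, group 1, text <Single quote>
```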

      I forgot to mention that this regex is used repeatedly on the same string to parse out parameter and value pairs, such as the following:

      radius = 3, density = .014, URL = "https://www.geometry.org", max_no_of_attempts = 4

      That is why the lookbehind and lookahead assertions are used: even though they cannot possibly match at the beginning or end of the overall parameter string, they can match at intermediate positions as the regex walks over the parameter string, parsing out individual parameter and value pairs.

        Further to GrandFather's post:   Part of the useful enhanced context would be an expected hash structure for each multiple parameter/value pair string (hopefully, more than just one!) that you provide. E.g., for the multi-line string example here, do you expect

        %values = (
            'radius'             => 3,
            'density'            => .014,
            'URL'                => '"https://www.geometry.org"',
            'max_no_of_attempts' => 4,
        );
        or do you want something like
        %values = ( ..., 'URL' => 'https://www.geometry.org', ..., );
        (double-quotes stripped away)? How should wonky quoted strings be handled? By die-ing? By warn-ing and continuing to process? In another way?

        And yes, I think a module probably already exists to do this sort of thing.


        Give a man a fish:  <%-{-{-{-<

        Maybe you should create a new node with a bit more of the context for this problem. It sounds to me like there could be a call for a bigger hammer than just a single regex.


        That doesn't change anything. The fixed version provided by GrandFather can be used to parse the snippet you provided.
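One way the quoted-or-bare alternation could be adapted to walk the sample parameter string is sketched below. This is not code from the thread: `parse_params` is a hypothetical helper, the catch-all (.*) is swapped for a class that stops at commas and quotes so one value cannot swallow the rest of the line, quotes are stripped from values, and a string that cannot be parsed to the end causes a die. Each of those is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: parse "name = value, ..." pairs.
sub parse_params {
    my ($line) = @_;
    my %values;

    # \G continues where the previous match ended; /c keeps pos() on failure.
    while ($line =~ /\G \s* (\w+) \s* = \s*
                     (?: ' ([^']*) ' | " ([^"]*) " | ([^,'"]*) )
                     \s* (?: , | \z )/gcx) {
        my ($name, $single, $double, $bare) = ($1, $2, $3, $4);
        my $value = defined $single ? $single
                  : defined $double ? $double
                  :                   $bare;
        $value =~ s/\s+\z//;    # a bare value may carry trailing spaces
        $values{$name} = $value;
    }

    # Assumption: anything left unparsed (e.g. an unbalanced quote) is fatal.
    my $end = pos($line) // 0;
    die "Cannot parse parameter string at offset $end\n"
        if $end != length $line;

    return %values;
}

my %values = parse_params(
    q{radius = 3, density = .014, URL = "https://www.geometry.org", max_no_of_attempts = 4}
);
print "$_ => $values{$_}\n" for sort keys %values;
```

Anchoring each step with \G means nothing between pairs is skipped silently, so a malformed value is reported instead of being matched from some later position.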
