
how can I combine these expressions?

by knight (Friar)
on Jul 27, 2000 at 02:36 UTC (node #24589)

knight has asked for the wisdom of the Perl Monks concerning the following question:

I'm manually interpolating values from a hash ref of internal "environment variables" into a user-supplied string. The string's magic interpolation character is '%', not '$', so "%FOO" in the string should be replaced with the value of $env->{'FOO'}, and "%{BAR}" should be replaced with $env->{'BAR'}, etc. Plus, the interpolation is recursive: if the interpolated value itself contains a '%', it gets re-expanded. Lastly, '%%' eventually gets replaced with a single '%', but that happens in a later step, so at this stage I need to pass %-pairs through unchanged.

Hope I explained that clearly...

Anyway, the following code snippet does all of this the way I want:
    # % expansion. %% gets converted to % later, so expand any
    # %keyword construction that doesn't have a % in front of it
    # (modulo multiple %% pairs in between).
    while (($str =~ s/(^|[^\%](?:\%\%)*)\%([_a-zA-Z]\w*)/"$1".$env->{$2}/ge) ||
           ($str =~ s/(^|[^\%](?:\%\%)*)\%\{([_a-zA-Z]\w*)\}/"$1".$env->{$2}/ge)) {}
But I'm doing this with two very similar expressions, the first for normal unadorned variable names ("%FOO") and the second for variable names enclosed in braces ("%{FOO}"). Can anyone suggest how these two might be combined, hopefully in a way that speeds up the processing? The obvious technique of sticking '*' after the braces in the regexes loses because I don't want to match unbalanced braces like "%FOO}".
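For reference, here is the two-pass loop from the question wrapped in a small runnable script. The sub name `expand_two_pass` and the sample `$env` contents are hypothetical, chosen only to exercise the bare form, the braced form, one level of recursion, and a `%%` pair that should pass through untouched.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two-pass loop from the question.
sub expand_two_pass {
    my ($str, $env) = @_;
    # Keep looping while either substitution changes something, so values
    # that themselves contain %NAME or %{NAME} get re-expanded.
    # Because of ||, the braced pass only runs when the bare pass found nothing.
    while (($str =~ s/(^|[^\%](?:\%\%)*)\%([_a-zA-Z]\w*)/"$1".$env->{$2}/ge) ||
           ($str =~ s/(^|[^\%](?:\%\%)*)\%\{([_a-zA-Z]\w*)\}/"$1".$env->{$2}/ge)) {}
    return $str;
}

# Hypothetical sample data: CFG refers to HOME, so it needs a second round.
my $env = { HOME => '/home/knight', CFG => '%{HOME}/etc', USER => 'knight' };

print expand_two_pass('cfg=%CFG user=%{USER} pct=50%%', $env), "\n";
```

With this sample data the script should print `cfg=/home/knight/etc user=knight pct=50%%`: `%CFG` first expands to `%{HOME}/etc`, which a later braced pass expands again, while the trailing `%%` is left for the later step.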

I've been leafing through "Mastering Regular Expressions" in search of an answer, especially the section on lookahead, but that's right about where my brain starts filling up...

Re: how can I combine these expressions?
by plaid (Chaplain) on Jul 27, 2000 at 03:27 UTC
    How about this:
        $str =~ s/(^|[^\%](?:\%\%)*)\%
                  (\{)?            # Match and capture 1 or 0 braces
                  ([_a-zA-Z]\w*)
                  (?(2)\})         # If there's anything in $2, match ending brace
                 /"$1".$env->{$3}/gex;   # $3 now instead of $2 here
    This takes advantage of perl's conditional matching operator. From perlre:
        (?(condition)yes-pattern|no-pattern)
        (?(condition)yes-pattern)
            Conditional expression. (condition) should be either an integer in
            parentheses (which is valid if the corresponding pair of parentheses
            matched), or lookahead/lookbehind/evaluate zero-width assertion.
            Say,
                m{ ( \( )? [^()]+ (?(1) \) ) }x
            matches a chunk of non-parentheses, possibly included in
            parentheses themselves.
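As a sketch, plaid's conditional pattern can stand in for both substitutions inside the recursive loop. The wrapper `expand_combined` and the sample data are hypothetical; the loop is written with the `1 while ...` idiom, which repeats the substitution until it reports no changes, and the in-pattern comments are reworded slightly.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the recursive loop with plaid's single
# conditional substitution in place of the two separate ones.
sub expand_combined {
    my ($str, $env) = @_;
    1 while $str =~ s/(^|[^\%](?:\%\%)*)\%
                      (\{)?            # optional opening brace, group 2
                      ([_a-zA-Z]\w*)   # variable name, group 3
                      (?(2)\})         # closing brace required only if group 2 matched
                     /"$1".$env->{$3}/gex;
    return $str;
}

# Hypothetical sample data, same as for the two-pass version.
my $env = { HOME => '/home/knight', CFG => '%{HOME}/etc', USER => 'knight' };

print expand_combined('cfg=%CFG user=%{USER} pct=50%%', $env), "\n";
```

With this data it should give the same result as the two-pass loop. An unmatched opening brace such as `%{USER` is left alone, while `%USER}` expands `%USER` and keeps the `}` as literal text.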
      I am humbled in the presence of greatness. :) That was slick.

      Stellar. That's exactly what I needed. Many thanks.

      (Note on some of the other replies: Simple alternation with '|' doesn't work because it includes the braces in the selected part of the regex. Given %{FOO}, we should interpolate $env->{'FOO'}, not $env->{'{FOO}'}.)
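A minimal sketch of the difference this note describes, using a hypothetical `$name` pattern: with plain alternation the capture group holds the braces as well, while with the conditional the name sits in a group of its own.

```perl
use strict;
use warnings;

my $name = qr/[_a-zA-Z]\w*/;    # hypothetical variable-name pattern

# Plain alternation: group 1 captures the braces along with the name.
my ($alt) = '%{FOO}' =~ /\%($name|\{$name\})/;

# Conditional: group 1 is the optional brace, group 2 is just the name.
my (undef, $cond) = '%{FOO}' =~ /\%(\{)?($name)(?(1)\})/;

print "alternation: $alt\n";    # alternation: {FOO}
print "conditional: $cond\n";   # conditional: FOO
```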
Re: how can I combine these expressions?
by chromatic (Archbishop) on Jul 27, 2000 at 02:44 UTC
    Why not just stick a ? after each brace? That means 'match zero or one times'. I don't see anything different that happens if the variable's braced, but my brain hurts today so definitely test it first. :)

    Update: Okay, the quick-correcting Ovid and lhoward point out the unbalanced problem. The only other thing I can think of is an alternation:
        (?:\{([_a-zA-Z]\w*)\}|([_a-zA-Z]\w*))
    That part obviously goes after you've found the % sign. It will be more expensive, though. (The * operator hurts.)
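chromatic's alternation does work if the replacement checks which of its two groups matched. In this sketch (hypothetical wrapper and sample data), the braced name lands in `$2`, the bare name in `$3`, and the replacement picks whichever one is defined. Unbalanced input is rejected because the braced branch requires both braces and the bare branch allows neither.

```perl
use strict;
use warnings;

# Hypothetical wrapper using chromatic's two-group alternation.
sub expand_alternation {
    my ($str, $env) = @_;
    # $2 holds a braced name, $3 a bare one; only one is defined per match.
    1 while $str =~ s/(^|[^\%](?:\%\%)*)\%(?:\{([_a-zA-Z]\w*)\}|([_a-zA-Z]\w*))/
                     "$1" . $env->{ defined($2) ? $2 : $3 }/ge;
    return $str;
}

# Hypothetical sample data.
my $env = { HOME => '/home/knight', CFG => '%{HOME}/etc', USER => 'knight' };

print expand_alternation('cfg=%CFG user=%{USER} pct=50%%', $env), "\n";
```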

      chromatic: sticking a ? after each brace would still allow mismatched braces to be passed through. Since this is user-supplied data, I can see a potential for a lot of typos.


(Ovid) Re: how can I combine these expressions?
by Ovid (Cardinal) on Jul 27, 2000 at 03:20 UTC
    Well, I had trouble figuring out exactly what you needed, but a rough first (untested) try was simply using alternation. I also replaced the character class with a variable to make it a bit more legible.
    my $valid = qr/[_a-zA-Z]\w*/;
    $str =~ s/(^|[^\%](?:\%\%)*)\%($valid|\{$valid\})/"$1".$env->{$2}/ge;
    If you could post some working code with sample data, I could work on something better. I tried playing with your code and couldn't really get a good feel for what you were doing.
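Ovid's alternation can keep a single capture number if the alternatives are wrapped in a branch reset group, `(?|...)`, a feature added in Perl 5.10 (so it was not available when this thread was written). Inside `(?|...)` each alternative numbers its groups from the same starting point, so the bare name ends up in `$2` whichever form matched. This is a swapped-in technique, not Ovid's code; the wrapper name and data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper: Ovid's alternation inside a branch reset group
# (requires Perl 5.10 or later).
sub expand_branch_reset {
    my ($str, $env) = @_;
    my $valid = qr/[_a-zA-Z]\w*/;
    # Both ($valid) groups are numbered 2, so $2 never includes the braces.
    1 while $str =~ s/(^|[^\%](?:\%\%)*)\%(?|($valid)|\{($valid)\})/"$1".$env->{$2}/ge;
    return $str;
}

# Hypothetical sample data.
my $env = { HOME => '/home/knight', CFG => '%{HOME}/etc', USER => 'knight' };

print expand_branch_reset('cfg=%CFG user=%{USER} pct=50%%', $env), "\n";
```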


Re: how can I combine these expressions?
by lhoward (Vicar) on Jul 27, 2000 at 03:03 UTC
    I agree with chromatic except I see a small problem with his solution. With the regular expression:
    $str =~ s/(^|[^\%](?:\%\%)*)\%\{?([_a-zA-Z]\w*)\}?/"$1".$env->{$2}/ge
    it would match on something like %{FOO or %FOO}.
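To make the problem concrete, this hypothetical sketch applies the optional-brace pattern once to three inputs. Because each brace is independently optional, the balanced form and both unbalanced forms are all accepted, and the stray brace is consumed along with the name.

```perl
use strict;
use warnings;

# Hypothetical wrapper: one pass of the optional-brace pattern.
sub optional_braces {
    my ($str, $env) = @_;
    $str =~ s/(^|[^\%](?:\%\%)*)\%\{?([_a-zA-Z]\w*)\}?/"$1".$env->{$2}/ge;
    return $str;
}

my $env = { FOO => 'bar' };    # hypothetical sample data

# Each line prints "bar" as the result, even for the unbalanced inputs.
for my $in ('%{FOO}', '%{FOO', '%FOO}') {
    print "$in -> ", optional_braces($in, $env), "\n";
}
```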

    How to avoid that in a single RE? I'm not quite sure... Time for me to do some digging...
