
Re^5: RegExp substitution

by AnomalousMonk (Archbishop)
on Apr 11, 2014 at 00:01 UTC ( [id://1081892]=note )


in reply to Re^4: RegExp substitution
in thread RegExp substitution

...
s/$1/$first/g;
...
s/$2/$second/g;
...
s/$3/$third/g;
...
s/$4/$fourth/g;
...

The critical thing to realize about this code is that the capture variables $2, $3 and $4 are never set to any meaningful value: the pattern has only one capture group, so only $1 is ever set. I.e., the others hold the undefined value undef. When the undefined value is interpolated into a string or a regex, it interpolates as '' (the empty string), or, in the case of a regex, // (the empty regex).
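A quick way to convince yourself of this is the minimal sketch below. The sample string is made up for illustration (it is not the OP's data), and the warning handler is only there so the message can be printed alongside the result.

```perl
use strict;
use warnings;

my $str = 'One, Four, One, Two';   # hypothetical sample string

# One capture group only, so a successful match sets $1 and nothing else.
$str =~ /([A-Z][a-z]+)/ or die "no match";

my $one_defined = defined $1 ? 1 : 0;   # 1
my $two_defined = defined $2 ? 1 : 0;   # 0

# Interpolating undef yields the empty string (and a warning under
# 'use warnings'); collect the warning instead of letting it go to STDERR.
my @warnings;
my $interp;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $interp = "[$2]";
}

print "\$1 defined: $one_defined, \$2 defined: $two_defined\n";
print "interpolated: '$interp'\n";
print "warning: $_" for @warnings;
```

The "uninitialized value" warning it prints is the same kind of message the original code emits for $2, $3 and $4.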

...
/([A-Z][a-z]+)/g;
s/$2/$second/g;
...

This pair of statements and the corresponding statement pairs that follow it are very interesting. I strongly recommend you insert the statement
    print qq{=== '$_' \n};  # FOR DEBUG
or its equivalent after each of the s/// substitution statements to monitor what's going on with the progressive 'correction' of the initial string.

Here's a narrative. As you can see from the newly-added debug print statement, the first
    /([A-Z][a-z]+)/g;
    s/$1/$first/g;
statement pair actually does something expected and useful: it replaces the first number with 'One'. The output from the debug print statement is
    === 'One, Four, One, Two'

The second
    /([A-Z][a-z]+)/g;
    s/$2/$second/g;
statement pair replaces all numbers with 'Two'! The output from the debug print statement is
    === 'Two, Two, Two, Two'

The reason for this odd behavior is that when $2, holding the undefined value, is interpolated into s/$2/$second/g; it produces the // empty regex match pattern. This pattern is special: it reuses the pattern of the last successful regex match. The last successful match was the /([A-Z][a-z]+)/g; statement immediately before the s/// substitution statement. Therefore,
    s/$2/$second/g;
interpolates (ignoring, as you do, the warning message) as if it were
    s//$second/g;
which matches as if it were
    s/([A-Z][a-z]+)/$second/g;
which replaces each and every match (because of the  /g modifier) against the  ([A-Z][a-z]+) pattern (i.e., something that looks like a number) with, in this case, 'Two'. Whew!

And similarly for each subsequent  //; s///; statement pair.
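The empty-pattern behavior can be seen in isolation with a sketch like the following. The strings are hypothetical stand-ins, and a literal s//.../ is used in place of the undefined $2, since that is what the interpolation reduces to.

```perl
use strict;
use warnings;

my $second = 'Two';

# Explicit pattern, for comparison.
my $explicit = 'One, Four, One, Two';
$explicit =~ s/([A-Z][a-z]+)/$second/g;

# Empty pattern following a successful match.
my $empty = 'One, Four, One, Two';
$empty =~ /([A-Z][a-z]+)/;   # this becomes the last successful match
$empty =~ s//$second/g;      # empty pattern: reuses the pattern above

print "explicit: '$explicit'\n";
print "empty:    '$empty'\n";
```

Both lines should print the same 'Two, Two, Two, Two' string seen in the debug output above.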

That ought to give you something to think about while you're reviewing the regex documentation.

(BTW: The  /g modifier in the  /([A-Z][a-z]+)/g; statement is at best useless and at worst confusing and corrupting. You cannot use the  /g modifier in this way to "keep track" of match positions in successive matches. (The  /c modifier in conjunction with the  /g modifier does something like this in certain cases, but I don't really see how it could be adapted to serve here.) You will have to think of some other way to query the user about successive numbers in the original string so that they may be 'corrected' one by one.)
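For what it's worth, here is one possible direction, offered only as a sketch and not as the one right answer: let a single s///ge walk the string and call a function for each match. The ask() helper, the sample text and the canned @answers list are all hypothetical; a real script would presumably prompt on STDIN instead.

```perl
use strict;
use warnings;

my $text    = 'Tree, Four, Fiv, Two';     # hypothetical input to be corrected
my @answers = ('Three', '', 'Five', '');  # stand-in for user replies; '' keeps the word

# ask() is a hypothetical helper: it returns the user's correction for a
# word, or the word itself if the reply is empty.
sub ask {
    my ($word) = @_;
    my $reply = shift @answers;
    return (defined $reply && length $reply) ? $reply : $word;
}

# /e evaluates the replacement as code once per match, left to right,
# so each word is visited exactly once and no position tracking is needed.
(my $corrected = $text) =~ s/([A-Z][a-z]+)/ask($1)/ge;

print "=== '$corrected'\n";
```

The point of the design is that the regex engine does the bookkeeping: each match is handed to the code in the replacement part in turn, so there is no need to juggle $1 through $4 or match positions by hand.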

Replies are listed 'Best First'.
Re^6: RegExp substitution
by Keystone (Initiate) on Apr 11, 2014 at 07:01 UTC
    Thank you. I realise it's perhaps not the norm, but the way you've broken down the code here matches the way I'm currently thinking while programming in Perl, and it was very easy to understand. Furthermore, having seen your other post recommending I take a step back and look at basic RegEx, I'm now going through those tutorials.
