PerlMonks  

Re^3: RegExp substitution

by HereandThere (Initiate)
on Apr 10, 2014 at 20:55 UTC ( [id://1081866]=note )


in reply to Re^2: RegExp substitution
in thread RegExp substitution

Keystone,

I am relatively new here at perlmonks, but perhaps I can help a little bit.

You asked why the regexp matched only once, instead of multiple times. This is by design: a substitution replaces only the first match unless something like the "global" switch (/g) is added *at the end of the regex in play*.

If you invoke the global switch, then all matches will be replaced with the substitution string.
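A minimal sketch of the difference, using a made-up string and the hypothetical variable names $text, $once and $all:

```perl
use strict;
use warnings;

my $text = "Three, Four, Three, Two";

# Without /g: only the first match is replaced.
(my $once = $text) =~ s/Three/One/;

# With /g: every match is replaced.
(my $all = $text) =~ s/Three/One/g;

print "$once\n";   # One, Four, Three, Two
print "$all\n";    # One, Four, One, Two
```

The copy-then-substitute idiom `(my $copy = $text) =~ s/.../.../` is used here only so the original string stays intact for comparison.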

So, if you had some code:

$_="FourThreeTwoOne, Three, Four, One, Two";
$1="Three";
$second="&&&&";
s/$2/$second/g;
print;

it would print this result:

Four&&&&TwoOne, &&&&, Four, One, Two

Hope this helps.

-HaT

Replies are listed 'Best First'.
Re^4: RegExp substitution
by Laurent_R (Canon) on Apr 10, 2014 at 21:47 UTC
    Hello HereandThere, welcome to the Monastery. Hmm, did you try this code that you posted?
    $_="FourThreeTwoOne, Three, Four, One, Two";
    $1="Three";
    $second="&&&&";
    s/$2/$second/g;
    print;
    I am pretty sure it cannot work. For a start, $1 is a read-only value that can be set only by a regex.
    $1="Three";
    I do not think that the perl compiler accepts that, but even if it did, it should not be done. In my view, $1 is a special variable that should be kept for just one single purpose: the first capture in a regex.
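As far as I can tell, the assignment does get past the compiler but dies at run time. A small sketch to check this, wrapping the assignment in eval so the error can be inspected ($ok and $error are hypothetical names):

```perl
use strict;
use warnings;

# Try the assignment inside eval so the run-time error can be caught.
my $ok    = eval { $1 = "Three"; 1 };
my $error = $ok ? '' : $@;

print "assignment failed: $error" unless $ok;
```

On the perls I know of, the message mentions a "read-only value", which matches the point that $1 can be set only by a regex.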
    s/$2/$second/g;
    This makes even less sense: $2 has not been set anywhere, so it is undefined, and there is just no way this is gonna work. In addition, I would suggest that if you set $_ to something, you first localize it within some lexical block:
    {
        local $_="FourThreeTwoOne, Three, Four, One, Two";
        # ...
    }
    Furthermore, you should probably have these pragmas at the top of your script:
    use strict; use warnings;
    and they would force you to rewrite the third line of your code as:
    my $second="&&&&";
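Putting these suggestions together, one possible rewrite of the snippet might look like the sketch below. It assumes the intent was to replace every "Three"; the lexical $first (standing in for the misused $1) and $result are hypothetical names.

```perl
use strict;
use warnings;

my $result;
{
    # Localize $_ so the change does not leak out of this block.
    local $_ = "FourThreeTwoOne, Three, Four, One, Two";

    my $first  = "Three";   # an ordinary lexical instead of $1
    my $second = "&&&&";

    # \Q...\E makes the contents of $first match as literal text.
    s/\Q$first\E/$second/g;

    $result = $_;
}

print "$result\n";   # Four&&&&TwoOne, &&&&, Four, One, Two
```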
    Finally, I don't even understand what your code is supposed to demonstrate to the OP. Well, to tell the truth, if you were not so new on this forum, I would probably down vote your post (although I almost never down vote posts for other reasons than spamming, insults, completely off-topic posts or other clear netiquette violations). I'll refrain from doing it here in consideration of the fact that you are new here.
Re^4: RegExp substitution
by Keystone (Initiate) on Apr 10, 2014 at 21:31 UTC
    Hi there, Thanks for your reply, I had tried something using the /g global earlier but I discounted it for changing every variable every time, here is what I had:
    #!/usr/bin/perl
    #subs4.plx
    use warnings;
    use strict;

    #try using /g global to remember where I'm up to in a match
    my $pattern;
    $_ = "Three, Four, One, Two";
    print ("\t\tCounting Program\n\n", $_, "\n\n");
    my $correct;
    print "Is this sequence correct?(yes/no)\n";
    $correct = <STDIN>;
    chomp ($correct);

    while ($correct ne "yes"){
        print "Is the first number correct?\n";
        my $first = <STDIN>;
        chomp ($first);
        if ($first ne "yes"){
            print"What should it be?\n";
            $first = <STDIN>;
            chomp ($first);
            /([A-Z][a-z]+)/g;
            s/$1/$first/g;
        }

        print "Is the second number correct?\n";
        my $second = <STDIN>;
        chomp ($second);
        if ($second ne "yes"){
            print"What should it be?\n";
            $second = <STDIN>;
            chomp ($second);
            /([A-Z][a-z]+)/g;
            s/$2/$second/g;
        }

        print "Is the third number correct?\n";
        my $third = <STDIN>;
        chomp ($third);
        if ($third ne "yes"){
            print"What should it be?\n";
            $third = <STDIN>;
            chomp ($third);
            /([A-Z][a-z]+)/g;
            s/$3/$third/g;
        }

        print "Is the fourth number correct?\n";
        my $fourth = <STDIN>;
        chomp ($fourth);
        if ($fourth ne "yes"){
            print"What should it be?\n";
            $fourth = <STDIN>;
            chomp ($fourth);
            /([A-Z][a-z]+)/g;
            s/$4/$fourth/g;
        }

        #Final print
        print ($_, "\n\n");
        print "Is this sequence correct now?(yes/no)\n";
        $correct = <STDIN>;
        chomp ($correct);
    }
    After running through each of the <STDIN> prompts, the final print is "Four, Four, Four, Four". In retrospect this is possibly the closest I got to my actual solution! This is what prompted me to ask the question, in which I was looking for a way to ignore the first match of a RegEx the second time it's run. Again, thank you for your help, I feel I am close to a solution.
    Just had a thought pre-posting: it is possible (but perhaps not elegant) to run the RegEx /[A-Za-z]+/, save the result to a variable and substitute the match with whitespace, then call the variable later. But thinking about it, this is just a cheat/hack and not really using the substitute function of a RegEx.
    Regards, Keystone
      ...
      s/$1/$first/g;
      ...
      s/$2/$second/g;
      ...
      s/$3/$third/g;
      ...
      s/$4/$fourth/g;
      ...

      The critical thing to realize about this code is that the capture variables $2, $3 and $4 have never been set to any meaningful value. I.e., they have the undefined value undef. When the undefined value is interpolated into a string or a regex, it interpolates as '' (the empty string), or, in the case of a regex, // (the empty regex).
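A small sketch of the string case, assuming warnings are enabled. The variable $unset and the warning-collecting handler are illustrative only:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $unset;                      # deliberately left undefined
my $string = "[$unset]";        # undef interpolates as the empty string

print "string is '$string'\n";                 # string is '[]'
print "warnings seen: ", scalar(@warnings), "\n";
```

The interpolation itself succeeds; the only sign that something is wrong is the "uninitialized value" warning, which is easy to overlook in an interactive program.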

      ...
      /([A-Z][a-z]+)/g;
      s/$2/$second/g;
      ...

      This pair of statements and corresponding succeeding statement pairs is very interesting. I strongly recommend you insert the statement
          print qq{=== '$_' \n};  # FOR DEBUG
      or its equivalent after each and every of the  s/// substitution statements to monitor what's going on with the progressive 'correction' of the initial string.

      Here's a narrative. As you can see from the newly-added debug print statement, the first
          /([A-Z][a-z]+)/g;
          s/$1/$first/g;
      statement pair actually does something expected and useful: it replaces the first number with 'One'. The output from the debug print statement is
          === 'One, Four, One, Two'

      The second
          /([A-Z][a-z]+)/g;
          s/$2/$second/g;
      statement pair replaces all numbers with 'Two'! The output from the debug print statement is
          === 'Two, Two, Two, Two'

      The reason for this odd behavior is that when  $2 with an undefined value interpolates into  s/$2/$second/g; it produces the  // empty regex match pattern. This pattern is special: it uses the last successful regex match pattern for matching. The last successful match pattern was in the  /([A-Z][a-z]+)/g; statement immediately before the  s/// substitution statement. Therefore,
          s/$2/$second/g;
      interpolates (ignoring, as you do, the warning message) as if it were
          s//$second/g;
      which matches as if it were
          s/([A-Z][a-z]+)/$second/g;
      which replaces each and every match (because of the  /g modifier) against the  ([A-Z][a-z]+) pattern (i.e., something that looks like a number) with, in this case, 'Two'. Whew!
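The empty-pattern behavior can be reproduced in isolation. This is a minimal sketch that starts from the string as it stands after the first correction; $str is a hypothetical name:

```perl
use strict;
use warnings;

my $str = "One, Four, One, Two";

# A successful match: its pattern becomes the "last successful pattern".
$str =~ /([A-Z][a-z]+)/;

# The empty pattern reuses that pattern, so with /g every number is replaced.
$str =~ s//Two/g;

print "$str\n";   # Two, Two, Two, Two
```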

      And similarly for each subsequent  //; s///; statement pair.

      That ought to give you something to think about while you're reviewing the regex documentation.

      (BTW: The  /g modifier in the  /([A-Z][a-z]+)/g; statement is at best useless and at worst confusing and corrupting. You cannot use the  /g modifier in this way to "keep track" of match positions in successive matches. (The  /c modifier in conjunction with the  /g modifier does something like this in certain cases, but I don't really see how it could be adapted to serve here.) You will have to think of some other way to query the user about successive numbers in the original string so that they may be 'corrected' one by one.)
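One possible direction, offered only as a sketch and not necessarily what was intended above: split the string into a list, correct the elements by position, and join them back together. The %corrections hash is a hypothetical stand-in for the answers that would be read from STDIN.

```perl
use strict;
use warnings;

my $sequence = "Three, Four, One, Two";

# Break the string into its individual numbers.
my @numbers = split /,\s*/, $sequence;

# Hypothetical stand-in for user input: position (0-based) => corrected value.
my %corrections = (0 => "One", 1 => "Two", 2 => "Three", 3 => "Four");

# Correct each position independently; no regex state has to be tracked.
for my $i (sort { $a <=> $b } keys %corrections) {
    $numbers[$i] = $corrections{$i};
}

my $fixed = join ", ", @numbers;
print "$fixed\n";   # One, Two, Three, Four
```

Because each element is addressed by index, correcting one number cannot accidentally change another number that happens to have the same text.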

        Thank you, I realise it's perhaps not the norm but the way you've broken down the code here is the way I'm currently thinking while programming in Perl and was very easy to understand. Furthermore, having seen your other post recommending I take a step back and look at basic RegEx I'm now going through those tutorials.
Re^4: RegExp substitution
by AnomalousMonk (Archbishop) on Apr 10, 2014 at 22:23 UTC

    Laurent_R has already posted a reply that covers important points I had wanted to make.

    Let me say this in addition. You're trying to be helpful and that's very good, but it doesn't help to offer wrong advice. The reason I almost always post code as cut/pastes from command-line executions is that, in addition to providing a complete context for execution, the code is actually executed and so is known to be at least syntactically correct — and maybe even has a chance of being semantically correct.
