Just another Perl shrine PerlMonks

### RegExp substitution

by Keystone (Initiate)
 on Apr 10, 2014 at 16:52 UTC

Keystone has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. I'm new to Perl and working through Simon Cozens' free Beginning Perl book. I tried to write myself a little program to test whether I understood RegExp, and it's not giving me the expected answers. Could anyone offer any guidance as to why the final print of $_ gives "FourThreeTwoOne, Three, Four, One, Two", please? As I said, I'm only a novice, so please be gentle! :)

```perl
#!/usr/bin/perl
#subs.plx
use warnings;
use strict;

#An incorrectly ordered list to have the user organise
$_ = "Three, Four, One, Two";

print ("\t\tCounting Program\n\n", $_, "\n\n");
my $correct;

print "Is this sequence correct?(yes/no)\n";
$correct = <STDIN>;
chomp ($correct);

while ($correct ne "yes"){
    print "Is the first number correct?\n";
    my $first = <STDIN>;
    chomp ($first);

    if ($first ne "yes"){
        print"What should it be?\n";
        $first = <STDIN>;
        chomp ($first);
    }

    print "Is the second number correct?\n";
    my $second = <STDIN>;
    chomp ($second);

    if ($second ne "yes"){
        print"What should it be?\n";
        $second = <STDIN>;
        chomp ($second);
    }

    print "Is the third number correct?\n";
    my $third = <STDIN>;
    chomp ($third);

    if ($third ne "yes"){
        print"What should it be?\n";
        $third = <STDIN>;
        chomp ($third);
    }

    print "Is the fourth number correct?\n";
    my $fourth = <STDIN>;
    chomp ($fourth);

    if ($fourth ne "yes"){
        print"What should it be?\n";
        $fourth = <STDIN>;
        chomp ($fourth);
    }

    #My RegExp
    /([A-Z][a-z][.][\b])/;

    #The substitutions based on my RegExp
    s/$1/$first/;
    s/$2/$second/;
    s/$3/$third/;
    s/$4/$fourth/;

    #Final print reads:FourThreeTwoOne, Three, Four, One, Two
    print ($_, "\n\n");

    print "Is this sequence correct now?(yes/no)\n";
    $correct = <STDIN>;
    chomp ($correct);
}
```

Any guidance would be appreciated. Cheers, Keystone.

Replies are listed 'Best First'.
Re: RegExp substitution
by AnomalousMonk (Bishop) on Apr 10, 2014 at 17:29 UTC

But what does your RegExp actually match? (Also a 'fixed' version based on what I think you think you want.)

```
c:\@Work\Perl\monks>perl -wMstrict -le
"$_ = 'Three, Four, One, Two';
;;
/([A-Z][a-z][.][\b])/;
print qq{'$1'  '$2'  '$3'  '$4'};
;;
/([A-Z] [a-z]+ (?: , | \b))/x;
print qq{'$1'  '$2'  '$3'  '$4'};
"
Use of uninitialized value $1 in concatenation (.) or string at -e line 1.
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
Use of uninitialized value $3 in concatenation (.) or string at -e line 1.
Use of uninitialized value $4 in concatenation (.) or string at -e line 1.
''  ''  ''  ''
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
Use of uninitialized value $3 in concatenation (.) or string at -e line 1.
Use of uninitialized value $4 in concatenation (.) or string at -e line 1.
'Three,'  ''  ''  ''
```

Why does the first regex match nothing at all? (That should be fairly easy to answer: take a careful look at it.) Why does the second regex match something, but only once when you want it to match several times? Why does  'Three,' have a comma at the end? Do you really want to capture this character?

Update 1: Another thing to remember is that every successful regex match resets all the capture variables $1, $2, $3, ... $n and re-assigns only those that correspond to a capture group in that latest successful match. An s/// substitution has to perform a match too, and you're doing four s/// in a row, so whatever an earlier match captured is gone as soon as one of those substitutions succeeds.

```
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'foo bar baz';
$s =~ m{ (foo) \s* (bar) \s* (baz) }xms;
print qq{A:  '$1'  '$2'  '$3'};
;;
$s =~ m{ (xyzzy) }xms;
print qq{B:  '$1'  '$2'  '$3'};
;;
$s =~ m{ (b \w*) }xms;
print qq{C:  '$1'  '$2'  '$3'};
"
A:  'foo'  'bar'  'baz'
B:  'foo'  'bar'  'baz'
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
Use of uninitialized value $3 in concatenation (.) or string at -e line 1.
C:  'bar'  ''  ''
```
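One common way around this, sketched here under the assumption that you just want the captured values to survive later matches, is to copy them into ordinary variables straight away. A match in list context returns its captures, so no $1 is needed for the copy. The names $first, $second and $third are made up for the illustration.

```perl
use strict;
use warnings;

my $s = 'foo bar baz';

# A match in list context returns its captures, so they can be copied
# into ordinary variables that later matches cannot overwrite.
my ($first, $second, $third) = $s =~ m{ (foo) \s* (bar) \s* (baz) }xms;

# This later successful match resets $1, $2 and $3 ...
$s =~ m{ (b \w*) }xms;

# ... but the saved copies are untouched.
print "saved:   '$first' '$second' '$third'\n";   # saved:   'foo' 'bar' 'baz'
print "current: '$1'\n";                          # current: 'bar'
```

The same idea applies to s///: if the replacement depends on something captured earlier, keep it in a lexical variable and interpolate that variable instead of $1.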

Update 2: Here's an approach (one of many) to the problem, but without the annoying  <STDIN> stuff:

```
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'Three, Four, One, Two, xFive9';
print qq{'$s'};
;;
my @numbers = $s =~ m{ \b [[:upper:]] [[:lower:]]+ \b }xmsg;
printf qq{'$_' } for @numbers;
print '';
;;
my %correct;
@correct{ @numbers } = qw(one two three four);
;;
my ($rx_search) =
map  qr{ \b (?: $_) \b }xms,
join '|',
map  quotemeta,
keys %correct
;
print $rx_search;
;;
$s =~ s{ ($rx_search) }{$correct{$1}}xmsg;
print qq{'$s'};
"
'Three, Four, One, Two, xFive9'
'Three' 'Four' 'One' 'Two'
(?^msx: \b (?: Four|Three|Two|One) \b )
'one, two, three, four, xFive9'
```

Thank you for the reply. I'm sorry to say I think I am still a bit too inexperienced in Perl to follow the code in your reply fully, but I have had a go and tried to answer your questions as fully as I can:

What is my RegExp trying to match? I am trying to match and substitute the words in the string $_ by asking the user to input the correct string of number values. Originally I tried to match and substitute in each 'if' decision after the user input; however, doing it this way I could not see a way to match any word other than the first one without using a string literal.

i.e. with $_ = "three, four" I could not see a way to match 'four' without using the literal, whereas, as I understood it, the power of a RegExp comes from being able to find something in a string without a literal constant.

In essence I suppose what I am trying to do is (pseudo-code):

```
1st substitute/([A-Z][a-z][\W][\b])/<userinput>/;
then
2nd substitute/(NOT THIS ONE[A-Z][a-z][\W][\b])(THIS ONE[A-Z][a-z][\W][\b])/<userinput>/;
then
3rd substitute/(NOT THIS ONE[A-Z][a-z][\W][\b])(AND NOT THIS ONE[A-Z][a-z][\W][\b])(BUT THIS ONE[A-Z][a-z][\W][\b])/<userinput>/;
```

Does that make any kind of sense?

Your first RegEx ([A-Z][a-z][.][\b]) matches nothing because no words in the string are 2 characters long. Adding a plus to the lower-case set, [a-z]+, would match the first word, but as I understand it . is capable of matching nothing as well as anything, therefore I believe it would match nothing, and the next character in the match would be a comma when the match is actually looking for a break.

Why does the second regex match something, but only once when you want it to match several times? I'm unsure about this part, so I can't answer this question easily: (?: , | \b)). ? allows the preceding character to be optional (but there is no preceding character?), and I can't see the use of a colon in this context. I understand, however, that the comma is a literal constant to look for, OR a break. '/x' I have not yet come across. Has it only matched once because it is not part of a loop to tell it to match as many times as I want? I don't want the comma, so perhaps I should look only for [A-Z][a-z], but how then do I ignore these the second time I want to match? If I must match only once (as I originally had tried to do), then why does Perl not find anything for $2, $3 and $4?

As for Update 1 & 2, I'm afraid they're far beyond my capabilities at this moment in time. I realise they're more than likely a cleaner way to write the code; I was simply trying to write a program for myself to show I understood RegExp (but clearly that is not the case!). I'm afraid the code in the two updates is far too advanced for me at this moment ;/

Keystone,

I am relatively new here at perlmonks, but perhaps I can help a little bit.

You asked why the regexp matched only once instead of multiple times. That is the default behaviour: a match or substitution stops after the first match unless the "global" switch, /g, is added *at the end of the regex in play*.

If you invoke the global switch, then all matches will be replaced with the substitution string.

So, if you had some code:

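```perl
$_ = "FourThreeTwoOne, Three, Four, One, Two";
# capture variables such as $1 are read-only, so an ordinary variable
# holds the text to search for
my $first  = "Three";
my $second = "&&&&";
s/$first/$second/g;
print;

# it would print this result:
# Four&&&&TwoOne, &&&&, Four, One, Two
```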

Hope this helps.

-HaT
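A small sketch of the difference the /g switch makes, run against the string from the original question. The variable names $once, $all and @words are made up for the illustration, and the pattern is only one plausible way to describe a capitalised word.

```perl
use strict;
use warnings;

my $s = 'Three, Four, One, Two';

# Without /g only the first match is replaced.
(my $once = $s) =~ s/[A-Z][a-z]+/X/;

# With /g every match is replaced.
(my $all = $s) =~ s/[A-Z][a-z]+/X/g;

# In list context, m//g returns every match at once.
my @words = $s =~ /[A-Z][a-z]+/g;

print "$once\n";    # X, Four, One, Two
print "$all\n";     # X, X, X, X
print "@words\n";   # Three Four One Two
```

Copying into $once and $all first leaves $s itself unchanged, which makes it easy to compare the two results side by side.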

... (?: , | \b)). ? allows the preceding character to be optional (but there is no preceding character?) and I can't see the use of a colon in this context.

The  (?:pattern) construct defines a non-capturing group. See Extended Patterns in perlre. This and other statements in your reply lead me to suggest that you take a big step backwards and read up on basic regex docs. Please see perlre. In particular, see perlretut for a very good tutorial. (I'm not familiar with the material in the Cozens book.) See also perlrequick for a quick reference. See also the material in the "Pattern Matching, Regular Expressions, and Parsing" area of the Tutorials section of this site.
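As a rough illustration of the difference between a capturing group and a non-capturing one, here is a minimal sketch using a shortened version of the string from the question. The string and the printed labels are chosen for the example only.

```perl
use strict;
use warnings;

my $s = 'Three, Four';

# (...) groups AND captures: the separator lands in $2.
if ($s =~ /([A-Z][a-z]+)(,|\b)/) {
    print "capturing:     group 1 = '$1', group 2 = '$2'\n";
    # capturing:     group 1 = 'Three', group 2 = ','
}

# (?:...) only groups: the alternation still works, but nothing is
# stored for it, so $2 stays undefined.
if ($s =~ /([A-Z][a-z]+)(?:,|\b)/) {
    print "non-capturing: group 1 = '$1', group 2 is ",
          (defined $2 ? "'$2'" : 'undefined'), "\n";
    # non-capturing: group 1 = 'Three', group 2 is undefined
}
```

So the ?: here is not the "optional" quantifier; it is part of the opening (?: of the group and only switches off capturing.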

1st substitute/([A-Z][a-z][\W][\b])/<userinput>/;

The critical thing to remember about  [...] character classes is that most regex metacharacters are not meta-special inside them. Thus,  [.] (which you have used elsewhere) matches a single  '.' (period) character and  [\b] matches a single backspace control-character. So the pattern above might be described as:

•  [A-Z] A single upper-case character; followed by
•  [a-z] A single lower-case character; followed by
•  [\W] A single character that is anything not matching a  \w 'word' character (\W is a class unto itself, so no enclosing square brackets are needed); followed by
•  [\b] A single backspace control character.
Is any of that what you really wanted?
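The points in the list above can be checked directly. This is only a sketch with throwaway test strings; the last line shows one plausible pattern for "a capitalised word", which may or may not be exactly what the original program needs.

```perl
use strict;
use warnings;

# Inside [...] a dot is a literal period, not "any character".
print '[.] matches "a": ', ('a' =~ /[.]/ ? 'yes' : 'no'), "\n";   # no
print '[.] matches ".": ', ('.' =~ /[.]/ ? 'yes' : 'no'), "\n";   # yes

# Inside [...] \b is a backspace character (ASCII 8) ...
print '[\b] matches a backspace: ',
      ("\x08" =~ /[\b]/ ? 'yes' : 'no'), "\n";                    # yes

# ... whereas outside a class \b is a zero-width word boundary.
print '\b matches after "Two": ',
      ('Two' =~ /Two\b/ ? 'yes' : 'no'), "\n";                    # yes

# A capital, one or more lower-case letters, then a word boundary
# matches a whole capitalised word.
print "word: $1\n" if 'Three, Four' =~ /([A-Z][a-z]+)\b/;         # word: Three
```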

Re: RegExp substitution
by AnomalousMonk (Bishop) on Apr 11, 2014 at 05:04 UTC

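```perl
use warnings;
use strict;

# NOTE: parts of this listing were missing (the replacement half of the
# s{}{}, the sub header and the line that reads the answer). The lines
# marked "reconstructed" are inferred from the output shown below; the
# sub name ask_to_replace is a guess based on the script's file name.

my $string = 'Three, Four, One, Two, xFive9, six, Seven';

my $number = qr{ \b [[:upper:]] [[:lower:]]+ \b }xms;

print qq{string is now '$string' \n};

my $ordinal = 0;

$string =~ s{ ($number) }
            { ask_to_replace(++$ordinal, $-[1], $1) }xmsge;  # reconstructed

print qq{new string is '$string' \n};
print qq{done! \n};

sub ask_to_replace {                                         # reconstructed
    my ($ordinal,
        $offset,
        $string,
        ) = @_;

    my $yes = qr{ (?i) y (?: e (?: s)? )?  }xmso;
    my $ok  = qr{ (?i) o (?:          k)?  }xmso;
    my $accept = qr{ \A (?: $yes | $ok) \Z }xmso;

    print qq{sub-string $ordinal at offset $offset is '$string' \n};

    print qq{is this correct? };
    chomp(my $answer = <STDIN>);                             # reconstructed
    return $string if $answer =~ $accept;

    print qq{no: enter new string: };
    chomp(my $replace = <STDIN>);
    return $replace;

}
```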

Output:

```
c:\@Work\Perl\monks\Keystone>perl ask_to_replace_1.pl
string is now 'Three, Four, One, Two, xFive9, six, Seven'
sub-string 1 at offset 0 is 'Three'
is this correct? n
no: enter new string: Uno
sub-string 2 at offset 7 is 'Four'
is this correct? No
no: enter new string: Dos
sub-string 3 at offset 13 is 'One'
is this correct? y
sub-string 4 at offset 18 is 'Two'
is this correct? x
no: enter new string: Tres
sub-string 5 at offset 36 is 'Seven'
is this correct? n
no: enter new string: se7en
new string is 'Uno, Dos, One, Tres, xFive9, six, se7en'
done!
```

Re: RegExp substitution
by MidLifeXis (Monsignor) on Apr 10, 2014 at 18:01 UTC

You could always try out Regexp::Debugger.

Update: Corrected module name

--MidLifeXis

I love the idea MidLifeXis :-) ... but, sadly, the link doesn't quite work :-(

Have you managed to get [doc://...] and [mod://...] intermixed ?

A user level that continues to overstate my experience :-))

No, s/RegExp/Regexp/. .oO( Note to self - check links after posting )

--MidLifeXis

Node Type: perlquestion [id://1081838]