Keystone has asked for the wisdom of the Perl Monks concerning the following question:
Hi all, I'm new to Perl and working through Simon Cozens' free Beginning Perl book. I tried to write myself a little program to test whether I understood RegExp, and it's not giving me the expected answers. Could anyone offer any guidance as to why the final print of $_ gives "FourThreeTwoOne, Three, Four, One, Two", please?
As I said I'm only a novice, please be gentle!:)
#!/usr/bin/perl
#subs.plx
use warnings;
use strict;
#An incorrectly ordered list to have the user organise
$_ = "Three, Four, One, Two";
print ("\t\tCounting Program\n\n", $_, "\n\n");
my $correct;
print "Is this sequence correct?(yes/no)\n";
$correct = <STDIN>;
chomp ($correct);
while ($correct ne "yes"){
print "Is the first number correct?\n";
my $first = <STDIN>;
chomp ($first);
if ($first ne "yes"){
print"What should it be?\n";
$first = <STDIN>;
chomp ($first);
}
print "Is the second number correct?\n";
my $second = <STDIN>;
chomp ($second);
if ($second ne "yes"){
print"What should it be?\n";
$second = <STDIN>;
chomp ($second);
}
print "Is the third number correct?\n";
my $third = <STDIN>;
chomp ($third);
if ($third ne "yes"){
print"What should it be?\n";
$third = <STDIN>;
chomp ($third);
}
print "Is the fourth number correct?\n";
my $fourth = <STDIN>;
chomp ($fourth);
if ($fourth ne "yes"){
print"What should it be?\n";
$fourth = <STDIN>;
chomp ($fourth);
}
#My RegExp
/([A-Z][a-z][.][\b])/;
#The substitutions based on my RegExp
s/$1/$first/;
s/$2/$second/;
s/$3/$third/;
s/$4/$fourth/;
#Final print reads:FourThreeTwoOne, Three, Four, One, Two
print ($_, "\n\n");
print "Is this sequence correct now?(yes/no)\n";
$correct = <STDIN>;
chomp ($correct);
}
Any guidance would be appreciated,
Cheers,
Keystone.
Re: RegExp substitution
by AnomalousMonk (Archbishop) on Apr 10, 2014 at 17:29 UTC
|
But what does your RegExp actually match? (Also a 'fixed' version based on what I think you think you want.)
c:\@Work\Perl\monks>perl -wMstrict -le
"$_ = 'Three, Four, One, Two';
;;
/([A-Z][a-z][.][\b])/;
print qq{'$1' '$2' '$3' '$4'};
;;
/([A-Z] [a-z]+ (?: , | \b))/x;
print qq{'$1' '$2' '$3' '$4'};
"
Use of uninitialized value $1 in concatenation (.) or string at -e line 1.
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
Use of uninitialized value $3 in concatenation (.) or string at -e line 1.
Use of uninitialized value $4 in concatenation (.) or string at -e line 1.
'' '' '' ''
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
Use of uninitialized value $3 in concatenation (.) or string at -e line 1.
Use of uninitialized value $4 in concatenation (.) or string at -e line 1.
'Three,' '' '' ''
Why does the first regex match nothing at all? (That should be fairly easy to answer: take a careful look at it.) Why does the second regex match something, but only once when you want it to match several times? Why does 'Three,' have a comma at the end? Do you really want to capture this character?
Update 1: Another thing to remember is that each successful regex match (and a s/// substitution must do a match — and you're doing four s/// in a row) that is executed "wipes out" all capture variables $1 $2 $3 $n and only re-assigns those corresponding to an actual capture group in the latest successful match.
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'foo bar baz';
$s =~ m{ (foo) \s* (bar) \s* (baz) }xms;
print qq{A: '$1' '$2' '$3'};
;;
$s =~ m{ (xyzzy) }xms;
print qq{B: '$1' '$2' '$3'};
;;
$s =~ m{ (b \w*) }xms;
print qq{C: '$1' '$2' '$3'};
"
A: 'foo' 'bar' 'baz'
B: 'foo' 'bar' 'baz'
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
Use of uninitialized value $3 in concatenation (.) or string at -e line 1.
C: 'bar' '' ''
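A practical consequence of Update 1 is to copy captures into ordinary variables as soon as the match succeeds. The following is only a minimal sketch of that habit; the variable names ($w1 .. $w4, $last) and the four-word pattern are assumptions made for illustration, not part of the original program.

```perl
use strict;
use warnings;

my $s = 'Three, Four, One, Two';

# Capture all four words in one match and copy them into lexical
# variables immediately, before any other regex runs.
my ($w1, $w2, $w3, $w4) =
    $s =~ m{ (\w+) , \s* (\w+) , \s* (\w+) , \s* (\w+) }xms;

# A later successful match re-assigns $1 (and clears $2, $3, $4),
# but it cannot touch the copies made above.
'unrelated' =~ m{ (un) }xms;
my $last = $1;

print "saved copies: $w1 $w2 $w3 $w4\n";
print "\$1 is now:    $last\n";
```

Because the copies live in their own variables, any number of later s/// operations can use them safely.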
Update 2: Here's an approach (one of many) to the problem, but without the annoying <STDIN> stuff:
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'Three, Four, One, Two, xFive9';
print qq{'$s'};
;;
my @numbers = $s =~ m{ \b [[:upper:]] [[:lower:]]+ \b }xmsg;
printf qq{'$_' } for @numbers;
print '';
;;
my %correct;
@correct{ @numbers } = qw(one two three four);
;;
my ($rx_search) =
map qr{ \b (?: $_) \b }xms,
join '|',
map quotemeta,
keys %correct
;
print $rx_search;
;;
$s =~ s{ ($rx_search) }{$correct{$1}}xmsg;
print qq{'$s'};
"
'Three, Four, One, Two, xFive9'
'Three' 'Four' 'One' 'Two'
(?^msx: \b (?: Four|Three|Two|One) \b )
'one, two, three, four, xFive9'
| [reply] [d/l] [select] |
|
Thank you for a reply. I'm sorry to say I think I am still a bit too inexperienced in the Perl language to follow the code in your reply fully but I have had a go and tried to answer your questions as fully as I can;
What is my RegExp trying to match? I am trying to match and substitute the words in the string $_ by asking the user to input the correct string of number values. Originally I tried to match and substitute in each 'if' decision after the user input, however doing it this way I could not see a way to match to any string other than the first available without using a string literal.
i.e. $_ = "three, four" I could not see a way to match to 'four' without using the literal, whereas, as I understood it the power of a RegExp came from it being able to find something in a string without a literal constant.
In essence I suppose what I am trying to do is:
Pseudo-code:
1st substitute/([A-Z][a-z][\W][\b])/<userinput>/;
then
2nd substitute/(NOT THIS ONE[A-Z][a-z][\W][\b])(THIS ONE[A-Z][a-z][\W][\b])/<userinput>/;
then
3rd substitute/(NOT THIS ONE[A-Z][a-z][\W][\b])(AND NOT THIS ONE[A-Z][a-z][\W][\b])(BUT THIS ONE[A-Z][a-z][\W][\b])/<userinput>/;
Does that make any kind of sense?
Your first RegEx ([A-Z][a-z][.][\b]) matches nothing because no words in the string are 2 characters long. Adding a plus to the lower-case set, [a-z]+, would match the first word, but as I understand it . is capable of matching nothing as well as anything, therefore I believe it would match nothing, and the next character in the match would be a comma when the match is actually looking for a break.
Why does the second regex match something, but only once when you want it to match several times?
I'm unsure about this part so I can't answer this question easily: (?: , | \b)). ? allows the preceding character to be optional (but there is no preceding character?) and I can't see the use of a colon in this context. I understand however that the comma is a literal constant to look for, OR a break. '/x' I have not yet come across. Has it only matched once because it is not part of a loop to tell it to match as many times as I want?
I don't want the comma, so perhaps look only for [A-Z][a-z], but how then do I ignore these the second time I want to match? If I must match only once (as I originally had tried to do), then why does Perl not find anything for $2 $3 and $4?
As for Update 1 & 2, I'm afraid they're far beyond my capabilities at this moment in time. I realise they're more than likely a cleaner way to write the code; I was simply trying to write a program for myself to show I understood RegExp (but clearly that is not the case!). I'm afraid the code in the two updates is far too advanced for me at this moment ;/
| [reply] [d/l] [select] |
|
... (?: , | \b)). ? allows the preceding character to be optional (but there is no preceding character?) and I can't see the use of a colon in this context.
The (?:pattern) construct defines a non-capturing group. See Extended Patterns in perlre. This and other statements in your reply lead me to suggest that you take a big step backwards and read up on basic regex docs. Please see perlre. In particular, see perlretut for a very good tutorial. (I'm not familiar with the material in the Cozens book.) See also perlrequick for a quick reference. See also the material in the "Pattern Matching, Regular Expressions, and Parsing" area of the Tutorials section of this site.
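To make the difference between a capturing and a non-capturing group concrete, here is a small sketch. It assumes a shortened test string, and the variable names (@with_capture, @without) are hypothetical; the only point is to show what (?: ... ) changes.

```perl
use strict;
use warnings;

my $s = 'Three, Four';

# Two capturing groups: the word lands in $1 and the separator in $2,
# so a list-context match returns two values.
my @with_capture = $s =~ m{ ([A-Z][a-z]+) (,|\b) }x;

# Same pattern, but (?: ... ) only groups the alternation. Nothing is
# captured for it, so only the word is returned.
my @without = $s =~ m{ ([A-Z][a-z]+) (?: , | \b) }x;

print scalar(@with_capture), " captures: '@with_capture'\n";
print scalar(@without), " capture:  '@without'\n";
```

The /x modifier is what allows the spaces inside the patterns: whitespace is ignored, so the pieces can be laid out readably.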
| [reply] [d/l] [select] |
|
|
Keystone,
I am relatively new here at perlmonks, but perhaps I can help a little bit.
You asked why the regexp matched only once, instead of multiple times. By default a regex stops at the first match it finds; to match (or substitute) every occurrence you have to add the "global" switch, /g, *at the end of the regex in play*.
If you invoke the global switch, then all matches will be replaced with the substitution string.
So, if you had some code:
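$_ = "FourThreeTwoOne, Three, Four, One, Two";
# $1 and $2 are read-only capture variables and cannot be assigned to,
# so the text to search for goes in an ordinary variable instead
my $first  = "Three";
my $second = "&&&&";
s/$first/$second/g;
print;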
it would print this result:
Four&&&&TwoOne, &&&&, Four, One, Two
Hope this helps.
-HaT
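The same /g switch also applies to plain matching. The sketch below assumes the original test string and uses hypothetical variable names (@once, @all) to show what a list-context match returns with and without /g.

```perl
use strict;
use warnings;

my $s = 'Three, Four, One, Two';

# Without /g, a match in list context returns the captures of the
# first hit only.
my @once = $s =~ m{ \b ([A-Z][a-z]+) \b }x;

# With /g, it returns the captures of every non-overlapping hit,
# working left to right through the string.
my @all = $s =~ m{ \b ([A-Z][a-z]+) \b }xg;

print "without /g: @once\n";
print "with /g:    @all\n";
```

This is one way to get at the second, third and fourth words without writing a longer pattern for each.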
|
1st substitute/([A-Z][a-z][\W][\b])/<userinput>/;
The critical thing to remember about [...] character classes is that most regex metacharacters are not meta-special inside them. Thus, [.] (which you have used elsewhere) matches a single '.' (period) character and [\b] matches a single backspace control-character. So the pattern above might be described as:
- [A-Z] A single upper-case character; followed by
- [a-z] A single lower-case character; followed by
- [\W] A single character that is anything not matching a \w 'word' character (\W is a class unto itself, so no enclosing square brackets are needed); followed by
- [\b] A single backspace control character.
Is any of that what you really wanted?
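The behaviour described above can be checked directly. This is only an illustrative sketch; the test strings and variable names are assumptions chosen to isolate each case.

```perl
use strict;
use warnings;

# Inside [...] a dot is a literal period, not "any character".
my $dot_class_vs_a   = ('a' =~ /[.]/) ? 1 : 0;   # 0: 'a' is not a period
my $dot_class_vs_dot = ('.' =~ /[.]/) ? 1 : 0;   # 1
my $bare_dot_vs_a    = ('a' =~ /./)   ? 1 : 0;   # 1: bare . matches any char

# Inside [...] \b is a backspace character (0x08); outside a class it
# is a zero-width word boundary.
my $backspace_in_class = ("\x08" =~ /[\b]/)   ? 1 : 0;   # 1
my $boundary_outside   = ('One,' =~ /One\b,/) ? 1 : 0;   # 1

print "[.] vs 'a': $dot_class_vs_a, [.] vs '.': $dot_class_vs_dot, ",
      ". vs 'a': $bare_dot_vs_a\n";
print "[\\b] vs backspace: $backspace_in_class, ",
      "\\b as boundary: $boundary_outside\n";
```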
| [reply] [d/l] [select] |
|
Re: RegExp substitution
by AnomalousMonk (Archbishop) on Apr 11, 2014 at 05:04 UTC
|
use warnings;
use strict;
my $string = 'Three, Four, One, Two, xFive9, six, Seven';
my $number = qr{ \b [[:upper:]] [[:lower:]]+ \b }xms;
print qq{string is now '$string' \n};
my $ordinal = 0;
$string =~ s{ ($number) }
{ ask_replace(++$ordinal, $-[1], $1) }xmsge;
print qq{new string is '$string' \n};
print qq{done! \n};
sub ask_replace {
my ($ordinal,
$offset,
$string,
) = @_;
my $yes = qr{ (?i) y (?: e (?: s)? )? }xmso;
my $ok = qr{ (?i) o (?: k)? }xmso;
my $accept = qr{ \A (?: $yes | $ok) \Z }xmso;
print qq{sub-string $ordinal at offset $offset is '$string' \n};
print qq{is this correct? };
my $answer = <stdin>;
return $string if $answer =~ $accept;
print qq{no: enter new string: };
chomp(my $replace = <stdin>);
return $replace;
}
Output:
c:\@Work\Perl\monks\Keystone>perl ask_to_replace_1.pl
string is now 'Three, Four, One, Two, xFive9, six, Seven'
sub-string 1 at offset 0 is 'Three'
is this correct? n
no: enter new string: Uno
sub-string 2 at offset 7 is 'Four'
is this correct? No
no: enter new string: Dos
sub-string 3 at offset 13 is 'One'
is this correct? y
sub-string 4 at offset 18 is 'Two'
is this correct? x
no: enter new string: Tres
sub-string 5 at offset 36 is 'Seven'
is this correct? n
no: enter new string: se7en
new string is 'Uno, Dos, One, Tres, xFive9, six, se7en'
done!
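For experimenting without typing answers at the prompt, the same s///ge idea can be driven from a list. This is a sketch under assumptions, not the code from the post above: @corrections is a hypothetical stand-in for the user's input, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $string = 'Three, Four, One, Two';

# Hypothetical replacements, one per word, in left-to-right order.
my @corrections = qw(One Two Three Four);

# /g visits each capitalised word in turn; /e evaluates the
# replacement as Perl code, so each hit takes the next list entry.
# If the list runs out, the original word ($1) is kept.
my $i = 0;
(my $fixed = $string) =~
    s{ \b ([A-Z][a-z]+) \b }{ $corrections[$i++] // $1 }xge;

print "$fixed\n";
```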
| [reply] [d/l] [select] |
Re: RegExp substitution
by MidLifeXis (Monsignor) on Apr 10, 2014 at 18:01 UTC