Since I got bitten by a couple of my public attempts at regex construction, I decided to re-read the pod and japhy's book, and to take a (private) crack at any regex Q's that came up here, in an attempt to improve my skills.
What's below was prompted by this [untitled node, ID 192753] SoPW, and I am posting it for two reasons.
- I would like feedback on my mechanism for deriving my regex. The idea is to do as little as necessary (laziness) and to use as few wildcard components as possible for best performance.
- There are half a dozen questions in the comments (look for the ?????) on which I would appreciate explanations or pointers.
#! perl -w
use strict;
my @site;
push @site, "<!-- USER $_ - donkey_pusher_$_ -->" for (1..10);
=pod
### Start with a typical sample
/<!-- USER 20 - donkey_pusher_6 -->/
### Escape anything that might cause a problem (Nothing to do here!)
/<!-- USER 20 - donkey_pusher_6 -->/
### Add some anchors if I KNOW that they are true
/^<!-- USER 20 - donkey_pusher_6 -->$/
### Bracket the bit(s) I want to keep.
/^<!-- USER 20 - (donkey_pusher_6) -->$/
### Substitute appropriate wildcards (Right term?) for the bits I know will change
### The start/end of html comments have to be fixed pretty much.
### The whitespace could vary, but the /x modifier ??should?? handle that nicely
/^<!-- USER \d+ - (\w+) -->$/
### Add any modifiers that might help.
### /x so each space will match any number or combination of whitespace.
### /o to compile for speed, ??not clear to me if this is necessary or advantageous
### if there are no variables to be interpolated in the regex??
### /i in case "USER" might vary in case.
### If it's not needed, don't! I think it is probably expensive.
/^<!-- USER \d+ - (\w+) -->$/xio
=cut
# qr// can have some speed benefits, ??when??
my $regex1 = qr/^<!-- USER \d+ - (\w+) -->$/iox;
# I had to remove the /x else it didn't match SOMETIMES????????
my $regex2 = qr/^<!-- USER \d+ - (\w+) -->$/io;
for (@site ) {
## Standalone with /x on qr// NEVER matches ?????????
# next unless $_ =~ $regex1;
# This FAILS to match also ???????????
# next unless m/$regex1/;
# Adding the /x to $regex2 this way works ok
# next unless m/$regex2/x;
next unless $_ =~ $regex2; # And this works.
my $userid = $1;
print $userid,$/;
}
# use the map trick
# using $regex1 with the /x modifier WORKS ok here!!
my @users = map { $regex1 ? $1 : () } @site;
my $doc=join "\n", @site; # Simulating slurp mode here!!!
study $doc; ## Could give big boost on long strings?
# Again, adding the /x modifier here....
my $regex3 = qr/^<!-- USER \d+ - (\w+) -->$/oi;
#my @users2 = $doc =~ /$regex3/mgc; # means this FAILS! ??
my @users2 = $doc =~ /$regex3/xmgc; # Adding here it works.
print 'results:', ~~@users, ' ', ~~@users2, $/;
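
For comparison, here is a minimal sketch of how I understand perlre to describe /x: unescaped whitespace inside the pattern is ignored rather than treated as "any whitespace", and the flags given to qr// stay with the compiled object wherever it is later used. The sample strings and the names $line, $with_x, $with_x_ws, $no_x and @lines are hypothetical, made up for this illustration; it is not a diagnosis of every ????? above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line, same shape as the strings pushed onto @site.
my $line = '<!-- USER 3 - donkey_pusher_3 -->';

# Under /x the unescaped spaces in the pattern are ignored, so this is
# effectively /^<!--USER\d+-(\w+)-->$/i and cannot match the real spaces.
my $with_x = qr/^<!-- USER \d+ - (\w+) -->$/ix;

# Spelling the whitespace out as \s+ lets /x be used purely for layout.
my $with_x_ws = qr/^ <!-- \s+ USER \s+ \d+ \s+ - \s+ (\w+) \s+ --> $/ix;

# Without /x each literal space in the pattern matches one space.
my $no_x = qr/^<!-- USER \d+ - (\w+) -->$/i;

my @results = (
    ( $line =~ $with_x    ? 1 : 0 ),
    ( $line =~ $with_x_ws ? 1 : 0 ),
    ( $line =~ $no_x      ? 1 : 0 ),
);
print "@results\n";    # prints "0 1 1"

# The qr// object keeps its own flags when interpolated into a pattern
# that has no /x of its own.
my $wrapped = ( $line =~ /(?:$with_x_ws)/ ? 1 : 0 );

# The map trick needs an actual match against $_ inside the block;
# a bare qr// object is always true and never sets $1.
my @lines = ( $line, 'not a user comment' );
my @names = map { /$no_x/ ? $1 : () } @lines;
print "$wrapped @names\n";
```

If that reading is right, the choice is between dropping /x and keeping the literal spaces, or keeping /x and writing every space as \s+ (or as an escaped space).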
What's this about a "crooked mitre"? I'm good at woodwork!