PerlMonks  

Improving my regex skills and a few questions.

by BrowserUk (Patriarch)
on Aug 27, 2002 at 01:07 UTC ( [id://193058]=perlquestion )

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Since I got bitten with a couple of my public attempts at regex construction, I decided to re-read the pod, japhy's book and take a (private) crack at any regex Q's that came up here in an attempt to improve my skills.

What's below was prompted by this [untitled node, ID 192753] SoPW, and I am posting it for two reasons.

  1. I would like feedback on my mechanism for deriving my regex. The idea is to do as little as necessary (laziness) and to use as few wildcard components as possible for best performance.
  2. There are half a dozen questions in the comments (look for the ?????) that I would request explanations or pointers on.

#! perl -w
use strict;

my @site;
push @site, "<!-- USER $_ - donkey_pusher_$_ -->" for (1..10);

=pod

### Start with a typical sample
/<!-- USER 20 - donkey_pusher_6 -->/

### Escape anything that might cause a problem (Nothing to do here!)
/<!-- USER 20 - donkey_pusher_6 -->/

### Add some anchors if I KNOW that they are true
/^<!-- USER 20 - donkey_pusher_6 -->$/

### Bracket the bit(s) I want to keep.
/^<!-- USER 20 - (donkey_pusher_6) -->$/

### Substitute appropriate wildcards (Right term?) for the bits I know will change
### The start/end of html comments have to be fixed pretty much.
### The whitespace could vary, but the /x modifier ??should?? handle that nicely
/^<!-- USER \d+ - (\w+) -->$/

### Add any modifiers that might help.
### /x so each space will match any number or combination of whitespace.
### /o to compile for speed, ??not clear to me if this is necessary or advantageous
###    if there are no variables to be interpolated in the regex??
### /i in case "USER" might vary in case.
###    If it's not needed, don't! I think it is probably expensive.
/^<!-- USER \d+ - (\w+) -->$/xio

=cut

# qr// can have some speed benefits, ??when??
my $regex1 = qr/^<!-- USER \d+ - (\w+) -->$/iox;

# I had to remove the /x else it didn't match SOMETIMES????????
my $regex2 = qr/^<!-- USER \d+ - (\w+) -->$/io;

for (@site ) {
    ## Standalone with /x on qr// NEVER matches ?????????
    # next unless $_ =~ $regex1;

    # This FAILS to match also ???????????
    # next unless m/$regex1/;

    # Adding the /x to $regex2 this way works ok
    # next unless m/$regex2/x;

    next unless $_ =~ $regex2; # And this works.
    my $userid = $1;
    print $userid,$/;
}

# use the map trick
# using $regex1 with the /x modifier WORKS ok here!!
my @users = map { $regex1 ? $1 : () } @site;

my $doc=join "\n", @site; # Simulating slurp mode here!!!
study $doc; ## Could give big boost on long strings?

# Again, adding the /x modifier here....
my $regex3 = qr/^<!-- USER \d+ - (\w+) -->$/oi;
#my @users2 = $doc =~ /$regex3/mgc; # means this FAILS! ??
my @users2 = $doc =~ /$regex3/xmgc; # Adding here it works.

print 'results:', ~~@users, ' ', ~~@users2, $/;

What's this about a "crooked mitre"? I'm good at woodwork!

Replies are listed 'Best First'.
Re: Improving my regex skills and a few questions.
by Django (Pilgrim) on Aug 27, 2002 at 01:29 UTC
    You've misunderstood the /x modifier: memorize it as "eXpressive", meaning that you can use whitespace and even comments within a regex without any effect on the pattern.
    /
        f   # this matches "foo"
        o   # and nothing else
        o
    /x
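Applied to the pattern from the question, this can be sketched as follows. The sample strings are made up for illustration; the point is only that under /x a literal space in the pattern matches nothing, so any whitespace that should be matched has to be written out, for example as \s+.

```perl
use strict;
use warnings;

my $line = '<!-- USER 20 - donkey_pusher_6 -->';

# Under /x the literal spaces in the pattern are ignored, so this is
# effectively /^<!--USER\d+-(\w+)-->$/ and cannot match the spaced input.
my $ignored = $line =~ /^<!-- USER \d+ - (\w+) -->$/x ? 1 : 0;

# Spelling the whitespace out with \s+ works together with /x ...
my ($user_x) = $line =~ /^ <!-- \s+ USER \s+ \d+ \s+ - \s+ (\w+) \s+ --> $/x;

# ... and also tolerates runs of mixed whitespace (hypothetical input).
my ($user_ws) = "<!--  USER\t20  -   donkey_pusher_6 -->"
    =~ /^ <!-- \s+ USER \s+ \d+ \s+ - \s+ (\w+) \s+ --> $/x;

print "ignored=$ignored user_x=$user_x user_ws=$user_ws\n";
```

The second form is probably closer to what the question was after: variable whitespace is handled by \s+, and /x is then free to do its real job of making the pattern easier to read.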

      Thanks Django! I actually saw your post last night (or rather earlier this morning my time) when you posted it. It immediately explained all my /x problems... instead of reading the description, I'd blithely scanned over it thinking I knew what it meant.

      It also made me realise that I shouldn't try to code when I'm dog tired, and I went to bed embarrassed. Is that an excuse? Probably.

      Now I have read it, I see exactly why it doesn't and couldn't work that way. Thanks.

      Ah well. Probably gave more people a good laugh than some of my puns:)


      What's this about a "crooked mitre"? I'm good at woodwork!
Re: Improving my regex skills and a few questions.
by schumi (Hermit) on Aug 27, 2002 at 07:10 UTC
    Good morrow!

    I like your way of generating your regex. While some of us might be quicker to just think a minute, and then write down a fully working regex, your step-by-step way makes sure you don't miss anything. *makes mental note to keep this method in mind*

    Using the /o modifier may indeed make for speed, as it allows only one compilation. Variables within a pattern are interpolated (unless your delimiters are single quotes), and thus the pattern may be recompiled whenever the pattern operator is evaluated. The /o modifier prevents this recompilation and thus may save time. A pattern with no variables in it, like yours, is compiled only once anyway, so /o gains nothing there.
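A small sketch of what /o does when a variable is involved. The loop and the variable names are invented for illustration; the thing to notice is that /o also freezes the pattern, which is a common source of surprises.

```perl
use strict;
use warnings;

my (@with_o, @without_o);

for my $word (qw(foo bar)) {
    # /o: the pattern is compiled on the first pass (while $word is 'foo')
    # and later changes to $word are ignored.
    push @with_o, ('bar' =~ /$word/o ? 1 : 0);

    # No /o: the current value of $word is used on every pass.
    push @without_o, ('bar' =~ /$word/ ? 1 : 0);
}

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

With /o the second pass still tests against 'foo', so 'bar' never matches; without it the second pass matches. For a pattern that is built once and reused, qr// is usually the less surprising tool.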

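    You are using qr// with your regex, and then use it standalone-ingly to match against $_. Note that qr// itself, while specifying a pattern, does not match against anything. Instead, the regex is compiled and returned for future use. Once compiled, it can be used on the right-hand side of =~ (as your working $regex2 line shows) or interpolated into m//, so the standalone use is not in itself the problem.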
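For what it's worth, a compiled qr// object can itself serve as the pattern in a later match. The sketch below uses a made-up lowercase sample line and illustrative variable names. It also shows that modifiers are stored in the qr// object, and that modifiers on an enclosing m// do not reach inside it.

```perl
use strict;
use warnings;

my $line = '<!-- user 20 - donkey_pusher_6 -->';

# qr// only compiles; the /i flag is stored inside the compiled object.
my $re = qr/^<!-- USER \d+ - (\w+) -->$/i;

# A qr// object can be used directly on the right-hand side of =~ ...
my ($direct) = $line =~ $re;

# ... or interpolated into a match operator.
my ($interpolated) = $line =~ /$re/;

# Flags on the outer operator do not reach inside the qr// object:
# the outer /x here does not make the spaces in $re insignificant.
my ($outer_x) = $line =~ /$re/x;

print "$direct $interpolated $outer_x\n";
```

This would be consistent with what the question reports: $regex2 was compiled without /x, so m/$regex2/x still treats its spaces literally and matches, while $regex1 carries its own /x wherever it is used.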

    The reason why your $regex1 doesn't match probably lies in what Django stated above. I can't really say, however, why placing your /x at different places would make a difference...

    Hope this helps...

    --cs

    There are nights when the wolves are silent and only the moon howls. - George Carlin
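Pulling the points from this thread together, the script from the question might be reworked roughly as below. This is only a sketch under a few assumptions: that variable whitespace should be accepted (hence \s+), that case-insensitivity is actually wanted, and that the single $regex name is acceptable in place of the three variants in the question.

```perl
use strict;
use warnings;

my @site = map { "<!-- USER $_ - donkey_pusher_$_ -->" } 1 .. 10;

# Whitespace is matched explicitly with \s+, so /x is safe to use.
# /m is compiled into the qr// object so ^ and $ anchor at line boundaries.
my $regex = qr/^ <!-- \s+ USER \s+ \d+ \s+ - \s+ (\w+) \s+ --> $/xim;

# The map trick: the block must actually perform the match against $_.
my @users = map { /$regex/ ? $1 : () } @site;

# Slurp-style: one multi-line string, all captures collected with /g.
my $doc    = join "\n", @site;
my @users2 = $doc =~ /$regex/g;

print 'results: ', scalar(@users), ' ', scalar(@users2), "\n";
```

Note that the map block in the question tests `$regex1` itself, which is always true, rather than matching against `$_`; the version here performs the match inside the block. The /o modifier is left out because the pattern contains no variables.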
