PerlMonks  

Re: Re: This looks like someone sneezed and hit the keyboard

by Theo (Priest)
on Feb 03, 2004 at 15:41 UTC


in reply to Re: This looks like someone sneezed and hit the keyboard
in thread This looks like someone sneezed and hit the keyboard

Okay, I guess this is a newbie question ... It looks to me like the /.* that opens the regex is greedy and would grab everything in the string it is applied to, leaving nothing for the rest of the regex to match. In other words, anything and everything would give a match.

Other, wiser monks have not mentioned this, so I'm assuming I've missed something. Why isn't my assumption true?

Update: Thanks to bunnyman, ysth and MCS for their gentle instruction.

-Theo-
(so many nodes and so little time ... )

Re: Re: Re: This looks like someone sneezed and hit the keyboard
by bunnyman (Hermit) on Feb 03, 2004 at 16:33 UTC

    No. Everything in the regex must match, not just the first part of it, so the part in the middle with the (one|two|three) must match too.

    The thing that you must remember is that regexes can backtrack -- if they get to the end of the string without having matched yet, they can go back a few letters and try again.

    So the .* part will first try to match the entire string, because it is greedy. Then the middle part (one|two|three) must match, but there is nothing left in the string, and we must backtrack and try again. First we try going one letter back, then two, and eventually we either find the match or we backtrack all the way to the start and then there is no match.
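The backtracking described above can be observed with a small sketch. The pattern and the sample strings here are hypothetical, picked only for illustration; the original regex from the thread is not reproduced. The extra capture around .* is there just to show how much of the string it kept after giving characters back.

```perl
use strict;
use warnings;

# Hypothetical pattern: a greedy .* followed by a required alternation.
my $re = qr/(.*)(one|two|three)/;

# Returns [prefix, word] on success, undef when nothing matches.
sub try_match {
    my ($text) = @_;
    return $text =~ $re ? [$1, $2] : undef;
}

my $hit  = try_match("count one two buckle my shoe");
my $miss = try_match("nothing to see here");

# .* first swallows the whole string, then gives letters back until
# one of the alternatives fits, so it stops at the rightmost candidate.
print defined $hit
    ? "matched: prefix='$hit->[0]' word='$hit->[1]'\n"
    : "no match\n";

# No alternative occurs anywhere, so backtracking runs out and the match fails.
print defined $miss ? "matched\n" : "no match\n";
```

The first string matches with "two" rather than "one", because the greedy .* gives back as little as it can.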

Re: Re: Re: This looks like someone sneezed and hit the keyboard
by MCS (Monk) on Feb 03, 2004 at 17:05 UTC

    The reason most people say you shouldn't use .* is that it can match nothing (or everything), so matching on .* alone is pointless: it will match everything, including nothing. However, if you were looking for "hi", then some amount of text, and then "there", you could use:

    $line =~ /hi.*there/;

    and it would match. Of course it's greedy and might not be exactly what you wanted, but there are times when it is needed. However, it is overused a lot, and usually something better can be used.
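A short sketch of what "greedy" means for the hi...there pattern. The sample line is made up, and the non-greedy .*? form is shown only as one possible alternative; the post does not say which replacement it has in mind.

```perl
use strict;
use warnings;

my $line = "hi there, and hi again over there";   # made-up sample text

# Greedy: .* runs as far as it can, so the match ends at the LAST "there".
my ($greedy) = $line =~ /(hi.*there)/;

# Non-greedy (an assumption about "something better"): stops at the FIRST "there".
my ($lazy) = $line =~ /(hi.*?there)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Both patterns succeed on this line; they differ only in how much text ends up inside the match.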

    To answer your question, though: /.* doesn't grab everything, because the pattern requires something after it. If you try to match /.*some text/, it has to find "some text" or it will fail. However, a pattern like /.*\d?/ succeeds on any string, even an empty one, because the \d is optional and nothing else in the pattern is required.
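The difference between the two patterns in this paragraph can be checked directly. The helper names and sample strings below are hypothetical, introduced only for this sketch.

```perl
use strict;
use warnings;

# Required text after .*: the match fails unless "some text" is present.
sub needs_text     { return $_[0] =~ /.*some text/ ? 1 : 0 }

# Only optional pieces: this succeeds on any string, even an empty one.
sub digit_optional { return $_[0] =~ /.*\d?/ ? 1 : 0 }

for my $s ("here is some text", "nothing relevant", "") {
    printf "%-22s required: %d  optional: %d\n",
        "'$s'", needs_text($s), digit_optional($s);
}
```

The second column only reports 1 for the first string, while the third column reports 1 for all three.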

Re: Re: Re: This looks like someone sneezed and hit the keyboard
by ysth (Canon) on Feb 03, 2004 at 16:59 UTC
    Because for the match to succeed, one of the three (optA|optB|optC) options has to match. With the .* at the front, it will basically start at the end of the string and work backwards until it finds one of the alternatives.

    The \s? at the end is useless though (unless $& is used).
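A sketch of why the trailing \s? makes no difference to whether the match succeeds. The input string is made up, and the two patterns are stand-ins built from the (optA|optB|optC) placeholder above, not the original regex. The end offset from the built-in @+ array is used here instead of $& to show the only thing that changes: how far the matched text extends.

```perl
use strict;
use warnings;

my $text = "optA then optB and more";   # made-up input

my $with    = qr/.*(optA|optB|optC)\s?/;   # trailing optional whitespace
my $without = qr/.*(optA|optB|optC)/;      # same pattern without it

my ($found_with, $end_with, $found_without, $end_without);

# $+[0] is the offset just past the end of the last successful match.
if ($text =~ $with)    { ($found_with,    $end_with)    = ($1, $+[0]) }
if ($text =~ $without) { ($found_without, $end_without) = ($1, $+[0]) }

print "with trailing whitespace:    alt=$found_with end=$end_with\n";
print "without trailing whitespace: alt=$found_without end=$end_without\n";
```

Both patterns succeed and capture the same alternative, the rightmost one; only the end of the overall match moves by one character.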
