PerlMonks  

Re: lovely regexs

by gmargo (Hermit)
on Apr 12, 2009 at 00:41 UTC ( [id://757069]=note )


in reply to lovely regexs

Perhaps change the regular expression to turn off the default "greediness" of the "*" quantifier by following it with a "?", so that it gathers only up to the next quote character.
$moo =~ m/src="(.*?)"/;
However, I normally use HTML::TreeBuilder to parse and search HTML.
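
A small sketch of what the "?" changes, assuming a made-up input line with two tags (the sample string and the variable names $greedy and $lazy are illustrative, not from the original question):

```perl
use strict;
use warnings;

# Hypothetical sample input; the original question's data is not shown here.
my $moo = '<img src="a.png" alt="first"> <img src="b.png" alt="second">';

# Greedy: ".*" runs on to the LAST double quote in the string.
my ($greedy) = $moo =~ m/src="(.*)"/;

# Non-greedy: ".*?" stops at the FIRST double quote after src=".
my ($lazy) = $moo =~ m/src="(.*?)"/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

With this input the greedy capture swallows everything up to the quote that closes alt="second", while the non-greedy capture is just a.png.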

Re^2: lovely regexs
by dsheroh (Monsignor) on Apr 12, 2009 at 22:20 UTC
    Ignoring, for the moment, the wisdom of using the proper tool (which is generally not a regex) for parsing HTML...

    The issue here is not greediness. The issue is the misuse of ".*". Making the "*" non-greedy is just a band-aid that masks the fact that ".*" says "match any number of any characters", when what you actually mean is "match any number of characters that are not double quotes". The correct way to write that regex is:

    $moo =~ m/src="([^"]*)"/;

    The non-greedy modifier does have its legitimate uses, generally in cases where your target is terminated by a sequence of multiple characters. In cases where a negated character class can do the job, though, the character class will almost always be the better option.
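
To illustrate the point above, here is a sketch of one case where the two forms disagree: when more pattern follows the closing quote, ".*?" is free to run past quotes until the rest matches, while the character class cannot. The input string, the trailing " width" requirement, and the variable names are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input: only the second tag has a width attribute after src.
my $moo = '<img src="a.png" alt="x"> <img src="b.png" width="10">';

# Non-greedy ".*?" may still cross double quotes to let the match succeed.
my ($lazy) = $moo =~ m/src="(.*?)" width/;

# The negated class can never cross a double quote, so the first tag
# fails outright and the engine moves on to the second one.
my ($class) = $moo =~ m/src="([^"]*)" width/;

# A legitimate use of non-greedy: a multi-character terminator ("-->").
my ($comment) = '<!-- one --> text <!-- two -->' =~ m/<!--(.*?)-->/;

print "lazy:    $lazy\n";
print "class:   $class\n";
print "comment: $comment\n";
```

Here the non-greedy capture starts in the first tag and spills into the second, whereas the character class version captures only b.png. The comment example shows the kind of job where ".*?" is the simpler tool, since a single negated class cannot express "anything up to -->".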
