PerlMonks  

SOLVED:Regular expression fetching multiple matches

by ansh batra (Friar)
on Jan 06, 2013 at 10:49 UTC ( [id://1011874]=perlquestion )

ansh batra has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,
I am trying to fetch multiple matches:

$string = "<ul><li><div id='Show1400'><a href='../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400'>1400</a></div></li><li><div id='Show1410'><a href='../../../KeyboardKeys/RetainerClip/Acer/Aspire/1410'>1410</a></div></li>";
my @ari = $string =~ /<a href=\'(.*?)\'>.*<\/a>/g;
I am getting only the first match, i.e. ../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400
Please help.

Replies are listed 'Best First'.
Re: Regular expression fetching multiple matches
by dave_the_m (Monsignor) on Jan 06, 2013 at 11:17 UTC
    Replace the second .* with .*? . The greedy match is consuming both anchors.

    Dave.
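
A minimal sketch of the difference, using the markup from the question. The variable names @greedy and @lazy are only for illustration. With the greedy .* the single match stretches to the last </a> in the string, so /g has nothing left to find; with .*? each match stops at the nearest </a>.

```perl
use strict;
use warnings;

# Same markup as in the question: two anchors on one line.
my $string = "<ul><li><div id='Show1400'><a href='../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400'>1400</a></div></li>"
           . "<li><div id='Show1410'><a href='../../../KeyboardKeys/RetainerClip/Acer/Aspire/1410'>1410</a></div></li>";

# Greedy: the second .* runs to the LAST </a>, so /g finds one match only.
my @greedy = $string =~ /<a href='(.*?)'>.*<\/a>/g;

# Non-greedy: .*? stops at the FIRST </a>, leaving the next anchor for /g.
my @lazy = $string =~ /<a href='(.*?)'>.*?<\/a>/g;

print scalar(@greedy), " match(es) with .*\n";
print scalar(@lazy),   " match(es) with .*?\n";
print "$_\n" for @lazy;
```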

      working !! thanks :)

Re: SOLVED:Regular expression fetching multiple matches
by 7stud (Deacon) on Jan 09, 2013 at 08:16 UTC

    Perl has so many ways of quoting things that you rarely have to escape characters.

    1) Use the here-doc syntax for your string, which gets rid of the outer double quote marks and lets you use double quote marks inside your string.

    2) Use qr{ } around your whole regex, which gets rid of the beginning and ending forward slashes (/) and lets you use forward slashes inside your regex.

    3) Inside a regex, a single quote mark has no special meaning (same for a double quote mark), so you don't have to escape it.

    4) It's usually more efficient to use a negated character class with a quantifier:

    [^some_char]*some_char

    than a non-greedy dot:

    .*?some_char
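
Apart from speed, the two forms can also match differently. The sketch below uses a made-up anchor with an extra attribute (the input and the variable names are assumptions, not taken from the thread) to show that a non-greedy dot may still run past the closing quote when the rest of the pattern forces it to, while a negated class such as [^']* cannot.

```perl
use strict;
use warnings;

# Hypothetical input: the anchor carries an extra attribute after href.
my $html = "<a href='one' class='x'>1</a>";

# Non-greedy dot: '> does not follow "one", so .*? keeps growing until
# it finds '> and the capture spills past the closing quote.
my ($lazy) = $html =~ m{href='(.*?)'>};

# Negated class: [^']* can never cross a quote, so this pattern fails
# on the input instead of capturing too much.
my ($negated) = $html =~ m{href='([^']*)'>};

# Without the trailing > the negated class captures just the href value.
my ($href) = $html =~ m{href='([^']*)'};

print "lazy:    $lazy\n";
print "negated: ", (defined $negated ? $negated : "(no match)"), "\n";
print "href:    $href\n";
```

A failed match is usually easier to notice and handle than a silently oversized capture, which is one more reason to prefer the negated class when the delimiter is a single character.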

    Here is an example:

    use strict;
    use warnings;
    use 5.012;

    my $string = <<END_OF_HTML;
    <ul>
    <li>
    <div id="Show1400">
    <a href="../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400">1400</a>
    </div>
    </li>
    <li>
    <div id="Show1410">
    <a href="../../../KeyboardKeys/RetainerClip/Acer/Aspire/1410">1410</a>
    </div>
    </li>
    </ul>
    END_OF_HTML

    my $regex = qr{<a href=["']([^"']+)["']>[^<]+</a>};

    while ($string =~ /$regex/g) {
        say $1;
    }

    --output:--
    ../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400
    ../../../KeyboardKeys/RetainerClip/Acer/Aspire/1410
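
For comparison with the original question, which collected the matches into an array, here is a sketch that uses the same qr{ } pattern in list context. The array name @hrefs is an assumption for illustration, and the markup is condensed onto fewer lines. With /g in list context, the match returns every capture at once, so no loop is needed.

```perl
use strict;
use warnings;

# Single-quoted here-doc: nothing inside is interpolated.
my $string = <<'END_OF_HTML';
<ul>
<li><div id="Show1400"><a href="../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400">1400</a></div></li>
<li><div id="Show1410"><a href="../../../KeyboardKeys/RetainerClip/Acer/Aspire/1410">1410</a></div></li>
</ul>
END_OF_HTML

my $regex = qr{<a href=["']([^"']+)["']>[^<]+</a>};

# In list context, /g returns the captured group from every match.
my @hrefs = $string =~ /$regex/g;

print "$_\n" for @hrefs;
```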
      thanks
Re: SOLVED:Regular expression fetching multiple matches
by 7stud (Deacon) on Jan 14, 2013 at 03:02 UTC

    I forgot to mention something else:

    5) Don't try to parse HTML with your own regexes. There are plenty of Perl HTML parsers that do the hard work for you. Here is one example:

    use strict;
    use warnings;
    use 5.012;

    use HTML::TreeBuilder;

    my $string = <<END_OF_HTML;
    <ul>
    <li>
    <div id="Show1400">
    <a href="../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400">1400</a>
    </div>
    </li>
    <li>
    <div id="Show1410">
    <a href="../../../KeyboardKeys/RetainerClip/Acer/Aspire/1410">1410</a>
    </div>
    </li>
    END_OF_HTML
    #Note the missing </ul> tag above.

    my $tree = HTML::TreeBuilder->new();
    $tree->parse_content($string);

    my @links = $tree->look_down('_tag', 'a');

    for my $link (@links) {
        say $link->attr('href');
    }

    --output:--
    ../../../KeyboardKeys/RetainerClip/Acer/Aspire/1400
    ../../../KeyboardKeys/RetainerClip/Acer/Aspire/1410

    The docs on how to use HTML::TreeBuilder are at HTML::Tree::Scanning.

Node Type: perlquestion [id://1011874]
Approved by philipbailey
Front-paged by philipbailey