PerlMonks  

Re^4: REGEX for url

by wrkrbeee (Scribe)
on Apr 25, 2016 at 21:09 UTC [id://1161481]


in reply to Re^3: REGEX for url
in thread REGEX for url

Any idea why that would not work on my end? Maybe something a rookie would do that an expert would not, or vice versa? Thank you for your time!

Re^5: REGEX for url
by NetWallah (Canon) on Apr 25, 2016 at 21:19 UTC
    This version is a little more robust: it works in both cases, with or without setting "$/".

    It can also handle multiple URLs.

    use strict;
    use warnings;

    $/ = "</html>";    # record separator; "</html>" never occurs below, so DATA is read as one record
    for (<DATA>) {
        # without /s, "." stops at a newline, so (.*) stays on the current line
        print "$1\n" while /a href="(.*)"/g;
    }

    __DATA__
    <td scope="row">9</td>
    <td scope="row">SUBSIDIARIES OF THE REGISTRANT</td>
    <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>
    <td scope="row">EX-21.1</td>
    <td scope="row"><a href="/Another/URL/here.html">0009.txt</a></td>
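    Why the result does not depend on "$/": with the default "\n" every line is its own record, and with "</html>" (which never appears in this sample) the whole input comes back as one record. Because "." never matches a newline without /s, each greedy (.*)" still stops at the last quote on its own line. The sketch below checks that both settings give the same list. It assumes the rows are held in a string and read through an in-memory filehandle; extract_urls is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Sample rows in the same shape as the __DATA__ above (one <td> per line).
my $html = <<'HTML';
<td scope="row">9</td>
<td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>
<td scope="row">EX-21.1</td>
<td scope="row"><a href="/Another/URL/here.html">0009.txt</a></td>
HTML

# Hypothetical helper: read $text through an in-memory filehandle using the
# given record separator, and run the regex from the reply over every record.
sub extract_urls {
    my ($text, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @urls;
    for (<$fh>) {
        # Without /s, "." stops at "\n", so each match stays on its own line.
        push @urls, $1 while /a href="(.*)"/g;
    }
    close $fh;
    return @urls;
}

my @line_by_line = extract_urls($html, "\n");        # default: one line per record
my @one_record   = extract_urls($html, "</html>");   # separator absent: one big record

print "$_\n" for @line_by_line;
print "same result either way\n" if "@line_by_line" eq "@one_record";
```

    One caveat: the greedy (.*) only behaves because each link sits on a line of its own. Two links on one line would be captured as a single run from the first href to the last quote on that line.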

            This is not an optical illusion, it just looks like one.

Re^5: REGEX for url
by ExReg (Priest) on Apr 25, 2016 at 22:07 UTC

    I'm not able to check it on my machine, but wouldn't the /s modifier help here, so that the match can pass over the newlines?

    print if s/.*a href="(.*)".*/$1/s;
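    A quick sketch of what /s changes when the whole input is one multi-line record. With /s, "." also matches "\n", so the leading greedy .* jumps to the last a href=" in the record (only one URL survives the substitution), and the greedy (.*)" can run on to the last quote anywhere after it. The row order below, with a quoted cell after the last link, is an assumption for illustration. The [^"]* negated class in the list-context /g match is a swapped-in technique, not something proposed in the thread.

```perl
use strict;
use warnings;

# One multi-line record, as you get when $/ is set to "</html>".
# Row order is made up: a quoted cell follows the last link.
my $record = <<'HTML';
<td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>
<td scope="row"><a href="/Another/URL/here.html">0009.txt</a></td>
<td scope="row">EX-21.1</td>
HTML

# The /s substitution from the reply, applied to a copy of the record.
# With /s, "." also matches "\n", so both .* runs can cross lines.
(my $with_s = $record) =~ s/.*a href="(.*)".*/$1/s;

# A /g match in list context returns every capture; [^"]* cannot run past
# the closing quote, so each capture is exactly one href value.
my @all_urls = $record =~ /a href="([^"]*)"/g;

print "with /s: [$with_s]\n";
print "all:     $_\n" for @all_urls;
```

    On these rows the /s version keeps only the second link and drags in text up to the quote after scope="row, while the negated-class match returns both URLs.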
