Re: REGEX for url

by graff (Chancellor)
on Apr 25, 2016 at 21:44 UTC


in reply to REGEX for url

It looks like you're just trying to extract values of href= attributes from anchor tags (i.e. the "..." from <a href="...">) in html data.

I'm surprised that no one has yet mentioned that there are CPAN modules for doing exactly that - e.g. HTML::LinkExtor, among others. (I haven't had occasion to use them myself, but to do what you're doing, I'd start with one of those.)
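
For illustration, here is a minimal sketch of what that might look like with HTML::LinkExtor. It assumes the module is installed from CPAN (it ships with the HTML-Parser distribution); the helper name `extract_hrefs` and the sample HTML are made up for this example and are not from the original question.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href values of all <a> tags in an HTML string.
sub extract_hrefs {
    my ($html) = @_;
    my @hrefs;

    # The callback is invoked once per link-bearing tag with the tag name
    # and that tag's link attributes as key/value pairs.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
    });

    $parser->parse($html);
    $parser->eof;    # flush any buffered input
    return @hrefs;
}

# Made-up sample input; the <img> is reported to the callback too, but filtered out.
my $html = '<p><a href="http://example.com/a">A</a> '
         . '<img src="x.png"> <a href="/b">B</a></p>';
print "$_\n" for extract_hrefs($html);
```

Since no base URL is passed to the constructor, relative links such as `/b` come back unchanged; passing a base as the second argument to `new` should turn them into absolute URIs.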

Replies are listed 'Best First'.
Re^2: REGEX for url
by wrkrbeee (Scribe) on Apr 25, 2016 at 21:46 UTC
    You are exactly right, extract data between anchor tags. I will try the CPAN module you mentioned. Thank you!!
      Having looked a little more at the CPAN search results, I find it odd that the man page for HTML::LinkExtor appears to be shorter and simpler than the one for HTML::SimpleLinkExtor -- I'm not sure what "Simple" is supposed to refer to in the latter module.
