Re: More robust link finding than HTML::LinkExtor/HTML::Parser?

by ww (Archbishop)
on May 07, 2011 at 21:56 UTC ( [id://903595]=note )


in reply to More robust link finding than HTML::LinkExtor/HTML::Parser?

HTML::LinkExtor and HTML::Parser skipped what you seem to be calling a link in the code beginning at line 8 because it isn't an HTML link.

Line 8 is a declaration that what follows, until the final </script>, is to be handled by JavaScript.

As to the first two, I lean toward Corion's view: they're being handled with CSS (initially); neither is a simple HTML link. A simple HTML link (without exception that I can think of, off the top of my head) means that the address/filename will be where I have an ellipsis in a construct like:

<a href="...">rendered_link_Label_here</a>
    or an
<img src="address...filename.typ">
    or similar.
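
To illustrate the distinction, here is a minimal sketch of what HTML::LinkExtor reports for plain HTML links versus a filename that only appears inside a script body. The sample page, the filenames, and the variable name `banner` are all made up for illustration; they are not the markup from the original question.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample page (an assumption, not the original poster's HTML).
my $html = <<'HTML';
<html><body>
<a href="page2.html">Next page</a>
<img src="images/logo.png">
<script type="text/javascript">
var banner = "images/banner.png";
</script>
</body></html>
HTML

my @found;

# The callback receives the tag name followed by attribute/value pairs
# for the link attributes found on that tag.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @found, "$tag $_=$attr{$_}" for sort keys %attr;
});
$parser->parse($html);
$parser->eof;

print "$_\n" for @found;
```

With this sample, the callback should fire for the `a` and `img` tags only. The filename assigned to `banner` is script text rather than an attribute on an HTML tag, so the module has no reason to report it.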

Perhaps you should look for modules that will chase down references inside CSS and JavaScript... or perhaps, depending on your actual goal, you don't need to worry about the stylesheet or Flash sources, etc.
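
If no suitable module turns up, a rough fallback is to scan the stylesheet and script text with regular expressions. The sketch below uses core Perl only; the sample CSS and JavaScript strings, the filenames, and the list of file extensions are assumptions chosen for illustration, and the JavaScript pattern is a heuristic rather than a real parser.

```perl
use strict;
use warnings;

# Hypothetical stylesheet and script text; in practice these would be the
# contents of <style>/<script> elements or of fetched .css/.js files.
my $css = 'body { background: url("images/bg.png"); } .x { background: url(img/x.gif) }';
my $js  = 'var banner = "images/banner.png"; var n = "hello";';

# CSS: capture whatever sits inside url(...), with or without quotes.
my @css_refs = $css =~ /url\(\s*["']?([^"')\s]+)["']?\s*\)/gi;

# JavaScript: keep quoted strings that end in a file-like extension.
# This misses any address that is assembled at run time.
my @js_refs = $js =~ /["']([^"'\s]+\.(?:png|gif|jpe?g|css|js|html?|swf))["']/gi;

print "css: $_\n" for @css_refs;
print "js:  $_\n" for @js_refs;
```

This is deliberately crude: it treats the CSS and JavaScript as plain text, so it can pick up strings that are never used as addresses and will miss ones built by concatenation.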

Replies are listed 'Best First'.
Re^2: More robust link finding than HTML::LinkExtor/HTML::Parser?
by Allasso (Monk) on May 08, 2011 at 00:09 UTC
    ...because it isn't an HTML link.

    I just called it a "link", meaning something that links to another file. Is there a more appropriate name to call it when it appears in something other than an HTML tag?
      I think the problem is context. Yes, you called it simply "a link" but you did so in the context of purported failures by two HTML-oriented modules.

      Just as you probably wouldn't want to use a fishing net to dig potatoes, the links for which those modules fish are HTML links; rooting around in JavaScript or CSS for links requires a different tool.

      I am unaware of any alternative name or word; I think the solution is to be careful about your context.
