Re^2: More robust link finding than HTML::LinkExtor/HTML::Parser?

Thank you for the links.

I want a script that works independently of a browser, so I don't think WWW::Mechanize::Firefox will work for me. That changes if you see a way to use it to write a script that doesn't depend on Firefox; if so, please let me know.

The second link looks more promising; now I just have to figure out what Mozilla is doing here :-)

I believe HTML::LinkExtor will work fine for robustly extracting the links from the HTML itself :-); now I just need a way to extract them from CSS and JS.
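For the HTML side, here is a minimal sketch of HTML::LinkExtor in its collecting mode, where no callback is passed and `->links` returns everything afterwards. The sample page is made up. It also illustrates the gap this post is about: the `url()` inside the `<style>` block is not reported, because LinkExtor only looks at link attributes such as `href` and `src`. HTML::LinkExtor ships with the HTML-Parser distribution, not with core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;   # from the HTML-Parser distribution (not core Perl)

# Sample page: three ordinary links plus a url() hidden inside a <style> block.
my $html = <<'HTML';
<html><head>
<link rel="stylesheet" href="site.css">
<style>body { background: url(bg.png) }</style>
</head><body>
<a href="page2.html">next</a>
<img src="logo.gif">
</body></html>
HTML

# With no callback, HTML::LinkExtor buffers every link it sees;
# ->links then returns them as [$tag, $attr => $url, ...] array refs.
my $p = HTML::LinkExtor->new;
$p->parse($html);
$p->eof;

my @found;
for my $link ($p->links) {
    my ($tag, %attr) = @$link;
    push @found, values %attr;   # no base URL given, so these are plain strings
}
print "$_\n" for sort @found;
```

Passing a base URL as the second argument to `new` makes LinkExtor return absolute URI objects instead of the raw attribute strings.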

Re^3: More robust link finding than HTML::LinkExtor/HTML::Parser?
by Anonymous Monk on May 08, 2011 at 12:17 UTC

The second link looks more promising; now I just have to figure out what Mozilla is doing here :-)

The second link is for use with WWW::Mechanize::Firefox.

You need some kind of browser, or at least something that can interpret the JavaScript; there is no way around that.
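A small made-up illustration of why: scanning JavaScript source text can only find URLs that appear literally as strings. The JS snippet and the "looks like a file path" test below are hypothetical heuristics, not a JavaScript parser. The second image URL is only assembled at run time, so it never exists as text to be matched.

```perl
use strict;
use warnings;

# Made-up JavaScript: one literal URL and one URL assembled at run time.
my $js = <<'JS';
var logo = "images/logo.png";
var n = 3;
var pic = 'images/photo' + n + '.jpg';
JS

# Heuristic only (not a JavaScript parser): collect single- or double-quoted
# string literals, allowing backslash escapes, that look like file paths,
# i.e. end in a word character, a dot and a 2-4 character extension.
my @candidates;
while ($js =~ /(["'])((?:\\.|(?!\1)[^\\])*)\1/g) {
    my $literal = $2;
    push @candidates, $literal if $literal =~ /\w\.\w{2,4}$/;
}

# Finds "images/logo.png", but "images/photo3.jpg" never exists as text,
# so no amount of pattern matching will recover it.
print "$_\n" for @candidates;
```

Only something that actually executes the script (a browser, or an engine such as the one WWW::Scripter plugs in) can know the value of `pic`.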

The other candidate is WWW::Scripter, a WWW::Mechanize subclass, but it is still alpha, and my simple test didn't yield anything useful :)

My other thought was to go straight to the supporting module CSS::DOM, but that didn't work out. The same goes for CSS, CSS::SAC, and CSS::Tiny.

I figure these patterns from CSS::Packer ought to be robust enough for CSS:

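## http://cpansearch.perl.org/src/NEVESENIN/CSS-Packer-1.000001/lib/CSS/Packer.pm
# Quoted CSS strings: double-quoted (STRING1) and single-quoted (STRING2),
# allowing backslash escapes inside.
our $DICTIONARY     = {
    'STRING1'   => qr~"(?>(?:(?>[^"\\]+)|\\.|\\"|\\\s)*)"~,
    'STRING2'   => qr~'(?>(?:(?>[^'\\]+)|\\.|\\'|\\\s)*)'~
};
# $URL and $IMPORT are plain strings, meant to be interpolated into a match.
# url(...) whose argument is a quoted string or a bare token; $1 is the argument.
our $URL            = 'url\(\s*(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|[^\'"\s]+?)\s*\)';
# @import "..."; or @import url(...) media; -- $1 is the string or url(...) target.
our $IMPORT         = '\@import\s+(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|' . $URL . ')([^;]*);';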
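A minimal sketch of how those patterns might be used to pull URLs out of a stylesheet. The helper names `css_links` and `unquote` and the sample CSS are hypothetical. The sketch does not decode CSS escapes, skip `/* ... */` comments, or resolve relative URLs against the stylesheet's own address.

```perl
use strict;
use warnings;

# Patterns copied from CSS::Packer 1.000001.
my $DICTIONARY = {
    'STRING1' => qr~"(?>(?:(?>[^"\\]+)|\\.|\\"|\\\s)*)"~,
    'STRING2' => qr~'(?>(?:(?>[^'\\]+)|\\.|\\'|\\\s)*)'~,
};
my $URL    = 'url\(\s*(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|[^\'"\s]+?)\s*\)';
my $IMPORT = '\@import\s+(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|' . $URL . ')([^;]*);';

# Hypothetical helper: strip one pair of matching outer quotes, if present.
sub unquote {
    my ($s) = @_;
    $s =~ s/^(["'])(.*)\1$/$2/s;
    return $s;
}

# Hypothetical helper: return every URL referenced by a stylesheet.
sub css_links {
    my ($css) = @_;
    my @links;
    # 1. @import "file.css"; -- the url(...) form is picked up by step 2.
    while ($css =~ /$IMPORT/g) {
        my $target = $1;
        push @links, unquote($target) unless $target =~ /^url\(/;
    }
    # 2. Every url(...) reference, including those inside @import.
    while ($css =~ /$URL/g) {
        push @links, unquote($1);
    }
    return @links;
}

my $css = <<'CSS';
@import "base.css";
@import url('print.css') print;
body  { background: url( "img/bg.png" ); }
.logo { background-image: url(img/logo.gif) }
CSS

print "$_\n" for css_links($css);
```

Scanning for `@import` strings and `url()` separately keeps each loop simple; a failed `m//g` resets `pos`, so the second loop starts again from the beginning of the stylesheet.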


	PerlMonks