HTML::LinkExtor / HTML::Parser are robust. They do a different job, but they are robust. Implying they aren't robust is poor form.
Yes, I agree. I was not mindful of the wording of my question.
Thank you for the links.
I wish to have a script that works independently of a browser. So I don't think WWW::Mechanize::Firefox will work for me, unless you were seeing a way that I could utilize this to come up with code for a script that works independently of Firefox. If so, please let me know.
The second link looks more promising, now I just have to try to figure out what Mozilla is doing here :-)
I believe that HTML::LinkExtor will work fine for extracting the links in the HTML robustly :-); I just need now to find a way to extract them from CSS and JS.
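For the HTML part, a minimal sketch of how HTML::LinkExtor can be driven with a callback might look like the following. The sub name extract_html_links and the "tag attr=url" output format are made up for illustration; only new, parse and eof come from the module itself, and no base URL is passed, so relative links are returned as they appear in the markup.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: returns one "tag attr=url" string per link attribute
# that HTML::LinkExtor reports (a href, img src, link href, ...).
sub extract_html_links {
    my ($html) = @_;
    my @found;

    # The callback receives the tag name followed by attr => url pairs.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @found, map { "$tag $_=$attr{$_}" } sort keys %attr;
    });

    $parser->parse($html);
    $parser->eof;    # flush anything still buffered
    return @found;
}
```

This only sees URLs that sit in HTML attributes; anything inside a style block, a stylesheet or a script is not covered by it.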
> The second link looks more promising, now I just have to try to figure out what Mozilla is doing here :-)
The second link is for use with WWW::Mechanize::Firefox.
You need some kind of browser, something to interpret the JavaScript; there is no way around that.
The other candidate is WWW::Scripter, a WWW::Mechanize subclass, but it's an alpha version,
and my simple test didn't yield anything useful :) My other thought was to go straight to the supporting module CSS::DOM, but that didn't work out. The same goes for CSS, CSS::SAC and CSS::Tiny.
I figure this ought to be robust enough for CSS:
## http://cpansearch.perl.org/src/NEVESENIN/CSS-Packer-1.000001/lib/CSS/Packer.pm
our $DICTIONARY = {
    'STRING1' => qr~"(?>(?:(?>[^"\\]+)|\\.|\\"|\\\s)*)"~,
    'STRING2' => qr~'(?>(?:(?>[^'\\]+)|\\.|\\'|\\\s)*)'~
};
our $URL    = 'url\(\s*(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|[^\'"\s]+?)\s*\)';
our $IMPORT = '\@import\s+(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|' . $URL . ')([^;]*);';
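As a rough sketch of how those patterns could be put to work, the following pulls the link targets out of a stylesheet. The patterns are copied from the CSS::Packer source quoted above; the subs extract_css_urls and _unquote are hypothetical names of my own, not part of CSS::Packer. It assumes the CSS is already in a string, and it does not skip comments, so a url(...) inside a comment would still be reported.

```perl
use strict;
use warnings;

# Patterns copied from the CSS::Packer source quoted above.
my %DICTIONARY = (
    STRING1 => qr~"(?>(?:(?>[^"\\]+)|\\.|\\"|\\\s)*)"~,
    STRING2 => qr~'(?>(?:(?>[^'\\]+)|\\.|\\'|\\\s)*)'~,
);
my $URL    = 'url\(\s*(' . $DICTIONARY{STRING1} . '|' . $DICTIONARY{STRING2} . '|[^\'"\s]+?)\s*\)';
my $IMPORT = '\@import\s+(' . $DICTIONARY{STRING1} . '|' . $DICTIONARY{STRING2} . '|' . $URL . ')([^;]*);';

# Hypothetical helper: strip one pair of matching surrounding quotes, if any.
sub _unquote {
    my ($s) = @_;
    $s =~ s/^(["'])(.*)\1$/$2/s;
    return $s;
}

# Hypothetical helper: return every link target found in a CSS string.
sub extract_css_urls {
    my ($css) = @_;
    my @urls;

    # Pass 1: @import with a bare string, e.g. @import "base.css";
    # When the url(...) form matched, $2 is defined and pass 2 picks it up.
    while ($css =~ /$IMPORT/g) {
        my ($target, $inner) = ($1, $2);
        push @urls, _unquote($target) unless defined $inner;
    }

    # Pass 2: every url(...) token, quoted or unquoted.
    while ($css =~ /$URL/g) {
        push @urls, _unquote($1);
    }
    return @urls;
}
```

Calling extract_css_urls($css) returns the bare-string @import targets first, followed by the url(...) targets in document order. The results are still relative to the stylesheet's own location, so they would need to be resolved against its URL before fetching.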