Re: More robust link finding than HTML::LinkExtor/HTML::Parser?

Re^2: More robust link finding than HTML::LinkExtor/HTML::Parser? by Allasso (Monk) on May 08, 2011 at 10:47 UTC
"HTML::LinkExtor / HTML::Parser are robust. They do a different job, but they are robust. Implying they aren't robust is poor form."

Yes, I agree. I was not mindful of the wording of my question.
Re^2: More robust link finding than HTML::LinkExtor/HTML::Parser? by Allasso (Monk) on May 08, 2011 at 11:33 UTC
Thank you for the links. I want a script that works independently of a browser, so I don't think WWW::Mechanize::Firefox will work for me, unless you see a way I could use it to come up with code for a script that works independently of Firefox. If so, please let me know.

The second link looks more promising, now I just have to try to figure out what Mozilla is doing here :-) I believe that HTML::LinkExtor will work fine for extracting the links in the HTML robustly :-); now I just need to find a way to extract them from CSS and JS.
Re^3: More robust link finding than HTML::LinkExtor/HTML::Parser? by Anonymous Monk on May 08, 2011 at 12:17 UTC
"The second link looks more promising, now I just have to try to figure out what Mozilla is doing here :-)"

The second link is for use with WWW::Mechanize::Firefox. You need some kind of browser, something to interpret the JavaScript; there is no way around that. The other candidate is WWW::Scripter, a WWW::Mechanize subclass, but it's an alpha version, and my simple test didn't yield anything useful :)

My other thought was to go straight for the supporting module CSS::DOM, but that didn't work out. Same goes for CSS/CSS::SAC/CSS::Tiny. I figure this ought to be robust enough for CSS:

```perl
## http://cpansearch.perl.org/src/NEVESENIN/CSS-Packer-1.000001/lib/CSS/Packer.pm
our $DICTIONARY = {
    'STRING1' => qr~"(?>(?:(?>[^"\\]+)|\\.|\\"|\\\s)*)"~,
    'STRING2' => qr~'(?>(?:(?>[^'\\]+)|\\.|\\'|\\\s)*)'~
};
our $URL    = 'url\(\s*(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|[^\'"\s]+?)\s*\)';
our $IMPORT = '\@import\s+(' . $DICTIONARY->{STRING1} . '|' . $DICTIONARY->{STRING2} . '|' . $URL . ')([^;]*);';
```
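
For what it's worth, here is a minimal sketch of how patterns like the ones quoted above might be applied to pull URLs out of a stylesheet, using core Perl only. The sub names `css_urls` and `_unquote` are made up for illustration, the comment stripping is an added assumption, and the patterns are adapted from the CSS::Packer snippet rather than imported from the module. It does nothing for links built in JavaScript.

```perl
use strict;
use warnings;

# Quoted-string patterns adapted from the CSS::Packer snippet.
my $STRING1 = qr~"(?>(?:(?>[^"\\]+)|\\.|\\"|\\\s)*)"~;
my $STRING2 = qr~'(?>(?:(?>[^'\\]+)|\\.|\\'|\\\s)*)'~;

# url(...) with a quoted or bare target; capture group holds the raw target.
my $URL    = qr~url\(\s*($STRING1|$STRING2|[^'"\s]+?)\s*\)~i;
# @import "file.css"; the @import url(...) form is caught by $URL instead.
my $IMPORT = qr~\@import\s+($STRING1|$STRING2)~i;

# Hypothetical helper: strip one pair of matching surrounding quotes.
sub _unquote {
    my ($s) = @_;
    $s =~ s/\A(["'])(.*)\1\z/$2/s;
    return $s;
}

# Hypothetical helper: return every URL referenced in a CSS string,
# in document order.
sub css_urls {
    my ($css) = @_;
    $css =~ s{/\*.*?\*/}{}gs;    # assumption: ignore commented-out URLs
    my @found;
    while ($css =~ /$IMPORT|$URL/g) {
        # Group 1 belongs to $IMPORT, group 2 to $URL.
        my $raw = defined $1 ? $1 : $2;
        push @found, _unquote($raw);
    }
    return @found;
}

print "$_\n" for css_urls('body { background: url("img/bg.png") }');
```

The returned values are still relative to the stylesheet's own location, so they would need resolving against the stylesheet URL before fetching.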

