Web Spidering Ajax Sites

awohld has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Web Spidering Ajax Sites by perrin (Chancellor) on Sep 14, 2007 at 02:27 UTC
Ajax is just JavaScript. If all you want to do is run a certain sequence of requests and capture the results, HTTP::Recorder and WWW::Mechanize will work fine. Just set up the proxy script that comes with HTTP::Recorder, make the requests in your browser, and Recorder will turn that into a Mechanize script that behaves the same as the Ajax code.
Re^2: Web Spidering Ajax Sites by sgt (Deacon) on Sep 14, 2007 at 22:10 UTC
Well, maybe the FAQ is outdated, but Mech's author says it does not play well with JavaScript (as it does not have an engine). Are you saying the situation has changed? cheers --stephan
Re^3: Web Spidering Ajax Sites by perrin (Chancellor) on Sep 14, 2007 at 23:49 UTC
That information is correct, but totally irrelevant. Mech has no support for JavaScript, but the server doesn't know that. If you wanted to actually execute some JavaScript code, Mech can't do it, but all you want to do is talk to the server as if you were a browser (with JavaScript), and Mech can do that. There is nothing that JavaScript can make a browser send to the server that you can't mimic with Mech. The only hard part is figuring out exactly what the JavaScript would send, and using HTTP::Recorder with your browser (or using some other means of looking at the requests, like LiveHTTPHeaders) solves that for you.
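As a rough illustration of replaying a captured Ajax call with Mech, here is a minimal sketch. The endpoint URL, the parameter names (q, page, format) and the helper names ajax_params and fetch_results are all made up for the example; substitute whatever HTTP::Recorder or LiveHTTPHeaders shows your browser actually sending. The X-Requested-With header is something many Ajax libraries add, and whether a given server checks it is an assumption you would have to verify.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical endpoint: replace with the URL seen in the captured request.
my $ENDPOINT = 'http://www.example.com/ajax/search';

# Build the parameters the page's JavaScript would have sent.
# The names here are invented for illustration.
sub ajax_params {
    my ($query, $page) = @_;
    return { q => $query, page => $page || 1, format => 'json' };
}

# Replay the request with Mech and return the response body as a string.
sub fetch_results {
    my ($query, $page) = @_;

    # Both are CPAN modules; loaded here so ajax_params() works without them.
    require WWW::Mechanize;
    require URI;

    my $mech = WWW::Mechanize->new( autocheck => 1 );

    # Header commonly set by Ajax libraries (assumed relevant to the server).
    $mech->add_header( 'X-Requested-With' => 'XMLHttpRequest' );

    # Append the parameters to the endpoint as a query string.
    my $uri = URI->new($ENDPOINT);
    $uri->query_form( %{ ajax_params($query, $page) } );

    $mech->get( $uri->as_string );
    return $mech->content;
}

# Usage: perl replay_ajax.pl QUERY [PAGE]
# Does nothing when run without arguments.
if (@ARGV) {
    my ($query, $page) = @ARGV;
    print fetch_results( $query, $page ), "\n";
}
```

If the captured request turns out to be a POST rather than a GET, the same parameters can be passed as a hash reference to $mech->post($ENDPOINT, ajax_params(...)) instead of being appended to the URL.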
Re^4: Web Spidering Ajax Sites by sgt (Deacon) on Sep 18, 2007 at 13:26 UTC
Re^5: Web Spidering Ajax Sites by perrin (Chancellor) on Sep 19, 2007 at 03:40 UTC
Re: Web Spidering Ajax Sites by Gangabass (Vicar) on Sep 14, 2007 at 02:54 UTC
AJAX is just a request sent to the server from JavaScript code, so you can do the same from Perl. But first you must work out which requests you need. For that I use the Firefox LiveHTTPHeaders extension.
Re: Web Spidering Ajax Sites by erroneousBollock (Curate) on Sep 14, 2007 at 05:35 UTC
Spidering in the classic sense? No, not without having your spidering code magically figure out what the JavaScript might do. As others have said, you (the programmer) can figure out what the JavaScript does (or record it with a proxy) and then have your Perl code do that. -David
Re: Web Spidering Ajax Sites by Joost (Canon) on Sep 15, 2007 at 00:13 UTC
Selenium works by running the HTML/code through a JavaScript-enabled browser. If your question is whether it is possible to emulate a JavaScript-enabled browser in 100% pure Perl, then the answer is yes. The catch is that no one has written anything even close to doing that. Even given working HTTP/WWW and JavaScript libraries, it's very far from trivial to cook up a working, scriptable DOM model that can be used from the JavaScript code and is compatible with most current websites, or even a small subset of most websites. And I've tried. :-) "What should it profit a man, if he should win a flame war, yet lose his cool?"
Re: Web Spidering Ajax Sites by runrig (Abbot) on Sep 15, 2007 at 00:20 UTC
Selenium would be slower than the equivalent WWW::Mechanize solution, so if you can look at the JavaScript and figure out what requests are actually being sent, the Mech route might be worth it (I have done it on some web sites). But Selenium would probably be easier to deal with if figuring out the JavaScript is hard (I actually started to use WET and WATIR, but gave up after I figured out the JavaScript).

