http://qs321.pair.com?node_id=981114


in reply to Timing web page download.

Based on your requirement of fetching every resource for a page, including those created by JavaScript, I'd use something that handles JavaScript well on a headless server: Node.js. That engine has the capabilities you need, and with NPM there is a good chance someone has already written what you seek (especially since Node is used for testing JavaScript web apps). However, your alternative processor types mean you'll have to find out whether you can compile it from source instead of using a Windows or Mac OS binary.

Re^2: Timing web page download.
by Eyck (Priest) on Jul 12, 2012 at 12:23 UTC

    Thanks for the suggestion, but can you point out the NPM package you have in mind that does something similar?

    I'm thinking that if I'm to parse a web page, then it doesn't matter whether I write the parser in Perl, C, or JS; in fact it would be harder to do in JS, unless you're suggesting compiling it with Node.js and then running that foreign code in a server context.

    In what way is JS better for parsing HTML than Perl?

      The most likely candidate NPM package seems to be jscrape, which combines jsdom, request, and jquery. The reason I recommended JavaScript / Node as an option is your own wording:

      This works more-or-less the way I intended, there are two problems though - since the list of links is dynamic, and partly created using javascript, I had to use the browser to create that list.

      I need a way of parsing web page, and getting a list of all its component, and this is my first problem.

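      If you are dealing with pages that use JavaScript to load resources dynamically, then you need something that can interpret that JavaScript as a browser would. A parser that only reads the markup, in whatever language, will never see the resources a script adds at run time.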
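      To illustrate the point about static parsing, here is a minimal sketch in Python using only the standard library. The page content, the file names, and the helper names (ResourceCollector, static_resources) are all made up for the example; it is not jscrape or anything from the original question. It collects src/href URLs from the markup and shows that an image created by an inline script does not appear in the result.

```python
from html.parser import HTMLParser

# Hypothetical page: one image is in the markup, a second is added by a script.
PAGE = """
<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body>
<img src="/static.png">
<script>
  var img = document.createElement('img');
  img.src = '/dynamic.png';
  document.body.appendChild(img);
</script>
</body></html>
"""


class ResourceCollector(HTMLParser):
    """Collects URLs from the src/href attributes of a few resource tags."""

    # Which attribute carries the URL for each tag we care about.
    ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


def static_resources(html):
    """Return resource URLs visible in the markup, without running any script."""
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources


if __name__ == "__main__":
    # /dynamic.png is missing: it only exists once the inline script has run.
    print(static_resources(PAGE))
```

      The same limitation applies to a static parser written in Perl or C, which is why something that actually executes the page's scripts is needed to get the full list.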

      As something completely different, you might want to check out Selenium.