Downloading continuous updates from webpage

avid has asked for the wisdom of the Perl Monks concerning the following question:

Re: Downloading continuous updates from webpage by vladb (Vicar) on Feb 16, 2006 at 03:02 UTC
I'm not clear: are you trying to build a page that displays download progress in this fashion, or are you instead trying to download a large file, for example?

There are many tools out there to aid in downloading large files off the web. If you are using the Firefox browser, you may find some of the download extensions useful as well. But if you are trying to build a script to fetch files, the Bundle::LWP module could help, as is also explained in this post. Another post explains how to download multiple files at once.

_____________________
"We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true." Robert Wilensky, University of California
Re: Downloading continuous updates from webpage by BrowserUk (Patriarch) on Feb 16, 2006 at 03:42 UTC
When you say "incremental updates", does each refresh contain all the preceding information? If so, you probably only need the final page, which from your description should be easy to detect because of the presence of summary information.

Presumably the intermediate pages displayed in the browser are fetched as a result of a meta refresh tag or a javascript refresh every few minutes? When automated, you don't need the content of the autorefreshes, as you are only going to discard it, but it may be necessary to fetch them anyway, because the server may decide to cancel the processing if it doesn't see a refresh request at regular intervals.

Depending upon the complexity of the page and the refresh mechanism used, you might get away with using LWP::Simple to `get` the url successively (at appropriately timed intervals), scanning the content returned and discarding it until it contains the summary information. Note that LWP::Simple has no `put` or `post` function; if the request has to be a POST, use LWP::UserAgent instead.

In more complex cases, you may need to scan the content returned by the first submit and extract the refresh url from embedded javascript. It may even be necessary to rescan every partial page returned to extract a different url.

It might be easier to use WWW::Mechanize, though I'm not sure whether it copes with embedded javascript refreshes.

Providing a code example is pretty much impossible without seeing the pages involved. If the url is public, you could post it (or /msg it to a willing responder if you don't want to overtax the server), and you might get a worked example.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
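For what it's worth, the polling idea above could be sketched roughly as follows. This is only an illustration under assumptions: the sub names (extract_meta_refresh, poll_until_summary), the "done" pattern, and the argument names are all made up here, and the real page may well need a different marker or a javascript-based refresh that this does not handle.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Pull the delay and (optional) target url out of a
# <meta http-equiv="refresh" content="N; url=..."> tag.
# Returns an empty list when the page carries no meta refresh.
sub extract_meta_refresh {
    my ($html) = @_;
    my ($tag) = $html =~ m{(<meta\b[^>]*\bhttp-equiv\s*=\s*["']?refresh\b[^>]*>)}i
        or return;
    my (undef, $content) = $tag =~ m{\bcontent\s*=\s*(["'])(.*?)\1}i
        or return;
    my ($delay, $url) =
        $content =~ m{^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(\S.*?))?\s*$}i
        or return;
    return ($delay, $url);    # $url is undef for a plain "reload this page"
}

# Fetch a url repeatedly until the content matches done_re.
# Hypothetical arguments: url, done_re, interval (seconds), max_tries,
# and an optional fetch code ref (defaults to LWP::Simple::get).
sub poll_until_summary {
    my (%arg) = @_;
    my $fetch = $arg{fetch} || \&get;
    for my $try (1 .. $arg{max_tries}) {
        my $html = $fetch->($arg{url});
        return $html if defined $html && $html =~ $arg{done_re};
        sleep $arg{interval} if $try < $arg{max_tries};
    }
    return;    # gave up without ever seeing the summary
}
```

Something like poll_until_summary(url => $url, done_re => qr/Summary/, interval => 60, max_tries => 30) would then return the final page or undef. If each intermediate page names a different refresh url, extract_meta_refresh could be called on every response to find the next one to fetch.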
Re^2: Downloading continuous updates from webpage by acid06 (Friar) on Feb 16, 2006 at 14:20 UTC
> When you say "incremental updates", does each refresh contain all the preceding information?

From what the poster said, I think there's no refreshing of the page at all. I think the server is just printing stuff and the browser renders what it can before the whole page is done downloading.

This works fairly well in some scenarios, and even better if you turn on autoflush on the server side. However, there are some catches. E.g. AFAIK, IE will only render a table after it gets the closing tag. There are possibly more glitches of this kind.

acid06
perl -e "print pack('h*', 16369646), scalar reverse $="
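To illustrate the server side of this, here is a rough sketch of a CGI-style script that prints progress as it goes with autoflush turned on. The sub name stream_progress and the idea of passing each step as a code ref are assumptions made for the example; nothing is known about how the real server is written. It emits one paragraph per step rather than table rows, because of the table rendering catch mentioned above.

```perl
use strict;
use warnings;
use IO::Handle;

# Print a CGI response to $out, flushing after every print so the
# client can see each step as soon as it is finished.
# Each element of @steps is a code ref that does some work and
# returns a line of text to show.
sub stream_progress {
    my ($out, @steps) = @_;
    $out->autoflush(1);    # same effect as $| = 1 on the selected handle
    print {$out} "Content-Type: text/html\r\n\r\n";
    print {$out} "<html><body>\n";
    for my $step (@steps) {
        print {$out} "<p>", $step->(), "</p>\n";
    }
    print {$out} "</body></html>\n";
    return;
}
```

In a real script this would be called as stream_progress(\*STDOUT, @steps). Whether the browser actually shows the partial output still depends on buffering in the web server and on the browser itself.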
Re: Downloading continuous updates from webpage by Ultra (Hermit) on Feb 16, 2006 at 06:45 UTC
By incremental updates, do you mean that your HTTP server accepts the Range header, so that you can ask for pieces of data?

Dodge This!
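If the server does honour Range, asking for a piece of the data might look roughly like this. The sub names (range_request, fetch_range) are invented for the example, and it is an open question whether the server in this thread supports ranges at all; dynamically generated pages often do not.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Build a GET request for bytes $first..$last of $url.
# Leave $last undefined to ask for everything from $first onwards.
sub range_request {
    my ($url, $first, $last) = @_;
    my $spec = defined $last ? "bytes=$first-$last" : "bytes=$first-";
    my $req  = HTTP::Request->new(GET => $url);
    $req->header(Range => $spec);
    return $req;
}

# Send the request and return the partial content, or undef if the
# server did not answer with 206 Partial Content (a plain 200 means
# it ignored the Range header and sent the whole document).
sub fetch_range {
    my ($url, $first, $last) = @_;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->request(range_request($url, $first, $last));
    return $res->code == 206 ? $res->content : undef;
}
```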
Re^2: Downloading continuous updates from webpage by avid (Novice) on Feb 17, 2006 at 17:03 UTC
Thanks to you all for these prompt responses.

> browser renders what it can before the whole page is done downloading.

I guess this is the case. I cannot do multiple reads to get incremental updates, as the POST also contains input data that would get resubmitted.

Is there any timeout in LWP POSTs? If there is none, I can just do a single POST and then check the results; the script can simply wait for however long the server takes to calculate.

I will read the WWW::Mechanize man pages and check whether it can solve my problem.
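On the timeout question: LWP::UserAgent does have one. Its timeout attribute defaults to 180 seconds and counts inactivity on the connection rather than total elapsed time, so a server that keeps printing progress should not trip it, while a long silent calculation might. Below is a minimal sketch of a single POST that reads the response as it arrives. The sub names (make_line_handler, post_and_stream) and the line-by-line handling are assumptions for illustration, not something taken from the pages in question.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Build a content callback that buffers raw chunks and passes each
# complete line to $on_line. A trailing partial line stays in the
# buffer and is not delivered.
sub make_line_handler {
    my ($on_line) = @_;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        while ($buffer =~ s/\A(.*?)\r?\n//) {
            my $line = $1;
            $on_line->($line);
        }
        return;
    };
}

# POST the form once and feed the response to $on_line as it streams in.
# $timeout is the inactivity timeout in seconds (LWP's default is 180).
sub post_and_stream {
    my ($url, $form, $on_line, $timeout) = @_;
    my $ua  = LWP::UserAgent->new(timeout => $timeout || 180);
    my $res = $ua->request(POST($url, $form), make_line_handler($on_line));
    die "POST failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res;
}
```

A call such as post_and_stream($url, { input => $data }, sub { print "got: $_[0]\n" }, 600) submits the data exactly once, so nothing gets resubmitted. When a content callback is used, the body is handed to the callback and is not kept in the response object.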

