Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I have a CGI script that accepts input from a form and processes data accordingly. The processing usually involves parsing through hundreds of megabytes of flat text for certain values and doing some trivial formatting and arithmetic on them. Due to the sheer volume of data that has to be processed, this can take several minutes.
What I would like to do is this: once the user clicks the "Submit" button on the form, he would see a page that says something to the effect of "Processing.....". When the script finally finishes gathering all of its data, the page would display the results. I'm looking for suggestions on how to accomplish this using Perl.
My original thought was to simply use fork. One process would create a static HTML file with a page that refreshes every X seconds. The other process goes out and parses the data. When it finishes, it overwrites the static HTML page with the results. Thus, when the user's browser refreshes the page, instead of seeing "Processing..." he would see the results. This technically works, but it also means I have to figure out how to clean up the HTML files once they've been viewed and are no longer needed (which is a bit less than trivial in a multiuser environment in which each user may have several open sessions). A better solution would be to never have to write an HTML file to begin with.
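For reference, the fork idea above reduces to a couple of small helpers plus a fork skeleton. This is a minimal sketch, not anyone's actual code: the directory, the report-$$.html naming convention, and slow_parse() are all made-up stand-ins.

```perl
#!/usr/bin/perl
# Sketch of the fork-and-overwrite approach. write_placeholder() and
# finish_page() are hypothetical helpers; the commented skeleton at the
# bottom shows how they would be wired into the CGI script.
use strict;
use warnings;

# Write the "Processing..." page, with a META refresh back to itself.
sub write_placeholder {
    my ($page) = @_;
    open my $fh, '>', $page or die "can't write $page: $!";
    print {$fh} qq{<html><head><meta http-equiv="refresh" content="5">},
                qq{</head><body>Processing...</body></html>\n};
    close $fh;
}

# Overwrite the placeholder with the real results.
sub finish_page {
    my ($page, $results) = @_;
    open my $fh, '>', $page or die "can't write $page: $!";
    print {$fh} qq{<html><body>$results</body></html>\n};
    close $fh;
}

# In the CGI script itself, something like:
#
#   my $page = "/var/www/html/tmp/report-$$.html";   # $$ = process ID
#   write_placeholder($page);
#   my $pid = fork();
#   die "fork failed: $!" unless defined $pid;
#   if ($pid) {
#       print "Location: /tmp/report-$$.html\r\n\r\n"; # send browser there
#   } else {
#       close STDOUT;                  # let the server end the response
#       finish_page($page, slow_parse());  # slow_parse() = the real work
#       exit 0;
#   }
```

The browser lands on the placeholder, which reloads itself until the child process swaps in the results. The cleanup problem described above remains, of course.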
Is there a good way to do this? Thanks for your suggestions.
Re: CGI: Waiting for Output
by aquarium (Curate) on Sep 02, 2003 at 03:04 UTC
An alternative is to use multipart HTTP. There's enough info on this in the CGI module to get you to the finish line. Basically, the form's submit button runs the one script, which emits multipart HTML periodically until the last part, which is the actual results. No reload is used, as this is server push. Most browsers support it by now.
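For the curious, this is roughly what server push looks like on the wire, hand-rolled without CGI.pm (whose multipart_* routines and the CGI::Push module wrap the same mechanism). The boundary string and page bodies here are arbitrary illustrations:

```perl
#!/usr/bin/perl
# Hand-rolled server push: one response of type multipart/x-mixed-replace,
# where each part replaces the previous one in the browser window.
use strict;
use warnings;

my $boundary = 'PerlMonksPush';   # arbitrary; must not appear in the bodies

# The initial header plus the first boundary marker.
sub push_header {
    return "Content-Type: multipart/x-mixed-replace;"
         . "boundary=$boundary\r\n\r\n--$boundary\r\n";
}

# One part of the stream; pass a true second argument for the final part,
# which closes the stream with the terminating boundary.
sub push_part {
    my ($html, $final) = @_;
    my $tail = $final ? "--$boundary--\r\n" : "--$boundary\r\n";
    return "Content-Type: text/html\r\n\r\n$html\r\n$tail";
}

# In the CGI script (sketch; $work_is_done and $results are stand-ins):
#
#   $| = 1;                       # flush each part immediately
#   print push_header();
#   until ($work_is_done) {
#       print push_part('<html><body>Processing...</body></html>');
#       sleep 5;
#   }
#   print push_part("<html><body>$results</body></html>", 'final');
```

No files are written and nothing needs cleaning up afterward, which is the appeal; browser support is the catch, as the follow-up below notes.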
This sounds a bit more promising, thanks. I'll be sure to explore this. One thing bothers me a bit though:
From the CGI.pm documentation, Server Push section:
Users interested in server push applications should also have a look at the CGI::Push module.
Only Netscape Navigator supports server push. Internet Explorer browsers do not.
Is this what you were referring to? I've also heard some rumblings that this is highly dependent on the server from which the script is served. Any truth to this? Does IE now work properly? The environment in which my code will be used sees a very wide audience and frequent changes in both browser and server software.
Re: CGI: Waiting for Output
by rkg (Hermit) on Sep 02, 2003 at 02:04 UTC
Hmmm... this seems to be essentially the above-mentioned strategy, which is good but not great. It requires you to write out an HTML file that then has to be cleaned up later (merlyn suggests using a cron job to periodically delete the files). Is there no way to be a bit more dynamic? E.g., not have to write out static files at all, and hence avoid the hassle of building an intelligent cleaner-upper?
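For completeness, the cron-style cleanup merlyn's approach relies on is only a few lines of Perl. The directory, the report-*.html pattern, and the one-hour cutoff below are made-up examples; the same sub could run from cron, or at the top of the CGI script itself so no separate job is needed.

```perl
#!/usr/bin/perl
# Delete generated report pages older than a given age.
use strict;
use warnings;

# Remove files in $dir matching report-*.html that are older than
# $max_age_days; returns the list of deleted files.
sub clean_stale {
    my ($dir, $max_age_days) = @_;
    my @deleted;
    for my $file (glob "$dir/report-*.html") {
        if (-M $file > $max_age_days) {   # -M: age in days since last change
            unlink $file and push @deleted, $file;
        }
    }
    return @deleted;
}

# clean_stale('/var/www/html/tmp', 1/24);  # anything over an hour old
```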
Re: CGI: Waiting for Output
by Zaxo (Archbishop) on Sep 02, 2003 at 02:08 UTC
I think that a more efficient database would be your best speed boost. One of the dbm family might be enough from the way you describe the processing.
Speeding up the slow part is often the best strategy.
After Compline, Zaxo
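A minimal sketch of the dbm idea, assuming the flat text can be boiled down to key/value records (the tab-separated format here is invented for illustration). The index is built once, offline; each CGI request then does a single keyed lookup instead of scanning hundreds of megabytes:

```perl
#!/usr/bin/perl
# Index a flat "key<TAB>value" file into a dbm file (SDBM_File ships
# with Perl), then serve individual lookups from it.
use strict;
use warnings;
use Fcntl;
use SDBM_File;

# One-time (or nightly) indexing pass over the flat text.
sub build_index {
    my ($flatfile, $dbfile) = @_;
    tie my %db, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0644
        or die "can't tie $dbfile: $!";
    open my $in, '<', $flatfile or die "can't read $flatfile: $!";
    while (<$in>) {
        chomp;
        my ($key, $value) = split /\t/, $_, 2;
        $db{$key} = $value;
    }
    close $in;
    untie %db;
}

# What the CGI script does per request: one keyed fetch, no scan.
sub lookup {
    my ($dbfile, $key) = @_;
    tie my %db, 'SDBM_File', $dbfile, O_RDONLY, 0644
        or die "can't tie $dbfile: $!";
    my $value = $db{$key};
    untie %db;
    return $value;
}
```

If the lookups get fast enough this way, the whole "please wait" page may become unnecessary, which is Zaxo's point.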
Re: CGI: Waiting for Output
by graff (Chancellor) on Sep 02, 2003 at 02:59 UTC
If the hundreds of megabytes of flat text are relatively static (only changing once a week, say, or not at all), then the best thing would be to figure out how to structure it into something other than a flat text file -- either a dbm file as suggested by Zaxo, or a relational database.
If you're parsing through something like a log file that changes (grows) continuously, you could still create "digests" at regular intervals in a format other than flat text, so that searching and indexing are quick and properly focused from the user's perspective. A separate cron job handles the heavy lifting and produces a derived data set, leaving the browser interaction with a simple, quick task.
In any case, having each browser connection produce its own distinct summary web page shouldn't be a problem -- just apply a file name convention that makes it easy for yesterday's HTML files to be deleted at noon or whatever.
This is a good strategy, but not one that gets at the core of the problem. Let's say my script isn't parsing hundreds of megabytes of flat text (that's out of my control unfortunately, or else I would put it in a db as Zaxo suggests). Maybe instead it's performing some huge computation or doing some massive en/decryption or waiting for a reply from another server somewhere. Is there a way to let the user see a message to the effect of "I'm working on it" and then see the result as soon as it's done? Or is this too dynamic? Is there no way to be free of reliance on cron or some other cleanup entity? Must I actually write out a file?
Re: CGI: Waiting for Output
by benn (Vicar) on Sep 02, 2003 at 11:08 UTC
Another way may be to use a redirected 'holding pattern' script that checks a semaphore set by your main program. You could even show updates - number of files processed etc.
If your main script creates a temp file that it periodically writes a status to (and deletes when finished), it could pass this filename to "holding.cgi", which can then periodically display the status, then the results when finished - something like this...
# Main script
use CGI qw(:standard);
use File::Temp qw(tempfile);

my ($fh,  $filename) = tempfile( DIR => '/tmp' );
my ($hfh, $results)  = tempfile( DIR => '/html' );
print redirect("holding.cgi?file=$filename&html=$results");

my $num_processed = 0;
do {
    # process record
    print $fh "$num_processed\n" unless ($num_processed++ % 1000);
} while ($records_not_processed);

# write results to $hfh
# ...or just results.htm...
# ...if you haven't got multiple users / datasets etc.
unlink $filename;
# holding.cgi
use CGI qw(:standard);

my $filename = param('file');
my $html     = param('html');

if (-e $filename) {
    open FILE, '<', $filename or die "can't open $filename: $!";
    # read in last number
    # print some HTML...
    # ...that's got a META refresh in...
    # ...that points back to ourself
    close FILE;
}
else { print redirect($html); }
Cheers, Ben.
This works, but it is a problem for sites that try to provide accessible content: a META refresh may disorient users, since the page can change mid-viewing without any user interaction. See the W3C's Core Techniques for Web Content Accessibility Guidelines for more information.