CGI: Waiting for Output

by Anonymous Monk
on Sep 02, 2003 at 01:59 UTC ( [id://288222] )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a CGI script that accepts input from a form and processes data accordingly. The processing usually involves parsing through hundreds of megabytes of flat text for certain values and doing some trivial formatting and arithmetic on them. Due to the sheer volume of data that has to be processed, this can take several minutes.

What I would like to do is this: once the user clicks the "Submit" button on the form, he would see a page that says something to the effect of "Processing.....". When the script finally finishes gathering all of its data, the page would display the results. I'm looking for suggestions on how to accomplish this using Perl.

My original thought was to simply use fork. One process would create a static HTML file with a page that refreshes every X seconds. The other process goes out and parses the data. When it finishes, it overwrites the static HTML page with the results. Thus, when the user's browser refreshes the page, instead of seeing "Processing..." he would see the results. This technically works, but it also means I have to figure out how to clean up the HTML files once they've been viewed and are no longer needed (which is a bit less than trivial in a multiuser environment where each user may have several open sessions). A better solution would be to never have to write an HTML file to begin with.
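For concreteness, here is a minimal sketch of what I mean (the file locations and parse_flat_files() are hypothetical placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI qw(:standard);

    my $page = "/var/www/html/status-$$.html";   # hypothetical per-request page

    defined( my $pid = fork ) or die "fork failed: $!";
    if ($pid) {
        # parent: write the refreshing "Processing..." page, then redirect to it
        write_page( $page, 'Processing...', 0 );
        print redirect("/status-$$.html");
    }
    else {
        # child: detach from the web server, do the slow work, overwrite the page
        close STDOUT;
        close STDERR;
        my $results = parse_flat_files();
        write_page( $page, $results, 1 );
        exit 0;
    }

    sub write_page {
        my ( $path, $body, $final ) = @_;
        open my $fh, '>', $path or die "Cannot write $path: $!";
        # the holding page refreshes itself; the final page does not
        print $fh '<html><head>',
            ( $final ? '' : '<meta http-equiv="refresh" content="10">' ),
            "</head><body>$body</body></html>";
        close $fh;
    }

    # stand-in for the real multi-minute parsing job
    sub parse_flat_files { sleep 60; return '<h1>Results</h1>' }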

Is there a good way to do this? Thanks for your suggestions.

Replies are listed 'Best First'.
Re: CGI: Waiting for Output
by aquarium (Curate) on Sep 02, 2003 at 03:04 UTC
    An alternative is to use multipart HTTP. There's enough info on this in the CGI module to get you to the finish line. Basically, the form's submit button runs the one script, which emits multipart HTML periodically until the last part, which is the actual results. No reload etc. is used, as this is server push. Most browsers support it by now.
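    A minimal sketch of this approach, modeled on the server-push example in the CGI.pm documentation (do_slow_parsing() is a hypothetical stand-in for the real work):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use CGI qw(:push -nph);

        $| = 1;   # unbuffered, so each part reaches the browser immediately

        print multipart_init(-boundary => '----status-boundary');

        # first part: the holding page the user sees right away
        print multipart_start(-type => 'text/html'),
              '<html><body><p>Processing...</p></body></html>',
              multipart_end;

        my $results = do_slow_parsing();

        # last part replaces the holding page with the real results
        print multipart_start(-type => 'text/html'),
              "<html><body>$results</body></html>",
              multipart_final;

        # stand-in for the multi-minute parsing job
        sub do_slow_parsing { sleep 10; return '<h1>Results</h1>' }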
      This sounds a bit more promising, thanks. I'll be sure to explore this. One thing bothers me a bit though:

      From the CGI.pm documentation, Server Push section:

      Users interested in server push applications should also have a look at the CGI::Push module.

      Only Netscape Navigator supports server push. Internet Explorer browsers do not.

      Is this what you were referring to? I've also heard some rumblings that this is highly dependent on the server from which the script is served. Any truth to this? Does IE now work properly? The environment in which my code will be used sees a very wide audience and frequent changes in both browser and server software.
Re: CGI: Waiting for Output
by rkg (Hermit) on Sep 02, 2003 at 02:04 UTC
      Hmmm...this seems to be essentially the above-mentioned strategy, which is good but not great. It requires you to write out an HTML file that then has to be cleaned up later (merlyn suggests using a cron job to periodically delete the files). Is there no way to be a bit more dynamic? E.g. not have to write out static files at all, and hence avoid the hassle of making an intelligent cleaner-upper?
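      For what it's worth, that cron cleanup can be tiny. A minimal sketch, assuming the generated pages live in one hypothetical directory and anything older than an hour is stale:

          #!/usr/bin/perl
          # cleanup.pl -- run periodically from cron
          use strict;
          use warnings;

          # -M gives file age in days; 1/24 of a day is one hour
          unlink grep { -M $_ > 1/24 } glob '/var/www/html/status-*.html';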
Re: CGI: Waiting for Output
by Zaxo (Archbishop) on Sep 02, 2003 at 02:08 UTC

    I think that a more efficient database would be your best speed boost. One of the dbm family might be enough, judging from the way you describe the processing.

    Speeding up the slow part is often the best strategy.
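    A minimal sketch of that idea, assuming the flat text is line-oriented "key value" records (a hypothetical layout): a one-time pass loads a DB_File hash, and the CGI script then does constant-time lookups instead of scanning hundreds of megabytes.

        #!/usr/bin/perl
        # build_index.pl -- one-time (or cron-driven) indexing pass
        use strict;
        use warnings;
        use DB_File;
        use Fcntl;

        tie my %index, 'DB_File', 'values.db', O_RDWR | O_CREAT, 0644, $DB_HASH
            or die "Cannot tie values.db: $!";

        open my $in, '<', 'data.txt' or die "Cannot open data.txt: $!";
        while (my $line = <$in>) {
            chomp $line;
            my ($key, $value) = split ' ', $line, 2;   # assumed record layout
            $index{$key} = $value if defined $value;
        }
        close $in;
        untie %index;

    The CGI script (assuming the usual use CGI qw(:standard)) then replaces its full-file scan with a single lookup:

        tie my %index, 'DB_File', 'values.db', O_RDONLY, 0644, $DB_HASH
            or die "Cannot tie values.db: $!";
        my $value = $index{ param('key') };   # O(1) instead of a linear scan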

    After Compline,
    Zaxo

Re: CGI: Waiting for Output
by graff (Chancellor) on Sep 02, 2003 at 02:59 UTC
    If the hundreds of megabytes of flat text are relatively static (only changing once a week, say, or not at all), then the best thing would be to figure out how to structure it into something other than a flat text file -- either a dbm file as suggested by Zaxo, or a relational database.

    If you're parsing through something like a log file that changes (grows) continuously, you could still consider creating "digests" at regular intervals in something other than flat text, so that searching/indexing is relatively quick and properly focused from the user's perspective: a separate cron job handles the heavy lifting and produces a derived data set, leaving the browser interaction a simple, quick task.

    In any case, having each browser connection produce its own distinct summary web page shouldn't be a problem -- just apply a file name convention that makes it easy for yesterday's HTML files to be deleted at noon or whatever.
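    One hypothetical convention: embed the date in the file name when writing, and let a daily cron job sweep everything that isn't from today.

        use strict;
        use warnings;
        use POSIX qw(strftime);

        my $stamp = strftime('%Y%m%d', localtime);

        # writing: each page gets a date-stamped name, e.g. summary-20030902-1234.html
        my $page = "/html/summary-$stamp-$$.html";

        # cleaning (run daily from cron): remove pages not from today
        for my $old (glob '/html/summary-*.html') {
            unlink $old unless $old =~ /summary-$stamp-/;
        }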

      This is a good strategy, but not one that gets at the core of the problem. Let's say my script isn't parsing hundreds of megabytes of flat text (that's out of my control unfortunately, or else I would put it in a db as Zaxo suggests). Maybe instead it's performing some huge computation or doing some massive en/decryption or waiting for a reply from another server somewhere. Is there a way to let the user see a message to the effect of "I'm working on it" and then see the result as soon as it's done? Or is this too dynamic? Is there no way to be free of reliance on cron or some other cleanup entity? Must I actually write out a file?
Re: CGI: Waiting for Output
by benn (Vicar) on Sep 02, 2003 at 11:08 UTC
    Another way may be to use a redirected 'holding pattern' script that checks a semaphore set by your main program. You could even show updates - number of files processed etc.

    If your main script creates a temp file that it periodically writes a status to and deletes when finished, it could pass this filename to "holding.cgi", which can then periodically display the status, and finally the results when finished - something like this...

    # Main script
    use strict;
    use warnings;
    use CGI qw(:standard);
    use File::Temp qw(tempfile);
    use IO::Handle;

    my ($fh,  $filename) = tempfile( DIR => '/tmp' );
    my ($hfh, $results)  = tempfile( DIR => '/html' );
    $fh->autoflush(1);   # so holding.cgi always sees a fresh count
    print redirect("holding.cgi?file=$filename&html=$results");

    my $num_processed         = 0;
    my $records_not_processed = 1;   # placeholder: clear when input is exhausted
    do {
        # process one record here...
        print $fh $num_processed, "\n" unless $num_processed++ % 1000;
    } while ($records_not_processed);

    # write results to $hfh
    # ...or just results.htm, if you haven't got multiple users / datasets etc.
    unlink $filename;

    # holding.cgi
    use strict;
    use warnings;
    use CGI qw(:standard);

    my $filename = param('file');
    my $html     = param('html');

    if (-e $filename) {
        open my $file, '<', $filename or die "Cannot open $filename: $!";
        my $last;
        $last = $_ while <$file>;   # read in the last number written
        close $file;
        chomp $last if defined $last;
        # print some HTML with a META refresh that points back to ourself
        print header,
              start_html( -head => meta({ -http_equiv => 'refresh',
                                          -content    => '5' }) ),
              p("Processed $last records so far..."),
              end_html;
    }
    else {
        print redirect($html);
    }
    Cheers, Ben.
      This works, but is a problem for sites that attempt to provide accessible content. A META refresh may disorient users, since the content may change in mid-viewing without any user interaction. See the W3C's Core Techniques for Web Content Accessibility Guidelines for more information.
