
Update on controlling long-running processes via CGI

by dannyhmg (Novice)
on Nov 12, 2014 at 19:45 UTC

dannyhmg has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I've got a simple CGI web form that gathers some input from the user, runs a perl script that takes 2-3 minutes to complete, and then displays the results. During the 2-3 minutes that the program is running, I'd like to display an "in-progress" page that will periodically self-refresh and give the user messages about its status. This seems like it should be straightforward, but I can't seem to figure out how to do it.

I realize this is similar to the issue addressed here: Managing a long running server side process using CGI, but my code is running on Apache 2.2.15 (Red Hat), not on Windows. I'm not sure how to properly initiate a child process on this server. I tried using fork(), but it doesn't seem to work (as far as I can tell, the child process never gets executed). Google pointed me to some rather vague references to a module called Apache2::SubProcess, but it's not clear to me (even after reading the CPAN documentation for that module) how this is supposed to work.

Thanks for any suggestions!

Replies are listed 'Best First'.
Re: Update on controlling long-running processes via CGI
by GrandFather (Saint) on Nov 12, 2014 at 20:22 UTC
Re: Update on controlling long-running processes via CGI
by scorpio17 (Canon) on Nov 13, 2014 at 17:07 UTC

    I've done something similar. This runs on a RHEL server with apache, etc. I used CGI::Application and HTML::Template, so you may have to make a few slight changes, but this should (hopefully) help you get things working.

    In my case, I'm displaying a large spreadsheet-like table of data. I only show 50 rows on a page, but there's a pager control at the bottom (first, prev, next, last). There's also a button users can click labeled "Download CSV" that will allow them to download all the data as a comma-separated-value file. If there's a LOT of data, dumping the file can take a relatively long time (the web page might time out and generate an error), or worse, the user might get impatient and click the button two or three more times!

    So, here's how I did it:

    First, the initial page with the data table has this HTML near the bottom:

    <form name="csvform" action="/" method="POST">
      <input type="submit" name="csv" id="csv" value="Generate CSV File"
             onclick="return SubmitTheCSVForm();" />
    </form>

    The main thing to notice here is that, when clicked, we're going to call a run mode called "downloadcsv". (In CGI::Application, every page is defined by a "run mode", and run modes are just subroutines; all of my run modes live in the main script.)

    The onclick event points to some javascript that disables the button, preventing multiple clicks. It looks like this:

    var submitted = false;
    function SubmitTheCSVForm() {
        if (submitted == true) {
            return;
        }
        document.csvform.csv.value    = 'working...';
        document.csvform.csv.disabled = true;
        submitted = true;
        document.csvform.submit();
    }

    Inside downloadcsv, I have the following code:

    sub downloadcsv : Runmode {
        my $self = shift;
        if ( my $pid = fork ) {
            # parent does this
            return $self->redirect("/");
        }
        elsif ( defined $pid ) {
            # child does this
            close STDOUT;
            close STDERR;
            open STDERR, ">&=1";
            my $id  = $self->session->id();
            my $cmd = "$CFG{'PATH'}/";
            exec "$cmd", "$id";
            die "can't do exec: $!";
        }
        else {
            die "cannot fork: $!";
        }
    }

    Notice that I use fork here. The parent process redirects to another page, which displays a "please wait..." message (more on that later). The child process runs another script: the long-running process that actually does the work, which in my case is generating the file to be downloaded.

    Things to note: I have a config file in which I define the path to where my script lives. My $cmd variable contains the command I would type on a linux command line (it's not a URL). You have to make sure your permissions are set correctly. For example, if the web server runs as user 'nobody', then this script runs as user 'nobody'. Since it's writing a file, the location it writes to must be writable by user 'nobody', and so on. Make sure you test your command as the correct user (if you only test as yourself, you may have different env variables, path settings, etc.). In my case, I'm running another perl script, but $cmd could contain anything. This is a security risk, so be careful, especially if you build the command using any input from the user. I pass a session ID, in case multiple users request different downloads at the same time. I'm skipping some of those details in order to stay on topic.

    Also note that I close STDOUT and STDERR. If you don't do this, apache won't "let go" of the child process. This is very important! You must sever this connection for the child to be independent. Also, if exec works correctly, it will never return, so the die on the next line will never be reached.
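    For a plain CGI script (without CGI::Application), the same fork-and-detach pattern might look like the sketch below. The job script path and the status URL are hypothetical placeholders, and the session id is faked with a timestamp:

```perl
#!/usr/bin/perl
# Sketch: launch a long-running job from a plain CGI script and
# detach it from Apache. Paths and URLs are placeholders.
use strict;
use warnings;

$| = 1;                      # flush the redirect before forking
my $id = time() . ".$$";     # stand-in for a real session id

# Tell the browser where to poll for status, then finish the request.
print "Status: 302 Found\r\n";
print "Location: /cgi-bin/status.cgi?id=$id\r\n\r\n";

if ( my $pid = fork ) {
    # parent: the CGI request is done
    exit 0;
}
elsif ( defined $pid ) {
    # child: sever the connection to Apache, or the request
    # will not complete until this process exits
    close STDIN;
    close STDOUT;
    close STDERR;
    # exec replaces this process; die only runs if exec fails
    exec '/path/to/longjob.pl', $id
        or die "can't exec: $!";
}
else {
    die "cannot fork: $!";
}
```
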

    Meanwhile, back in the parent process, we redirected to the "csv_status" page, which is defined something like this:

    sub csv_status : Runmode {
        my $self = shift;
        my $id   = $self->session->id();
        my $path = $CFG{'CSV_TEMP'};
        my $still_running = 0;
        if ( -e "$path/$id/" ) {
            open my $in, '<', "$path/$id/"
                or die "can't access $id/ file : $!";
            my $pid = <$in>;
            close $in;
            if ( IsStillRunning($pid) ) {
                $still_running = 1;
            }
            else {
                $still_running = 0;
            }
        }
        my $template = $self->load_tmpl('csv_status.html');
        $template->param(
            TITLE         => "CSV Status",
            STILL_RUNNING => $still_running,
        );
        return $template->output;
    }

    I've removed a lot of error checking to make things simpler. The basic idea is that my long-running script creates a process id file when it starts up. I can use that PID to check whether it's still running or not. I pass this status to my template with the $still_running variable. Basically, there are two versions of the "status" page, depending on whether the process is still running or has finished. The template (csv_status.html) contains the following:

    <head>
    <TMPL_IF STILL_RUNNING>
      <meta http-equiv="refresh" content="5">
    </TMPL_IF>
    </head>
    ...
    <TMPL_IF STILL_RUNNING>
      <img src="images/working.gif" />
      <hr>
      <p>Please be patient... this might take a while.</p>
    <TMPL_ELSE>
      <h3>Job complete!</h3>
    </TMPL_IF>

    Again, I'm only showing the important bits. At the top, inside the header, IF the job is still running, I use a meta tag to force the page to reload every 5 seconds. Further down, in the body of the page, IF the job is still running, I display an animated gif (a little spinning icon), and a "please wait" message. When the job completes, the meta tag is NOT written (so the page refresh stops), and the icon/"please wait" message gets replaced with a "job complete" message (in my case, I also generate a link to the CSV file that the user can click to download.)
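    The IsStillRunning helper called in csv_status isn't shown above; a minimal version (my guess at it, not the author's original) can use kill with signal 0, which sends nothing but reports whether the process exists and we're allowed to signal it:

```perl
use strict;
use warnings;

# Check whether a PID read from the pid file still refers to a live
# process. kill with signal 0 delivers no signal; it only tests for
# the process's existence and our permission to signal it.
sub IsStillRunning {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /^\s*(\d+)\s*$/;
    return kill( 0, $1 ) ? 1 : 0;
}
```

    One caveat: kill 0 fails with EPERM for a live process owned by another user, so the worker and the status checker should both run as the web server user.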

    It would probably be better to use AJAX to refresh the page, instead of the meta tag, but I did this a long time ago before I knew how to use AJAX.

    Good luck, I hope this helps!

Re: Update on controlling long-running processes via CGI
by jhourcle (Prior) on Nov 13, 2014 at 16:51 UTC

    If you can build a way for the process to report on its status (such that another script can then monitor it), you can likely convert the whole thing to use what's called 'server push'.

    There are a few different variations, but the method that I've used is the multipart/x-mixed-replace trick, where you send multiple HTML documents with status updates and then a final one when done. I've also used more 'web-app' type systems, where the page is set up first, then populated/updated with javascript after the initial draw. I find the first one easier, but not all browsers (eg, IE) support it.

    In any case, you need to make sure that your server is treating your CGIs as 'NPH' (non-parsed-headers ... ie, it won't wait for all of the content to come down before it emits it to the client).
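    A minimal sketch of the server-push idea (the progress loop is simulated here; a real script would poll the job's status file, and under Apache the script name typically needs an nph- prefix to be treated as NPH):

```perl
#!/usr/bin/perl
# nph-status.cgi -- sketch of multipart/x-mixed-replace server push.
# As an NPH script, we emit the full HTTP status line ourselves.
use strict;
use warnings;

$| = 1;    # unbuffered, so each part reaches the browser immediately

my $boundary = 'StatusBoundary';
print "HTTP/1.0 200 OK\r\n";
print "Content-Type: multipart/x-mixed-replace;boundary=$boundary\r\n\r\n";
print "--$boundary\r\n";

# Simulated progress; each part replaces the previous page.
for my $pct ( 25, 50, 75 ) {
    print "Content-Type: text/html\r\n\r\n";
    print "<html><body><p>Working... $pct%</p></body></html>\r\n";
    print "--$boundary\r\n";
    sleep 5;
}

# The final document replaces the status page for good.
print "Content-Type: text/html\r\n\r\n";
print "<html><body><h3>Job complete!</h3></body></html>\r\n";
print "--$boundary--\r\n";
```
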

Re: Update on controlling long-running processes via CGI
by Anonymous Monk on Nov 13, 2014 at 04:35 UTC

    I'm not sure how to properly initiate a child process on this server

    Proc::Background has a nice interface for that
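    A quick sketch of what that might look like (the job script path and session id are placeholders, and Proc::Background must be installed from CPAN):

```perl
use strict;
use warnings;
use Proc::Background;

my $id = 'abc123';    # placeholder for a real session id

# Start the long-running job without blocking the CGI request;
# the object records the child's PID for later status checks.
my $proc = Proc::Background->new( '/path/to/longjob.pl', $id )
    or die "couldn't start background job";

printf "started pid %d\n", $proc->pid;

# Later:
if ( $proc->alive ) {
    print "still running\n";
}
else {
    printf "finished with exit status %d\n", $proc->wait;
}
```

    Since a CGI process can't keep the object alive across requests, in practice you would still save $proc->pid somewhere (a file, a session) and have the status page check it, as other replies describe.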

Re: Update on controlling long-running processes via CGI
by jellisii2 (Hermit) on Nov 13, 2014 at 17:16 UTC
    This ideally should be handled in AJAX using promises, assuming the end user is using something that resembles a modern browser. I link jQuery's stuff here because that's what I use, but if you want to roll your own, I don't see what's stopping you.

Re: Update on controlling long-running processes via CGI
by brachtmax (Initiate) on Nov 13, 2014 at 10:44 UTC
    Regarding the creation of a child process: fork() by itself creates just a clone of the current process; it's useless without a subsequent exec(). So after the fork() you have two (almost) identical processes, both of them just returning from fork(). The fork return value is then used to determine whether you are in the child or the parent; in the child you then do the exec(), which replaces the clone with whatever process you like:

    if ( $pid = fork() ) {
        # ...this is the parent...
    }
    elsif ( defined $pid ) {
        # this is the child...
        exec(<whatever executable>);
    }
    else {
        # error - the fork() didn't work
    }
Re: Update on controlling long-running processes via CGI
by sundialsvc4 (Abbot) on Nov 13, 2014 at 03:45 UTC

    Broadly speaking, this sort of activity needs to be treated as “a background job.” The web page ... Apache-based or otherwise ... should therefore be seen merely as a user interface by which the user can submit work to be processed, query the present status of the work in progress, and retrieve the output. There are many “batch job monitoring systems” out there for all operating systems, including those designed to be cross-platform. There are CPAN modules, as well.

    An elementary implementation of this idea ... which, by the way, is probably the most common one ... is to have one or more workers that are launched by means of cron, with an SQL database acting as the job queue. The workers query the database to find work to do and, using an SQL transaction to provide atomicity, select one. Then they carry out the work, ensuring that any exceptions that may be thrown will be caught. And this they do forever.

    A key aspect of this arrangement is that, no matter how many units of work may be requested, and no matter how rapidly they come in, the work is always carried out in a predictable and controlled way. The web page is, and the web page remains, only the user interface: the means by which the user can interact with the batch system, but not a player in the game.
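    As a sketch of the atomic claim step with DBI (table and column names are invented for illustration, shown here against SQLite): the trick is that the UPDATE re-checks the status, so if another worker grabbed the row first, zero rows change and we simply try the next one.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical jobs table: id, status ('pending'/'running'/'done'), payload.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=jobs.db', '', '',
                        { RaiseError => 1, AutoCommit => 1 } );

sub claim_next_job {
    my ($dbh) = @_;

    # Pick the oldest pending job...
    my ($id) = $dbh->selectrow_array(
        q{SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1});
    return undef unless defined $id;

    # ...and claim it. The WHERE clause re-checks the status, so two
    # workers can never both claim the same row.
    my $claimed = $dbh->do(
        q{UPDATE jobs SET status = 'running'
          WHERE id = ? AND status = 'pending'}, undef, $id );

    return $claimed == 1 ? $id : claim_next_job($dbh);
}
```
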

Re: Update on controlling long-running processes via CGI
by Anonymous Monk on Mar 18, 2017 at 01:16 UTC

    Here is how I did it: I created another (unique) HTML file with meta tags that cause the page to refresh itself every 15 seconds with no cache saved. Once that placeholder HTML file is created, I redirect the user to it and kick off a grandchild worker to update the HTML file dynamically. Once the grandchild is done updating the HTML file, it clears the file one final time and writes just the final HTML results you want shown.

    CGI - Parent worker: /srv/www/cgi-bin/

    • obtain CGI params like normal
    • create directory and html file and give proper permissions for both
    • printf HTMLFILE "<meta http-equiv=\"refresh\" content=\"15\">"
    • printf HTMLFILE "<META HTTP-EQUIV=\"Pragma\" CONTENT=\"no-cache\">"
    • printf HTMLFILE "please wait message or an animated gif"
    • print "Location: PlaceHolderURL\n\n"
    • fork off a child worker (passing in the html filename)
    • wait 3 seconds to allow the child to start up
    • exit

    CGI - Child worker:

    • close stdout (close STDOUT;) and stderr (close STDERR;) so browser doesn't wait on child
    • exec to a grandchild which will do the work, passing in the html filename

    CGI - Grandchild worker: /srv/www/cgi-bin/

    • get html file as an argument
    • append html file with any progress you want to show and do long-running processing
    • once finished processing:
      • optional: close and reopen the html file without appending, to clear it
      • print out what you want shown (the html results)
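    Put together as a plain CGI script, the parent worker's steps might look like the sketch below; the docroot path, status directory, and grandchild script name are all hypothetical placeholders:

```perl
#!/usr/bin/perl
# Sketch of the parent CGI worker: write a self-refreshing
# placeholder page, redirect to it, then detach a worker that
# will keep rewriting that page. Paths are placeholders.
use strict;
use warnings;
use CGI;

$| = 1;    # flush output before forking, so it isn't sent twice

my $q    = CGI->new;
my $id   = time() . ".$$";                      # unique placeholder name
my $file = "/srv/www/htdocs/status/$id.html";   # must be writable by the web server user

open my $html, '>', $file or die "can't create $file: $!";
print {$html} qq{<meta http-equiv="refresh" content="15">\n};
print {$html} qq{<META HTTP-EQUIV="Pragma" CONTENT="no-cache">\n};
print {$html} qq{<p>Please wait...</p>\n};
close $html;

# Send the browser to the placeholder page.
print $q->redirect("/status/$id.html");

if ( my $pid = fork ) {
    sleep 3;    # give the child a moment to start up
    exit 0;
}
elsif ( defined $pid ) {
    # Detach from Apache, then become the grandchild worker,
    # which appends progress to (and finally rewrites) the file.
    close STDOUT;
    close STDERR;
    exec '/srv/www/cgi-bin/grandchild.pl', $file
        or die "can't exec: $!";
}
else {
    die "cannot fork: $!";
}
```
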

Node Type: perlquestion [id://1107016]
Approved by GotToBTru
Front-paged by Old_Gray_Bear