Re: Update on controlling long-running processes via CGI

in reply to Update on controlling long-running processes via CGI

I've done something similar. This runs on a RHEL server with apache, etc. I used CGI::Application and HTML::Template, so you may have to make a few slight changes, but this should (hopefully) help you get things working.

In my case, I'm displaying a large spreadsheet-like table of data. I only show 50 rows on a page, but there's a pager control at the bottom (first, prev, next, last). There's also a button users can click labeled "Download CSV" that will allow them to download all the data as a comma-separated-value file. If there's a LOT of data, dumping the file can take a relatively long time (the web page might time out and generate an error), or worse, the user might get impatient and click the button two or three more times!

So, here's how I did it:

First, the initial page with the data table has this HTML near the bottom:

<form name="csvform" action="/myapp.pl/downloadcsv" method="POST">
<input type="submit" name="csv" id="csv" value="Generate CSV File" onclick="return SubmitTheCSVForm();" />
</form>

The main thing to notice here is that, when the button is clicked, the form posts to a run mode called "downloadcsv". (In CGI::Application, every page is defined in a "run mode", and run modes are just subroutines. All my run modes are in the myapp.pl script.)

The onclick event points to some javascript that disables the button, preventing multiple clicks. It looks like this:

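var submitted = false;
function SubmitTheCSVForm() {
  // ignore repeat clicks; returning false cancels the default submit
  if (submitted) { return false; }
  document.csvform.csv.value = 'working...';
  document.csvform.csv.disabled = true;
  submitted = true;
  document.csvform.submit();
  // the form was already submitted by hand, so cancel the button's own submit
  return false;
}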

Inside downloadcsv, I have the following code:

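sub downloadcsv : Runmode {
  my $self = shift;

  if (my $pid = fork) {
    # parent does this
    return $self->redirect("/myapp.pl/csv_status");
  } elsif (defined $pid) {
    # child does this: detach from the web server's output streams
    close STDOUT;
    close STDERR;
    # reopen both on /dev/null so the exec'd script has valid handles
    open STDOUT, '>', '/dev/null';
    open STDERR, '>', '/dev/null';

    my $id = $self->session->id();
    my $cmd = "$CFG{'PATH'}/make_csv.pl";
    # list form of exec runs the command directly, without a shell
    exec $cmd, $id;
    die "can't do exec: $!";

  } else {
    die "cannot fork: $!";
  }

}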

Notice that I use fork here. The parent process redirects to another page, which displays a "please wait..." message (more on that later). The child process execs another script: the long-running process that actually does the work (in my case, generating the file to be downloaded).

Things to note: I have a config file in which I define the path to where my script lives. My $cmd variable contains the command I would type on a Linux command line (it's not a URL). You have to make sure your permissions are set correctly. For example, if the web server runs as user 'nobody', then this script runs as user 'nobody'. Since it's writing a file, the location it writes to must be writable by user 'nobody', etc. Make sure you test your command as the correct user (if you only test as yourself, you may have different environment variables, path settings, etc.).

In my case, I'm running another Perl script, but $cmd could contain anything. This is a security risk - be careful, especially if you build the command using any input from the user. I pass a session ID so that multiple users can request different downloads at the same time. I'm skipping some of those details in order to stay on topic.
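To make the security point a bit more concrete, here is a minimal sketch of checking the session ID before it is handed to exec. The helper name clean_session_id is made up for illustration, and the pattern assumes the IDs are plain hexadecimal strings; adjust it to whatever your session module actually generates.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original app): returns the ID
# if it is a short hex string, otherwise undef.
sub clean_session_id {
    my ($id) = @_;
    return undef unless defined $id;
    return $id =~ /\A([0-9a-fA-F]{1,64})\z/ ? $1 : undef;
}

# Example: accept a well-formed ID, refuse anything else.
my $id = clean_session_id('9f3c2a7e')
    or die "refusing to run with a suspicious session id";

# With more than one argument, exec does not go through the shell, so
# nothing in $id could be interpreted as shell syntax anyway:
#   exec $cmd, $id;
```

The check still matters even without a shell in the picture, since the ID probably ends up in a file path on the other side.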

Also note that I close STDOUT and STDERR. If you don't do this, apache won't "let go" of the child process. This is very important! You must sever this connection for the child to be independent. Also, if exec works correctly, it will never return, so the die on the next line will never be reached.

Meanwhile, back in the parent process, we redirected to the "csv_status" page, which is defined something like this:

sub csv_status : Runmode {
  my $self = shift;

  my $id = $self->session->id();
  my $path = $CFG{'CSV_TEMP'};

  my $still_running = 0;
  if ( -e "$path/$id/csv.pid" ) {

    # read the PID that the long-running script recorded at startup
    open my $in, '<', "$path/$id/csv.pid" or die
      "can't access $id/csv.pid file : $!";
    my $pid = <$in>;
    close $in;
    chomp $pid if defined $pid;

    if ( IsStillRunning($pid) ) {
      $still_running = 1;
    } else {
      $still_running = 0;
    }
  }

  my $template = $self->load_tmpl('csv_status.html');
  $template->param(
    TITLE  => "CSV Status",
    STILL_RUNNING  => $still_running,
  );
  return $template->output;
}

I've removed a lot of error checking to make things simpler. The basic idea is that my long-running script creates a process ID file when it starts up. I can use that PID to check whether it's still running. I pass this status to my template with the $still_running variable. Basically, there are two versions of the "status" page, depending on whether the process is still running or has finished. The template (csv_status.html) contains the following:


<head>
<TMPL_IF STILL_RUNNING>
<meta http-equiv="refresh" content="5">
</TMPL_IF>
</head>

...

<TMPL_IF STILL_RUNNING>
<img src="images/working.gif" />
<hr>
<p>Please be patient... this might take a while.</p>
<TMPL_ELSE>
<h3> Job complete!</h3>
</TMPL_IF>

Again, I'm only showing the important bits. At the top, inside the header, IF the job is still running, I use a meta tag to force the page to reload every 5 seconds. Further down, in the body of the page, IF the job is still running, I display an animated gif (a little spinning icon), and a "please wait" message. When the job completes, the meta tag is NOT written (so the page refresh stops), and the icon/"please wait" message gets replaced with a "job complete" message (in my case, I also generate a link to the CSV file that the user can click to download.)

It would probably be better to use AJAX to refresh the page, instead of the meta tag, but I did this a long time ago before I knew how to use AJAX.
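Since I didn't show the PID file side of things, here is a rough sketch of how it could look. This is not my original code: write_pid_file is a made-up name for what the long-running script would do at startup, and this IsStillRunning is just one possible implementation, using signal 0 to probe for the process. It assumes both scripts run on the same machine.

```perl
use strict;
use warnings;

# Hypothetical helper for the long-running script: record our own PID.
sub write_pid_file {
    my ($file) = @_;
    open my $out, '>', $file or die "can't write $file: $!";
    print {$out} "$$\n";
    close $out or die "can't close $file: $!";
    return;
}

# One possible IsStillRunning: "kill 0" sends no signal at all, it only
# reports whether the process exists and could be signalled.
sub IsStillRunning {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /\A\s*(\d+)\s*\z/;
    $pid = $1;                   # digits only, whitespace stripped
    return 0 unless $pid > 0;    # PID 0 would mean "my own process group"
    return 1 if kill(0, $pid);
    return $!{EPERM} ? 1 : 0;    # exists, but belongs to another user
}
```

One caveat: PIDs get reused, so a stale csv.pid file could in principle point at some unrelated process. Having the long-running script delete the file when it finishes would narrow that window.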

Good luck, I hope this helps!

In Section Seekers of Perl Wisdom