Maestro_007 has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to implement a "recover" option in a script that may fail at any of a few given points. Currently we need to rerun the whole thing in the event of a failure, but it's a big data warehousing project, and as the data files grow, recovery will become more and more important.
I prototyped a system that seems to be pretty reliable, but it makes use of the dreaded "goto". I think that's okay, that it is in a very controlled circumstance and won't wreak havoc, but I wanted to pose this to the group at large: what is the best method to insert "checkpoints", that leave behind the ability to resume a task at the point where it failed?
I've attached two scripts: the first is the main script that performs three tasks. Any time one of these tasks fails, a file is left behind, ensuring that the process can be picked up where it left off. The other is a "wrapper" script, which calls either the recovery file or the whole process from the beginning.
NOTE: There are a few housekeeping things that this obviously doesn't do, e.g. checking for a valid code label, etc. I left them out intending to do that when it comes time for real implementation.
The main script, recover.pl
#!/usr/local/bin/perl
use Getopt::Std;
our ($opt_r);
getopts('r:');
my $recover = uc $opt_r;
$recover and goto ($recover);
my $rec_file_name = 'start.rec';
FIRST: first();
SECOND: second();
THIRD: third();
CleanUp();
# do the first task
sub first
{
    # some other stuff may happen here,
    # so we'll go ahead and write the recovery info
    write_recovery_file('first');
    print STDERR "This is first\n";
    sleep(1);
    # die "died in first";
}
sub second
{
    write_recovery_file('second');
    print STDERR "This is second\n";
    sleep(1);
    # die "died in second";
}
sub third
{
    write_recovery_file('third');
    print STDERR "This is third\n";
    sleep(1);
    # die "died in third";
}
sub CleanUp
{
    `rm start.rec`;
    print "Recovery file deleted\n" unless ($?);
}
sub write_recovery_file
{
    my $str = shift;
    open RECOVER, ">$rec_file_name";
    print RECOVER "$0 -r$str\n";
    close RECOVER;
}
And here's the wrapper script:
#!/usr/local/bin/perl
# This tests the recovery system
$recovery_file = check_for_recovery_file();
# execute the script at either the recovery step or the beginning
$recovery_file ? recover($recovery_file) : recover();
sub recover
{
    my $file_name = shift;
    my $cmd_line = 'recover.pl';
    if ($file_name)
    {
        open INFILE, '<', $file_name
            or die "Could not open $file_name: $!";
        # assumes the recovery file contains
        # no more than one line of text
        chomp($cmd_line = <INFILE>);
        close INFILE;
        print STDERR "Resuming failed process: '$cmd_line'\n";
    }
    `$cmd_line`;
    # report the command itself: $file_name is undefined on a fresh run
    die "Could not execute '$cmd_line' (status $?): $!" if ($?);
    print STDERR "'$cmd_line' successful\n";
}
sub check_for_recovery_file
{
    # won't hard-code this in real life
    my $file = 'start.rec';
    # true only if the file exists and is non-empty
    return (-s $file) ? $file : 0;
}
Any thoughts?
MM
Re: Best method for failure recovery?
by dragonchild (Archbishop) on Sep 19, 2001 at 22:25 UTC
Instead of gotos, you could just use an array of subrefs.
my @dispatch = (
    \&first,
    \&second,
    \&third,
);
my $start = 0;
$start = $recover if $recover > $start;
foreach my $index ($start .. $#dispatch) {
    &{$dispatch[$index]};
}
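If named steps are preferred over numeric indexes, the same idea can be sketched with an ordered list of name/coderef pairs. This is only an illustration: the step bodies, the @ran array and the run_from helper are made up for the example and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical step bodies; each one just records that it ran.
my @ran;
sub first  { push @ran, 'first'  }
sub second { push @ran, 'second' }
sub third  { push @ran, 'third'  }

# Ordered list of [name, coderef] pairs: list order is run order.
my @steps = (
    [ first  => \&first  ],
    [ second => \&second ],
    [ third  => \&third  ],
);

# Run every step from the named one onwards; with no name, run them all.
# An unknown name is rejected instead of jumping somewhere arbitrary.
sub run_from {
    my ($name) = @_;
    my $start = 0;
    if (defined $name) {
        ($start) = grep { $steps[$_][0] eq lc $name } 0 .. $#steps;
        die "Unknown recovery step '$name'\n" unless defined $start;
    }
    $_->[1]->() for @steps[ $start .. $#steps ];
    return;
}

run_from('second');
print "@ran\n";    # prints "second third"
```

Because the step name is looked up in the list before anything runs, a bad -r value fails with a clear message rather than a missing-label error.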
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
That method had occurred to me, but in a way I think the problem would be similar from a maintenance perspective.
On the one hand, it doesn't use a goto, but on the other hand, it uses symbolic coderefs. (Update: Ack! They're not symbolic, I just didn't read it right! Thanks!!) If the future maintainer isn't a Perl guy (and in some cases, even if he is), he'll have a much better chance of cursing my name but still getting through it with something as hated and feared, but at least as well known, as a goto than if I stick him with something where the only clue is no strict 'refs' and some good comments.
Still, for validation of the parameter, your solution is much easier and more reliable. There is a "guaranteed" set of steps in a "guaranteed" order, and you can't just jump to any old arbitrary place in the script. From that point of view, I may go with it instead.
thanks!
MM
Actually, the list of coderefs does not use symbolic references. Using symbolic references would be something like &{"$recovery"};. What I do is use hard references instead.
And, as always, you should comment any use of an advanced feature, such as coderefs, if you expect your code to be maintained by people who know less than you do. This is for every language, not just Perl.
Re: Best method for failure recovery?
by derby (Abbot) on Sep 19, 2001 at 22:49 UTC
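# each flag records whether its step has already succeeded
$step_1 = 0;
$step_2 = 0;
$step_3 = 0;
$working = 1;
while( $working ) {
    eval {
        # ||= stores the step's return value, so a step that has
        # returned true is skipped on the next pass
        $step_1 ||= first();
        $step_2 ||= second();
        $step_3 ||= third();
        clean_up();
    };
    if( $@ ) {
        print STDERR $@;
    } else {
        $working = 0;
    }
}
Update: Err, the steps need ||= rather than the plain || I first posted, otherwise the flags are never set. Each step must also return a true value on success.
-derby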
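One caveat with retrying inside a loop is that a step that keeps failing will be retried forever. A minimal sketch of the same retry idea with a cap on attempts follows; the step bodies, %done and $max_attempts are assumptions made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical steps: second() fails on its first call only.
my $calls = 0;
sub first  { return 1 }
sub second { die "second failed\n" if ++$calls == 1; return 1 }
sub third  { return 1 }

my @steps = (
    [ first  => \&first  ],
    [ second => \&second ],
    [ third  => \&third  ],
);
my %done;                 # names of steps that have completed
my $max_attempts = 3;     # assumed limit, pick whatever suits the job

for my $attempt (1 .. $max_attempts) {
    my $ok = eval {
        for my $step (@steps) {
            my ($name, $code) = @$step;
            next if $done{$name};      # finished on an earlier pass
            $code->();                 # dies on failure
            $done{$name} = 1;
        }
        1;
    };
    last if $ok;
    print STDERR "attempt $attempt failed: $@";
}
die "giving up after $max_attempts attempts\n"
    unless scalar(keys %done) == scalar(@steps);
```

Here a step counts as done when it returns without dying, so the steps do not need to return a true value.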
Re (tilly) 1: Best method for failure recovery?
by tilly (Archbishop) on Sep 20, 2001 at 16:50 UTC
Well I would suggest working it like this.
Divide the large job into a series of more manageable tasks
which have dependencies between them. Arrange to set up
each task as an item that can be restarted at any point
within the task without having damaged the ability of the
task to go forward. (Basically this means writing each
task such that it doesn't wipe out its initial data, and can
clean up or overwrite the previous partial run.) Then set
up a control table with the open tasks. In that control
table you mark tasks that need to run, mark them as being
run, run them, then mark them as done.
Now your script can be re-run as many times as you want, and
will skip work that was already done. In fact you can even
have your script do as much work as feasible on each run,
skipping any trouble spots, so that after a human sees it
the bulk of the work got done despite any issues. Plus as
a bonus if you do this carefully you may get out of it the
ability to run your script simultaneously on several
machines.
I attest from personal experience that while writing
everything in this fashion can be a lot of work, some small
steps towards the control table and distinct transactions
idea do a lot towards simplifying your overall
program and making it capable of handling all sorts of
complex failure modes robustly. (Something doesn't look
right? Abort, send notification, then continue with other
stuff it can do!)
I can also attest from personal experience that the various
goto solutions offered remind me of some really bad
systems that I have worked with. Sure, if you do everything
just right, it might work. But it is inherently a fragile
approach and leads to fragile code. Not what I want in a
production system! (And no, I have not merely heard vague
rumor that goto is a bad idea. Give me credit for
having done more homework on the topic than that.)
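A rough sketch of the control table idea, assuming the table is just a tab-separated file of task names and states. A real warehouse job would more likely keep it in a database table, and this sketch ignores dependencies between tasks. The task names, file layout and helper names are invented for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Control table stored as "task<TAB>state" lines (temporary file for the demo).
my (undef, $table_file) = tempfile(UNLINK => 1);

sub load_table {
    my %status;
    open my $in, '<', $table_file or return %status;
    while (my $line = <$in>) {
        chomp $line;
        my ($task, $state) = split /\t/, $line;
        $status{$task} = $state if defined $state;
    }
    close $in;
    return %status;
}

sub save_table {
    my (%status) = @_;
    open my $out, '>', $table_file or die "Cannot write $table_file: $!";
    print {$out} "$_\t$status{$_}\n" for sort keys %status;
    close $out or die "Cannot close $table_file: $!";
    return;
}

# Hypothetical tasks; 'load' fails until $load_ok is set.
my $load_ok = 0;
my @tasks = (
    [ fetch   => sub { 1 } ],
    [ load    => sub { die "load failed\n" unless $load_ok; 1 } ],
    [ archive => sub { 1 } ],
);

# Run every task not yet marked done, recording each state change.
sub run_pending {
    my %status = load_table();
    for my $task (@tasks) {
        my ($name, $code) = @$task;
        next if defined $status{$name} && $status{$name} eq 'done';
        $status{$name} = 'running';
        save_table(%status);
        my $ok  = eval { $code->(); 1 };
        my $err = $@;
        $status{$name} = $ok ? 'done' : 'failed';
        save_table(%status);
        warn "task $name failed: $err" unless $ok;   # note it, then carry on
    }
    return %status;
}

my %first_run  = run_pending();   # fetch and archive finish, load fails
$load_ok = 1;
my %second_run = run_pending();   # only load is attempted again
```

Re-running the script simply calls run_pending again, which skips everything already marked done.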
Re: Best method for failure recovery?
by tommyw (Hermit) on Sep 20, 2001 at 12:50 UTC
my $recover = $opt_r;
first() unless $recover>1;
second() unless $recover>2;
etc. And then simply writing a numeric value to the recovery file.
I'd hope this was readable by anybody who knows English, without needing to know perl at all.
Incidentally, with your setup, I think that $rec_file_name is never assigned when your recovery flag is set: the goto jumps past the my $rec_file_name = 'start.rec'; line, so although the subroutines can see the variable, it will be undefined when write_recovery_file tries to use it.
Of course, you'd find this when writing production code: remember -w and strict are your friends :-)
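Putting the numeric version together might look something like this sketch, which runs under strict and warnings. The checkpoint helpers, the @ran array and the temporary file standing in for start.rec are assumptions for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for start.rec.
my (undef, $rec_file) = tempfile(UNLINK => 1);

# Record the number of the step that is about to run.
sub write_checkpoint {
    my ($step) = @_;
    open my $out, '>', $rec_file or die "Cannot write $rec_file: $!";
    print {$out} "$step\n";
    close $out or die "Cannot close $rec_file: $!";
    return;
}

# Return the recorded step number, or 0 if there is no usable checkpoint.
sub read_checkpoint {
    open my $in, '<', $rec_file or return 0;
    my $step = <$in>;
    close $in;
    return 0 unless defined $step && $step =~ /^(\d+)$/;
    return $1;
}

# Hypothetical steps; each writes its checkpoint before doing any work.
my @ran;
sub first  { write_checkpoint(1); push @ran, 1 }
sub second { write_checkpoint(2); push @ran, 2 }
sub third  { write_checkpoint(3); push @ran, 3 }

sub run_all {
    my $recover = read_checkpoint();
    first()  unless $recover > 1;
    second() unless $recover > 2;
    third()  unless $recover > 3;
    return;
}

write_checkpoint(2);   # pretend an earlier run died in step 2
run_all();
print "@ran\n";        # prints "2 3"
```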
Re: Best method for failure recovery?
by MZSanford (Curate) on Sep 20, 2001 at 13:12 UTC
This is a problem I have faced many times, and I ended up writing some really odd code to reuse. But most of the time I found goto to be a fine solution. There will be people who will downvote this because they "dislike" goto, though I would guess they have never used it and have only read that it is bad.
But goto does require a moment's thought. What I have done is something like the following (untested):
use Getopt::Long;
my $RECOV_STEP = 'FTP_FILE';
my $optre = &GetOptions("-restart=s" => \$RECOV_STEP);
if ($optre == 0) {
    print "Invalid Option Processing\n";
    &usage();
}
eval {
    goto $RECOV_STEP;
};
if ($@) { die "Invalid Recovery Step '$RECOV_STEP'\n" };
FTP_FILE: {
    ### get data
    print "FTP Step Started\n";
};
LOAD_FILE: {
    ### load database with file
    print "Load Step Started\n";
};
CLEANUP: {
    ### archive data
    print "Cleanup Step Started\n";
};
sub usage {
    print "$0 -restart <STEP>\n";
}
I have worked with production operation groups for some time now, and have found that they seem to prefer named restart "steps" as opposed to numbers. <minirant>While this may be different where you are, unless you want to be the person up at night on the phone telling them the numbers, I suggest named steps and a good document.</minirant>
my own worst enemy
-- MZSanford