Re4: Super Find critic needed

by bbfu (Curate)
on Jun 30, 2003 at 16:51 UTC [id://270221]


in reply to Re: Re: Re: Super Find critic needed
in thread Super Find critic needed

Some of these are "better" than others, but I've yet to see one that completely eliminates the risks, though some reduce the window for failure to the point of reasonable risk.

I would (perhaps naively) think that renaming the original file (renaming should be atomic, no?) to something like "$filename.$$", then reading / munging / writing to "$filename", and only deleting "$filename.$$" once the new filehandle is closed (and thus its buffers flushed, as well as Perl can manage) would completely eliminate the risk. The process could stop at any point and, at worst, you'd have a partially munged new file and the original file both still on disk. Assuming, of course, that you have a sufficiently paranoid filesystem.
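
A minimal sketch of that approach in Perl (munge() here is only a placeholder for the real edit, and the error handling is illustrative rather than part of the original suggestion):

    use strict;
    use warnings;

    # munge() stands in for whatever per-line edit is actually being applied.
    sub munge { my ($line) = @_; return $line }

    sub safe_munge {
        my ($filename) = @_;
        my $backup = "$filename.$$";      # rename first -- the rename itself should be atomic

        rename $filename, $backup
            or die "Can't rename $filename to $backup: $!";

        open my $in,  '<', $backup   or die "Can't read $backup: $!";
        open my $out, '>', $filename or die "Can't write $filename: $!";

        while ( my $line = <$in> ) {
            print {$out} munge($line);
        }

        close $in;
        close $out or die "Can't close $filename: $!";   # close flushes the output buffers

        # Only once the new file is safely closed is the original removed.
        unlink $backup or warn "Couldn't remove backup $backup: $!";
    }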

I'm not entirely sure I'm not missing something, so please enlighten me if I am. =)

bbfu
Black flowers blossom
Fearless on my breath

Re: Re4: Super Find critic needed
by BrowserUk (Patriarch) on Jun 30, 2003 at 17:54 UTC

    If the process is interrupted after the new file has been created, but before the old file has been deleted, regardless of whether the new file was properly written and closed, then when the system is restored and the script is re-run, the program will again find a file by the original name and rename it to "$filename.$$". If the new file was completely written and properly flushed, then no harm done; it will just be processed as though it were the original file, no further changes will be made, and you're back on track.

    However, if the new file was only partially written when the interruption occurred, then without an explicit check for the existence of a file called "$filename.$$", Perl's rename function will silently blow the first backup away, overwriting it with the partially written version.

    This implies that Perl's rename is implemented as either a copy, or a delete followed by a rename, as the OS rename (whether the command or the underlying system call) will not allow you to rename a file if a file with the new name already exists. At least this is the case under Win32; I'm not sure of the situation with other OSes.

    It therefore falls to the programmer using Perl's rename to check for, and handle, the situation where the new name already exists, using -e or similar. Once this check is in place, you still need to add code to handle the situation where the backup does exist: arrange to delete the (potentially partial) new file created on the last pass and restore the backup. Perl's rename will do this ostensibly in one step, but as I just noted, in reality, at least on some systems, there are two steps involved: a delete, followed by a rename. If a second interruption occurs between these two steps, then you get the situation where you have a backup with no original. If the File::Find or globbing process used to build the file list uses anything other than a fully wild match criterion, then a third pass won't even see the backup, as it will only be looking for the original, which no longer exists. So whilst no data has been lost, it will require a manual intervention to restore it.
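
    In Perl, such a guard might be sketched roughly as follows (illustrative only; note that since $$ differs between runs, the re-run check assumes a predictable backup suffix, ".orig" here, rather than the per-process PID):

        my $backup = "$filename.orig";     # assumed fixed suffix so a re-run can find it

        if ( -e $backup ) {
            # A previous pass was interrupted: treat whatever sits at $filename
            # as suspect, discard it, and restore the backup before retrying.
            unlink $filename if -e $filename;
            rename $backup, $filename
                or die "Can't restore $backup to $filename: $!";
        }

        # Only now is it safe to take the backup for this pass.
        rename $filename, $backup
            or die "Can't rename $filename to $backup: $!";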

    Yes, this is a paranoid view. To arrive here we need three failures to occur at exactly inopportune moments. However, I was involved in a project where the whole issue of automating the updating of files in a production environment became the subject of a protracted investigation to determine a mechanism for ensuring that there were NO risks involved. The machines in question were used by the cargo division of a large international airline to control the loading of freight on their fleet of 747 cargo aircraft. Accurate information about what freight has been loaded is paramount, as the weight of the cargo and its distribution are critical to how much fuel is required, and to the handling and take-off characteristics of the aircraft when taking off from airports at high altitudes and/or in hot conditions. To complicate matters, some of the servers in question were located in tin shacks on African and Russian airfields that were little more than dust strips, with mains supplies subject to frequent power cuts that often lasted longer than the UPSes could cover.

    That was done using REXX, not Perl, but most of the same problems arise. The final conclusion of the investigation was that there is no 100% reliable way to completely automate the process. The risk can be reduced to a very low probability of occurrence, but the only way to get to 100% is to have a manual verification as the final step of the process, and only accept that the process has completed in its entirety if that verification runs from beginning to end without interruption.

    In most real-life situations, 99% is probably good enough. :)


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      ...A third pass won't even see the backup as it will only be looking for the original, which no longer exists. So whilst no data has been lost, it will require a manual intervention to restore it.

      Why not simply have the script check for the existence of any backups (before renaming the "original") and assume the worst (i.e., even if there is an original, it must be corrupt)?

      It seems to me that this would eliminate the need for manual intervention without adding any risk. After all, that's exactly what the person intervening would do anyway, is it not? Then you could have any number of power failures, all at exactly the wrong times, and the worst that would happen is that the program would completely reprocess the file each time power is restored. No?
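
      A rough sketch of that check (assuming backups named "$filename.<something>" and a filename free of glob metacharacters; the thread doesn't pin down the naming scheme):

          # Before renaming the "original", look for any leftover backup.
          if ( my ($backup) = glob "$filename.*" ) {
              # A surviving backup means an earlier run was interrupted:
              # assume the worst, discard the possibly partial $filename,
              # and restore the backup so the file is simply reprocessed.
              unlink $filename if -e $filename;
              rename $backup, $filename
                  or die "Can't restore $backup to $filename: $!";
          }
          # ...then carry on with the usual rename / munge / unlink cycle.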

      bbfu
      Black flowers blossom
      Fearless on my breath
