PerlMonks  

Re: how to tell if a file is still being modified

by Roger (Parson)
on Sep 15, 2003 at 23:34 UTC ( [id://291683]=note )


in reply to how to tell if a file is still being modified

I remember that we had this problem with our automatic job processing too. We kick off processing when a certain file has arrived via FTP. We came up with several solutions:

1) Periodically check the size of the incoming file; if the size has stopped growing, assume that the transfer has finished. However, this does *NOT* work, at least not reliably! We had one particular case where the FTP transfer paused / died, and the system thought that the file had been received properly and started to process it. It made a total mess that took many days to resolve.

2) A better approach is to modify the system to receive the data file first, followed by a trigger file. The system acts on the arrival of the trigger file. This is more reliable than the first approach; however, it assumes that the sender works properly. We had a case where the sender program/script sent half the data file, and then somehow sent the trigger file without checking that the data file had been sent completely. This of course caused another mess.

3) The best scenario is when the data file has an integrated verification mechanism, like a ZIP file. You can tell whether the incoming ZIP file has arrived completely by periodically testing its integrity with the zip -T switch. In our experience this method has worked 100% of the time.
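
A minimal sketch of the same integrity test in Python, assuming the standard zipfile module is an acceptable stand-in for the zip -T command line. The function name zip_is_complete is made up for illustration and is not part of our system. It relies on the fact that a ZIP archive keeps its central directory at the end of the file, so a truncated upload usually fails to open at all.

```python
import zipfile


def zip_is_complete(path):
    """Return True if path opens as a ZIP archive and every member passes its CRC check.

    Hypothetical helper: a stand-in for "zip -T", not the command itself.
    """
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the name of the
            # first one with a bad CRC or header, or None if all are fine.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # BadZipFile: no valid central directory (typical of a partial upload).
        # OSError: the file is missing or cannot be read yet.
        return False
```

A poller could call zip_is_complete("incoming/data.zip") every few seconds and start processing only once it returns True; the path here is only an example.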

The 3rd method, with a self-validating file format, is the preferred one if the client/sender can produce such a format. If that is not possible, fall back to the 2nd method with an additional trigger file. If that is not possible either, fall back to the 1st method and pray. :-D
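
For the last-resort 1st method, a rough sketch of the size polling in Python. The function name, the 5 second interval and the three-checks rule are all assumptions picked for illustration. As described in 1), a stalled or dead transfer looks exactly like a finished one to this check, so it can only ever be a guess.

```python
import os
import time


def wait_until_size_stable(path, interval=5.0, stable_checks=3, sleep=time.sleep):
    """Block until the size of path is unchanged for stable_checks polls in a row.

    Returns the final size in bytes. Raises OSError if the file does not exist.
    Hypothetical helper: it cannot tell a finished transfer from a stalled one.
    """
    last_size = -1
    unchanged = 0
    while unchanged < stable_checks:
        size = os.path.getsize(path)
        if size == last_size:
            unchanged += 1
        else:
            # The file grew (or this is the first poll): start counting again.
            unchanged = 0
            last_size = size
        if unchanged < stable_checks:
            sleep(interval)
    return last_size
```

The sleep parameter exists only so the wait can be replaced in a test; in normal use the default time.sleep is what you want.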

Replies are listed 'Best First'.
Re: Re: how to tell if a file is still being modified
by nimdokk (Vicar) on Sep 16, 2003 at 10:10 UTC
    I have considered option 3, but it feels a bit too kludgy, and there's always the possibility that different versions of zip behave differently, since there is no standard for creating zip archives. I'm still keeping it in the back of my mind though since, as you pointed out, option 2 can cause problems when the "trigger" file is loaded without verifying that the original file arrived correctly. I've also seen cases where someone will load a trigger file without sending the data file (or the other way around). I'd say that it works 80-90% of the time, which is good. It's just the 10-20% of the time when it does not that is annoying, especially when you get paged at 3am because some idjit mistakenly created a data file without a trigger, or vice versa. The best solution would perhaps be a combination of some or all of these options (provided a workable solution could be created easily) :-)


    "Ex libris un peut de tout"
      I agree with you on the diversity of versions of ZIP out there. I'd say most of the differences would be in the encryption algorithm. (I am not in the US, so I am using the export version of the strong encryption algorithm to compile my ZIP program, hmmm, perhaps that is why I still haven't had any problems yet.) Provided there is no encryption requirement, ZIP is still a good solution though. And of course, if there were any problem, it would show up in the testing phase, wouldn't it?
        Personally, I haven't explored all the options with ZIP files. However, we don't always have folks uploading ZIPs. What I was initially looking for was some sort of EOF marker that I could count on. But I think someone else mentioned the MD5 hash, which is something I thought about but have not yet explored. Unfortunately, since the issue came up, we have had bigger problems to worry about.


        "Ex libris un peut de tout"
Re: Re: how to tell if a file is still being modified
by tagg (Acolyte) on Sep 16, 2003 at 20:41 UTC
    You might even enhance the functionality of the 'trigger' file, by including the MD5 sum of the transferred file...
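
A minimal sketch of that idea in Python, assuming the trigger file is a small text file whose first whitespace-separated token is the hex MD5 of the data file (the layout that the md5sum command prints). That layout and the function names md5_of_file and transfer_is_complete are assumptions for illustration, not something specified in this thread.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_complete(data_path, trigger_path):
    """Return True only if the trigger file exists and its MD5 matches the data file.

    Assumed trigger layout: the first token in the file is the expected hex MD5.
    """
    try:
        with open(trigger_path, "r") as handle:
            fields = handle.read().split()
    except OSError:
        return False  # trigger not there (or not readable) yet
    if not fields:
        return False  # empty trigger file, possibly still being written
    try:
        return md5_of_file(data_path) == fields[0].lower()
    except OSError:
        return False  # trigger arrived without its data file
```

This covers both failure cases mentioned above: a trigger that arrives after only half the data file was sent fails the checksum comparison, and a trigger without a data file (or the other way around) simply returns False.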
