friedo has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Monks.

I'm writing a daemon which monitors a directory to which large files are uploaded via FTP. When a new file is uploaded, the daemon should copy it to a temporary staging directory, whereupon it will be processed by mighty Perl.

But I don't want to accidentally grab a file that is in the middle of being uploaded. Is it enough to simply check for an advisory lock on new files, or are there other precautions I should be taking?

Alternatively, perhaps I'm going about it slightly backwards. Are there any Linux FTP servers which can trigger a program when a file is done uploading?

Replies are listed 'Best First'.
Re: Monitoring an FTP upload directory
by chargrill (Parson) on Nov 07, 2006 at 19:20 UTC

    Depends on your FTP server, and possibly the client. I've had experience where a large file gets uploaded as .filename.ext, and once the transfer is finished, the file is (rather quickly) renamed to filename.ext.

    I don't recall if this convention is inherent in FTP, or was just a happy coincidence that the server and client combination I was using happened to honor. I'd test a few large file transfers in your local FTP server/client combination of choice.

    Update: Thanks for jogging my memory, wjw. ProFTPD was the server I was using, and has a nice directive for accomplishing the rename, documentation for which resides here.
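    Under that convention the watcher only has to ignore the dot-prefixed temporary names. A minimal sketch, assuming a HiddenStores-style setup where in-progress uploads carry a leading dot and are renamed into place on completion (the directory name is illustrative):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Return files that are safe to grab: plain files with no leading dot.
    # ProFTPD's HiddenStores writes uploads to a ".in.<name>." temp name
    # until the transfer finishes, then renames to the final name.
    sub ready_files {
        my ($dir) = @_;
        opendir my $dh, $dir or die "Can't open $dir: $!";
        my @ready = grep { !/^\./ && -f "$dir/$_" } readdir $dh;
        closedir $dh;
        return sort @ready;
    }
    ```

    This only works if you have verified that your particular server/client combination actually honors the hidden-name convention.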

    s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
Re: Monitoring an FTP upload directory
by wjw (Priest) on Nov 07, 2006 at 19:23 UTC
    Check out proftpd Here

    I used something like this to move files from a customer upload location on our site DMZ to a secure location on our internal network. I believe this will let you do what you want. My memory may be a bit rusty, but I seem to recall having a Perl program which parsed the log files and noted completed transfers, moved the files, and kept track of which files had been moved. We fired off the process every 10 minutes via cron, but a daemon would have worked just as well. The process ran inside the firewall and used scp to reach out to the DMZ FTP server and grab the log. It then processed the log and grabbed any new files, again via scp, keeping track locally of files which had already been received/moved.

    Probably not exactly what you had in mind, but it worked for us at the time.
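    The "keep track of which files had been moved" bookkeeping can be sketched with a simple state file of already-fetched names. The state-file path and one-name-per-line format are assumptions for illustration; the real setup pulled the log and files over scp:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Given a state file of already-fetched names and a list of candidates,
    # return only the new ones and record them so the next run skips them.
    sub filter_new {
        my ($seen_file, @candidates) = @_;
        my %seen;
        if (open my $in, '<', $seen_file) {
            while (my $line = <$in>) { chomp $line; $seen{$line} = 1 }
            close $in;
        }
        my @new = grep { !$seen{$_} } @candidates;
        open my $out, '>>', $seen_file or die "Can't append $seen_file: $!";
        print {$out} "$_\n" for @new;
        close $out;
        return @new;
    }
    ```

    Run from cron every few minutes, each invocation then only transfers files it has never seen before.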

    ...the majority is always wrong, and always the last to know about it...

Re: Monitoring an FTP upload directory
by shenme (Priest) on Nov 07, 2006 at 20:51 UTC
    If you can monitor the FTP server's xferlog log file, that is simply the best way to ensure you don't process incomplete files. The FTP server only makes an entry in that log file when the file transfer is 'finished'. In addition the entry notes whether the transfer was c=complete or i=incomplete. In my hacked-together monitor program I can check and process only finished transfers, which were complete, and whose path matches one of several possible directories being acted on. And react so fast that remote users sometimes re-transfer the file two and three times wanting to be sure it really did disappear "on purpose". (sigh)

    I've not had to use the features for renaming files that wjw mentions. Though if you can't see the xferlog file that would be a very good way to do it. And, yes, we use ProFTPD also for its extreme flexibility.
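    The check described above can be sketched by splitting standard wu-ftpd/ProFTPD xferlog records on whitespace: after the five date tokens come transfer-time, host, size, path, type, action-flag, direction, mode, user, service, auth-method, uid, and finally the completion status. A real monitor would tail the log rather than slurp it:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Return paths of uploads (direction 'i' = incoming) whose
    # completion-status field is 'c' (complete).
    sub completed_uploads {
        my ($log_text) = @_;
        my @done;
        for my $line (split /\n/, $log_text) {
            my @f = split ' ', $line;
            next unless @f >= 18;    # a full xferlog record
            my ($path, $direction, $status) = @f[ 8, 11, 17 ];
            push @done, $path if $direction eq 'i' and $status eq 'c';
        }
        return @done;
    }
    ```

    Note the simple split breaks on filenames containing spaces; a production parser should anchor on the known trailing fields instead.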

      We also had the problem of customers uploading more than once. The files they uploaded were very large (50-60 MB), so they would start a transfer, go away for a while, and come back to find the file was not there. Disregarding the fact that their client showed a successful transfer, they would then start the whole thing over again.

      We addressed that problem by actually placing a text file in the upload directory which had a name like (file_name)xfer_complete.txt. The contents of the file were always the same; something like "your file has been received and moved to a secure place on our network" if I recall correctly.

      Most customers caught on right away, the others we would 'coach' if we noticed they needed it. We removed these files after a few hours, and our scp script was coded to ignore them.
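      The receipt-file idea above might be sketched like this; the marker naming scheme and message wording are assumptions patterned on the description:

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;

      # Drop a "<name>_xfer_complete.txt" receipt next to the pulled upload
      # so the customer sees evidence the transfer worked.
      sub write_receipt {
          my ($upload_dir, $name) = @_;
          my $marker = "$upload_dir/${name}_xfer_complete.txt";
          open my $fh, '>', $marker or die "Can't write $marker: $!";
          print {$fh} "Your file has been received and moved to a secure "
                    . "place on our network.\n";
          close $fh;
          return $marker;
      }

      # The mover/scp script then skips its own receipts.
      sub is_receipt { return $_[0] =~ /_xfer_complete\.txt$/ }
      ```

      A cron job that deletes receipts older than a few hours completes the picture.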

      Saved us a lot of those sighs! :-) Just an idea to save some bandwidth for you and your customers. :-)

      ...the majority is always wrong, and always the last to know about it...

        I think this is a great idea, thanks. I'm going to go with tailing the xferlog, too. (Why didn't I think of that before? :) ).

        I've really wanted to do this, like creating a <your_filename>.received_correctly_and_processed.txt file. Automatically removing the file after a while makes it sound even better.

        The problem we often saw was that we were _so_ fast at picking and moving the file, that the user wouldn't ever see it, even when directly following the PUT with a DIR. It is one thing to see the transition from 'there' to "not there", but it kinda freaked them out that they never saw it at all. Like it had gone into the bit bucket. And our applications are named _somewhat_ differently...

Re: Monitoring an FTP upload directory
by zentara (Archbishop) on Nov 07, 2006 at 20:09 UTC
    If you are on Linux, another possibility would be to check "lsof" for the file (full pathname) and loop until the lsof listing for the file disappears. Then copy it.
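    A minimal sketch of that polling loop, assuming an lsof binary on the PATH (lsof exits non-zero when no process has the file open):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Wait up to $timeout seconds until nothing holds $path open.
    # Returns 1 once the file is closed, 0 on timeout.
    sub wait_until_closed {
        my ($path, $timeout) = @_;
        my $q = quotemeta $path;
        for my $waited (0 .. $timeout) {
            return 1 if system("lsof -- $q >/dev/null 2>&1") != 0;
            sleep 1;
        }
        return 0;
    }
    ```

    Note this races: the uploading client could reconnect and reopen the file between the lsof check and the copy, so it pairs well with one of the log- or rename-based checks above.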

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: Monitoring an FTP upload directory
by gellyfish (Monsignor) on Nov 07, 2006 at 21:28 UTC

    If you are on a Linux or similar system where the modules are supported and you have FAM configured you could try something like:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use File::Basename;
    use SGI::FAM;
    use Linux::Fuser;
    use File::Spec;

    my $fam = SGI::FAM->new();
    $fam->monitor('/home/jonathan/famtest');

    my %filecache;
    my $newdir = '/home/jonathan/famtemp';
    my $fuser  = Linux::Fuser->new();

    while (1) {
        my $event    = $fam->next_event();
        my $basename = basename $event->filename();
        my $fullpath = File::Spec->catfile( $fam->which($event), $basename );

        if ( $event->type() eq 'create' ) {
            $filecache{$basename}++;
        }
        elsif ( $event->type() eq 'change' ) {
            if ( exists $filecache{$basename} and !$fuser->fuser($fullpath) ) {
                rename $fullpath, File::Spec->catfile( $newdir, $basename )
                    or warn "Couldn't move $fullpath - $!\n";
                delete $filecache{$basename};
            }
        }
        else {
            print "Pathname: ", $event->filename, " Event: ", $event->type, "\n";
        }
    }


Re: Monitoring an FTP upload directory
by shotgunefx (Parson) on Nov 07, 2006 at 21:28 UTC
    I know that one of the FTP services on Yahoo! uses the following technique: it waits for 15 minutes of no activity on the file. Far from perfect, since if the file is sent by an automated process that dies halfway through, the error might not get picked up.
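    The quiet-period heuristic boils down to checking how long the file's mtime has been stable. A sketch (15 minutes would be $quiet_secs = 900; the caller decides the threshold):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # True if $path exists and has not been modified for $quiet_secs seconds.
    sub is_quiet {
        my ($path, $quiet_secs) = @_;
        my @st = stat $path or return 0;    # vanished? not ours to grab
        return ( time() - $st[9] ) >= $quiet_secs;    # $st[9] is mtime
    }
    ```

    As noted, a stalled-but-unclosed transfer looks identical to a finished one under this test, which is exactly the failure mode described above.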

    "To be civilized is to deny one's nature."