http://qs321.pair.com?node_id=1228381

longjohnsilver has asked for the wisdom of the Perl Monks concerning the following question:

Good Day to all of you Enlightened Monks,

I've been trying to find the best way to perform some simple yet (performance-wise) complicated tasks on a "busy" NFS file system.

Let me be clearer: I'm on a Linux Debian Box and the filesystem is mounted as follows:

/mnt/mounted_dir/ type nfs (rw,noatime,rsize=32768,wsize=32768,hard,intr,tcp,nfsvers=3,addr=x.x.x.x)

By "busy" I mean that at "random" times each day a lot of small .xml files (~2k each, maybe 15 per second) get written to this filesystem.

The things i need to perform on these files are the following:

1. Read/Fetch each file as soon as it has been written completely.
2. SFTP-Put the file on a remote FTP server.
3. Make sure the transfer has been completed successfully
4. Move the local copy of the "transferred" file to another local directory.

These tasks don't look complicated at first sight but, I have to say, when the filesystem is in this "busy" state even a simple "ls" takes ages, really ages. The script only works smoothly when the filesystem is not in a "busy" state.

Here's the core snippet of my code for anyone interested in helping me make things perform better. Thanks.

    #!/usr/bin/perl -w
    use strict;
    use diagnostics;
    use autodie;
    use Net::SFTP::Foreign;
    use File::Copy;
    use File::stat;

    -->> omitted code to make things more readable

    $sftp = Net::SFTP::Foreign->new($host, %args);
    $sftp->setcwd($remote_dir) || die log_msg($sftp->error."Exiting...\n");

    opendir($DH, $local_dir) or die $!;
    while (defined(my $file = readdir($DH))) {
        my $mtime = stat("$local_dir/$file")->mtime;
        my $age =(time - $mtime);
        chomp($age);
        next unless ($age > 2);
        next unless (-f "$local_dir/$file");
        next unless ($file =~ m/\.xml$/);

        # sftp put section
        if ($sftp->put("$local_dir/$file")) {
            move("$local_dir/$file", "$local_dir_mv") or die log_msg("The move operation failed: $!");
        } else {
            die log_msg($sftp->error);
        }
    }
    closedir($DH);

Replies are listed 'Best First'.
Re: Read, SFTP Put and Move Files from a "Busy" NFS FileSystem
by Corion (Patriarch) on Jan 11, 2019 at 09:38 UTC

    If the filesystem is really that busy, you want to avoid file system calls as much as possible. So it makes sense to move every file system call as late in your loop as possible.

    In your loop you do the following file system calls:

    1. my $mtime = stat("$local_dir/$file")->mtime;
    2. my $age =(time - $mtime);
       chomp($age); # ??? why chomp?!
       next unless ($age > 2);
    3. next unless (-f "$local_dir/$file"); # File was moved in the meantime already?!

    You throw the results of the two file system calls (stat and -f) away if the filename does not end in .xml. So a first step would be to make those calls only when the file name ends in .xml:

    while (defined(my $file = readdir($DH))) {
        next unless ($file =~ m/\.xml$/);

    You also skip all files that are younger than 2 seconds, most likely under the assumption that they haven't been written yet. You could remember these filenames and just put them in a queue and check them again in 2 seconds time, instead of waiting for a second readdir round to produce them again. This would save you another call to opendir+readdir (which takes ages, as you say).

    You also check if the file still exists before moving it - is your program running in multiple instances? Otherwise, you could remove that check as well.
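
A minimal sketch of Corion's suggestions, not the original poster's code: the name test comes first, each candidate gets a single stat whose buffer is reused for the plain-file test, and files that are still too young are returned as a separate list so they can be rechecked later without another readdir pass. The helper name `triage_entries` and its parameters are hypothetical. It uses Perl's built-in stat rather than File::stat, because the `_` filehandle shortcut relies on the built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: split directory entries into files that are old
# enough to transfer and files that should be rechecked later.
sub triage_entries {
    my ($dir, $names, $min_age, $now) = @_;
    my (@ready, @pending);
    for my $name (@$names) {
        # Cheap string test first: no file system call for non-.xml names.
        next unless $name =~ /\.xml\z/;

        # One stat per candidate; entries that vanished are skipped.
        my @st = stat("$dir/$name") or next;

        # "_" reuses the buffer filled by the stat above (no second call).
        next unless -f _;

        # Element 9 of the stat list is the mtime.
        if ($now - $st[9] > $min_age) {
            push @ready, $name;
        }
        else {
            push @pending, $name;
        }
    }
    return (\@ready, \@pending);
}
```

One way to use it: call it once with the names from readdir, transfer everything in the first list, sleep a couple of seconds, then call it again with only the second list instead of reading the directory again.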

      Hello longjohnsilver,

      Just to add something minor here but I think useful.

      Since, as you said, the machine runs Linux, I would also check whether another process still has the file open before moving it. It might not have finished being written before you move it.

      print "file $file is opened\n" if `lsof $file`;

      Hope this helps, BR.

      Seeking for Perl wisdom...on the process of learning...not there...yet!

        Actually, under Linux (and other Unix/POSIX derived systems), you can move a file while another process has it open. This is because entries in directories just point to inodes (unlike MS Windows and some other systems where the directory entry and inode are the same entity).

        Caveat: If the program writing the file tries to reference the file by pathname, it won't find it. For example, if the program uses a temporary name scheme that reads the directory and looks for the "highest" lexicographical name and then "increments" that, that scheme will produce duplicate names.
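
A small sketch of the point about open files surviving a rename, assuming a local POSIX file system (the function name and file names are made up for illustration). The writer keeps printing to its open handle after the directory entry has been renamed, and the data ends up under the new name.

```perl
use strict;
use warnings;

# Hypothetical demonstration: write, rename while still open, write again,
# then read the result back through the new name.
sub write_across_rename {
    my ($dir) = @_;
    my ($old, $new) = ("$dir/old-name.xml", "$dir/new-name.xml");

    open(my $out, '>', $old) or die "Cannot create $old: $!";
    print {$out} "<a>";

    # The handle refers to the inode, so renaming the entry does not break it.
    rename($old, $new) or die "Cannot rename $old: $!";

    print {$out} "</a>\n";
    close($out) or die "Cannot close handle: $!";

    open(my $in, '<', $new) or die "Cannot read $new: $!";
    local $/;                      # slurp mode
    my $content = <$in>;
    close($in);
    return $content;
}
```

This also shows why moving a file does not prove it was complete: a mover running between the two prints would have relocated a half-written file.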

        However, since we're talking about NFS, the file may be opened (written) from a different computer. In this case, lsof would work at most on the server, but not from another client.
        print "file $file is opened\n" if `lsof $file`;

        ... and let's hope nobody sets $file='foo ; rm -rf /'; (Shell injection). To avoid this problem, see "Safe pipe open" in perlipc and Ssh and qx.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
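
A hedged sketch of the list-form pipe open that the "Safe pipe open" section of perlipc describes. The arguments are handed to the program directly, so no shell ever parses the file name. The helper names `capture_no_shell` and `is_open_locally` are hypothetical, and as noted above, lsof only sees processes on the machine where it runs, so this says nothing about other NFS clients.

```perl
use strict;
use warnings;

# Run a command without a shell and return whatever it prints on stdout.
sub capture_no_shell {
    my (@cmd) = @_;
    # List form of pipe open: @cmd goes straight to exec, no shell involved.
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                      # slurp mode
    my $output = <$pipe>;
    close($pipe);                  # the exit status is left in $?
    return defined $output ? $output : '';
}

# Hypothetical wrapper: true if lsof lists a local process holding $path.
# "--" marks the end of lsof's options, so a name starting with "-" is safe.
sub is_open_locally {
    my ($path) = @_;
    return capture_no_shell('lsof', '--', $path) ne '';
}
```

With this form a name such as 'foo ; rm -rf /' reaches lsof as a single, harmless argument.
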
        Thanks for the insight thanos1983!
      Thanks Corion, code is a little faster now.
Re: Read, SFTP Put and Move Files from a "Busy" NFS FileSystem
by salva (Canon) on Jan 12, 2019 at 15:14 UTC
    If you have control of the program writing the XML files, the best approach is to make it write each one to a temporary file location and once all the data has been written, move it to its final destination location. In that way you are never going to find an incomplete .xml on the filesystem.
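
A minimal sketch of the write-then-rename approach on the writer's side, assuming the temporary file can live in the same directory as the final file so that the rename stays within one file system. The function name `write_then_rename` and the `incoming-` prefix are hypothetical. The temporary name ends in .tmp, so a reader that only picks up names ending in .xml never sees it.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical writer-side helper: write everything to a temporary file,
# then rename it to its final name in one step.
sub write_then_rename {
    my ($final_path, $content) = @_;

    # Same directory as the target; note that File::Temp creates the
    # file with mode 0600, so adjust permissions if the reader needs them.
    my ($fh, $tmp_path) = tempfile(
        'incoming-XXXXXX',
        DIR    => dirname($final_path),
        SUFFIX => '.tmp',
    );

    print {$fh} $content or die "Write to $tmp_path failed: $!";
    close($fh)           or die "Close of $tmp_path failed: $!";

    # The .xml name only appears once the data is completely written.
    rename($tmp_path, $final_path)
        or die "Rename to $final_path failed: $!";
    return $final_path;
}
```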

    Another thing you can do is to check the file contents. Incomplete XML files are easy to detect, you can use an XML parser or just read the first XML tag in the file and then check that the last bytes of the file are the corresponding closing tag.
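
A rough sketch of the cheaper of the two checks, assuming the files are small enough (~2k here) to read whole. It takes the name of the first element tag and tests whether the content ends with the matching closing tag. The function name `looks_complete_xml` is hypothetical, and this is only a heuristic: it does not validate the XML, and a root element written as a self-closing tag would be reported as incomplete.

```perl
use strict;
use warnings;

# Hypothetical heuristic: does the content end with the closing tag of
# the first element it opens?
sub looks_complete_xml {
    my ($content) = @_;

    # First "<name": the XML declaration (<?xml ...?>), comments and
    # DOCTYPE (<!...>) are skipped because "?" and "!" cannot start a name.
    my ($root) = $content =~ /<([A-Za-z_][\w:.\-]*)/ or return 0;

    # The last thing in the file must be </root>, optionally followed by
    # whitespace.
    return $content =~ m{</\Q$root\E\s*>\s*\z} ? 1 : 0;
}
```

A full parse with an XML parser is the stricter alternative mentioned above; this version only avoids transferring files that were obviously cut short.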

Re: Read, SFTP Put and Move Files from a "Busy" NFS FileSystem
by karlgoethebier (Abbot) on Jan 12, 2019 at 19:06 UTC

    See also Re: Monitor new file

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'