Re: Read, SFTP Put and Move Files from a "Busy" NFS FileSystem

by Corion (Patriarch)
on Jan 11, 2019 at 09:38 UTC ([id://1228383])


in reply to Read, SFTP Put and Move Files from a "Busy" NFS FileSystem

If the filesystem is really that busy, you want to avoid file system calls as much as possible. So it makes sense to move all the file system calls as late in your loop as possible, after the cheap checks that need no file system access.

In your loop you do the following file system calls:

  my $mtime = stat("$local_dir/$file")->mtime;
  my $age = (time - $mtime);
  chomp($age); # ??? why chomp?!
  next unless ($age > 2);
  next unless (-f "$local_dir/$file"); # File was moved in the meantime already?!

You throw the results of these two file system calls (the stat and the -f test) away if the filename does not end in .xml. So a first step would be to only do the file system calls if the file name ends in .xml:

  while (defined(my $file = readdir($DH))) {
      next unless ($file =~ m/\.xml$/);
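As a rough sketch of that reordering (not the original poster's code): the helper name ready_xml_files and its parameters are made up for illustration. It assumes File::stat, which the stat(...)->mtime call above suggests is already in use, and it reuses the single stat result for both the age check and the regular-file check, so each candidate costs one file system call.

```perl
use strict;
use warnings;
use File::stat;       # object interface: stat($path)->mtime
use Fcntl ':mode';    # S_ISREG

# Hypothetical helper: names of the .xml files in $dir that are
# regular files and older than $min_age seconds.
sub ready_xml_files {
    my ($dir, $min_age) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $now = time;
    my @ready;
    while (defined(my $file = readdir($dh))) {
        # Pure string test first: costs no file system call.
        next unless $file =~ m/\.xml$/;
        # One stat() answers both "regular file?" and "how old?".
        my $st = stat("$dir/$file") or next;    # vanished in the meantime
        next unless S_ISREG($st->mode);
        next unless $now - $st->mtime > $min_age;
        push @ready, $file;
    }
    closedir($dh);
    return @ready;
}
```

Calling ready_xml_files($local_dir, 2) would then give the list of names to upload and move.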

You also skip all files that are younger than 2 seconds, most likely under the assumption that they haven't been completely written yet. You could remember these filenames in a queue and check them again after 2 seconds, instead of waiting for a second readdir round to produce them again. This would save you another call to opendir+readdir (which takes ages, as you say).
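A minimal sketch of such a queue, assuming File::stat again. The name recheck_queue and its argument order are hypothetical. It only stats the names it was handed and never re-reads the directory.

```perl
use strict;
use warnings;
use File::stat;

# Hypothetical helper: re-check only the queued names.
# Returns (\@ready, \@still_young); names that no longer exist are dropped.
sub recheck_queue {
    my ($dir, $min_age, @queue) = @_;
    my $now = time;
    my (@ready, @young);
    for my $file (@queue) {
        my $st = stat("$dir/$file") or next;    # gone: forget it
        if ($now - $st->mtime > $min_age) {
            push @ready, $file;
        }
        else {
            push @young, $file;
        }
    }
    return (\@ready, \@young);
}
```

After the first readdir pass you would process the old files, sleep for 2 seconds, and then call recheck_queue with the names that were too young, repeating until the second list comes back empty.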

You also check if the file still exists before moving it - is your program running in multiple instances? Otherwise, you could remove that check as well.

Replies are listed 'Best First'.
Re^2: Read, SFTP Put and Move Files from a "Busy" NFS FileSystem
by thanos1983 (Parson) on Jan 11, 2019 at 10:08 UTC

    Hello longjohnsilver,

    Just to add something minor here but I think useful.

    Since, as you said, the node runs Linux, I would also check whether the file is still open before moving it. It might not have finished being written when you move it.

    print "file $file is opened\n" if `lsof $file`;

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Actually, under Linux (and other Unix/POSIX derived systems), you can move a file while another process has it open. This is because entries in directories just point to inodes (unlike MS Windows and some other systems where the directory entry and inode are the same entity).

      Caveat: If the program writing the file tries to reference the file by pathname, it won't find it. For example, if the program uses a temporary name scheme that reads the directory and looks for the "highest" lexicographical name and then "increments" that, that scheme will produce duplicate names.
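A small demonstration of the point about inodes, as a sketch for a local POSIX filesystem. The helper name and the file names are invented for the example. The writer keeps its handle across the rename, and everything it writes ends up under the new name.

```perl
use strict;
use warnings;

# Hypothetical demo: rename a file while it is still open for writing,
# keep writing through the old handle, then read it back via the new name.
sub write_across_rename {
    my ($dir) = @_;
    my ($old, $new) = ("$dir/in-progress.xml", "$dir/moved.xml");

    open(my $out, '>', $old) or die "open $old: $!";
    print {$out} "first half\n";

    # The handle refers to the inode, not to the directory entry.
    rename($old, $new) or die "rename: $!";

    print {$out} "second half\n";
    close($out) or die "close: $!";

    open(my $in, '<', $new) or die "open $new: $!";
    local $/;                 # slurp mode
    my $content = <$in>;
    close($in);
    return $content;
}
```

Note that this only shows the move itself is safe. A file moved this way and uploaded immediately can still be incomplete, which is the original concern.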

      However, since we're talking about NFS, the file may be opened (written) from a different computer. In this case, lsof would work at most on the server, but not from another client.

        I suppose a poor man's substitute would be to just stat all the *.xml files, sleep for 10 (30?) seconds, stat them all again, and skip any whose file size has changed. Not perfect, but it probably works most of the time, and it is probably more reliable than just processing any file more than 2 seconds old.
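A sketch of that stat, sleep, stat idea, assuming File::stat. The two helper names are made up. The sleep is left to the caller so the waiting time (10 or 30 seconds) stays a tuning choice.

```perl
use strict;
use warnings;
use File::stat;

# Hypothetical helper: map each existing file name to its current size.
sub snapshot_sizes {
    my ($dir, @files) = @_;
    my %size;
    for my $file (@files) {
        my $st = stat("$dir/$file") or next;    # skip files that vanished
        $size{$file} = $st->size;
    }
    return \%size;
}

# Hypothetical helper: names present in both snapshots with the same size.
sub unchanged_files {
    my ($before, $after) = @_;
    my @same = grep {
        exists $after->{$_} && $after->{$_} == $before->{$_}
    } sort keys %$before;
    return @same;
}
```

Usage would be along the lines of: my $before = snapshot_sizes($dir, @xml); sleep 10; my @stable = unchanged_files($before, snapshot_sizes($dir, @xml));. A writer that pauses for longer than the sleep will still fool this check.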

        Another thought: use the Linux inotify API so that you can be more event oriented. See Linux::Inotify2. Then, every time a new file is created in the directory, you get an event. That would keep you from having to scan the directory. It would, however, increase the risk of processing an "in-use" file, so you'd need pretty good logic to determine when the file is done being written.

      print "file $file is opened\n" if `lsof $file`;

      ... and let's hope nobody sets $file='foo ; rm -rf /'; (Shell injection). To avoid this problem, see "Safe pipe open" in perlipc and Ssh and qx.
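One way to sidestep the shell entirely is the list form of a pipe open, which hands the arguments straight to exec. The following is a sketch only: run_capture and is_open_locally are invented names, and the lsof call assumes lsof is installed and, as noted above, can only see processes on the local machine.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its stdout.
# Call it with at least two list elements (program plus arguments) so that
# Perl uses the shell-free list form of pipe open.
sub run_capture {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                 # slurp mode
    my $out = <$fh>;
    close($fh);               # the command's exit status is left in $?
    return defined $out ? $out : '';
}

# Hypothetical helper: true if lsof lists any local process using $file.
# '--' marks the end of lsof's options, so odd file names stay file names.
sub is_open_locally {
    my ($file) = @_;
    return run_capture('lsof', '--', $file) ne '';
}
```

With this, a name like 'foo ; rm -rf /' reaches lsof as one literal argument instead of being parsed by a shell.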

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Thanks for the insight thanos1983!
Re^2: Read, SFTP Put and Move Files from a "Busy" NFS FileSystem
by longjohnsilver (Acolyte) on Jan 12, 2019 at 15:07 UTC
    Thanks Corion, code is a little faster now.
