PerlMonks  

Re^2: Portable way to determine if two names refer to the same file?

by soonix (Canon)
on Aug 16, 2019 at 08:48 UTC


in reply to Re: Portable way to determine if two names refer to the same file?
in thread Portable way to determine if two names refer to the same file?

Ok win doesn't support i node
It doesn't, but for the prevailing NTFS, there's a FileID, which seems to be reliable; see the discussion in "Anyway to determine path being monitored with Win32::ChangeNotify?".
On the other side, I'd assume Linux' stat's inode to be of no value for FAT file systems…

Re^3: Portable way to determine if two names refer to the same file?
by jcb (Priest) on Aug 17, 2019 at 05:55 UTC
    for the prevailing NTFS, there's a FileID, which seems to be reliable

    In theory, NTFS Master File Table ($MFT) record numbers are NTFS inode numbers. I suspect that defragmentation tools might be able to sort the MFT, which would make the numbers unstable over time, but they would still be usable for comparing two files at a given moment.

    Of course, Microsoft being Microsoft, a bit of research quickly uncovered at least two different API calls for handling this. One of them is new!shiny! in Windows Server 2012 and apparently is found only in Windows Server; it may or may not actually work on all files, or may only work on files in ReFS volumes, whatever the hell those are.

    While the 64-bit FileID is not guaranteed to be stable on FAT, FAT does not support links of any type, so simply comparing absolute filenames will work.

    Microsoft claims that a VSN:FileID tuple uniquely identifies a file. GNU claims that a st_dev:st_ino tuple uniquely identifies a file.

    On POSIX systems, device numbers are guaranteed to uniquely identify mounted filesystems, since a device number is the "access path identifier" for a mounted filesystem, but are not guaranteed to remain stable over time. On Windows, the analogous value seems to be the "volume serial number", which is stable across time because it is in the volume header, but its uniqueness is simply assumed and it has no role whatsoever in actually mapping I/O to the underlying storage. I wonder what happens if a Windows box is presented with two disks with the same volume serial number and different contents?

    Back to the point: how does one get that VSN:FileID tuple in Perl?

    On the other side, I'd assume Linux' stat's inode to be of no value for FAT file systems…

    Oddly enough, if I understand the kernel sources correctly, the inode number reported for a file on a FAT filesystem has no meaning in terms of the actual filesystem, but is consistent with the rule that only the same file has the same inode number. This is a trick the kernel plays by keeping track of every inode that anyone is "looking at" and ensuring that each file in the dcache from a FAT filesystem has a unique inode number within that filesystem (or possibly system-wide: I am not entirely certain whether that table is per-filesystem or global). Any way of examining a file in Linux creates a dcache entry that persists until either the filesystem is unmounted or the kernel recycles the memory. The kernel is therefore able to maintain the illusion that FAT files have stable inode numbers, provided that userspace refrains from "writing them down" and then checking again after the filesystem in question has been unmounted and remounted.

    In short, on Linux, st_dev:st_ino is unique for all immediately accessible disk files, but is not guaranteed to remain stable across reboots or unmounting and remounting a filesystem.
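As a minimal sketch of that rule, assuming a Unix-like system and a temporary directory on a filesystem that supports hard links: the helper name dev_ino is hypothetical, and the pair is compared within a single run only, never stored for later.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns "st_dev:st_ino" for a name,
# or undef (in scalar context) if stat fails.
sub dev_ino {
    my ($name) = @_;
    my @st = stat($name) or return;
    return join(':', @st[0, 1]);
}

my $dir   = tempdir(CLEANUP => 1);
my $orig  = "$dir/original";
my $alias = "$dir/alias";

open(my $fh, '>', $orig) or die "cannot create $orig: $!";
close($fh);

# link() fails on filesystems without hard links (FAT, for example),
# so report that instead of dying.
if (link($orig, $alias)) {
    print dev_ino($orig) eq dev_ino($alias)
        ? "same file\n"
        : "different files\n";
}
else {
    print "hard links not available here: $!\n";
}
```

The comparison is only meaningful while both names are reachable in the same mount session, for the reasons given above.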

     

    Overall, it looks like the best solution to my problem might be:

    Load File::Spec and then read @File::Spec::ISA to find which implementation it selected, or directly ask perl with File::Spec->isa('File::Spec::Unix'). If File::Spec::Unix was chosen, use the stat builtin, and the "file tag" is join(':',(stat($filename))[0,1]); otherwise, assume no links, and the "file tag" is Cwd::abs_path($filename). Document the caveat and wait for a bug report from someone who actually managed to cause confusion by making links on a non-Unix-like system. (Preceding code is untested.)
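The plan above could be sketched roughly as follows. The names file_tag and same_file are hypothetical. One thing worth verifying before relying on the ->isa test: as far as I know, several of the other File::Spec implementations (File::Spec::Win32 among them) are subclasses of File::Spec::Unix, so ->isa('File::Spec::Unix') may be true everywhere. This sketch therefore assumes the @File::Spec::ISA route and compares its first element directly.

```perl
use strict;
use warnings;
use File::Spec;
use Cwd ();

# Hypothetical helper: a "file tag" that is equal for two names
# exactly when they are believed to refer to the same file.
sub file_tag {
    my ($filename) = @_;

    # Assumption: File::Spec records its chosen implementation as the
    # first (and only) entry of @File::Spec::ISA.
    if ($File::Spec::ISA[0] eq 'File::Spec::Unix') {
        my @st = stat($filename) or return;
        return join(':', @st[0, 1]);    # st_dev:st_ino
    }

    # Elsewhere, assume no links and fall back to the absolute path.
    return Cwd::abs_path($filename);
}

# True only if both tags could be computed and are identical.
sub same_file {
    my ($x, $y) = @_;
    my ($tx, $ty) = (scalar file_tag($x), scalar file_tag($y));
    return defined $tx && defined $ty && $tx eq $ty;
}
```

A call such as same_file('a.txt', './a.txt') would then go through stat on Unix-like systems and through Cwd::abs_path elsewhere. The caveat about links on non-Unix-like systems still applies to the fallback branch.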
