Portable way to determine if two names refer to the same file?


by jcb (Parson)

on Aug 16, 2019 at 03:31 UTC

jcb has asked for the wisdom of the Perl Monks concerning the following question:

I have spent the past hour or so trying Super Search and not finding a clear answer, so I ask my fellow monks how best to portably determine if two seemingly different names actually refer to the same physical file?

I am not concerned about copies of the same file, only links, such that the same physical file appears under multiple names.

On POSIX, the solution is easy: compare dev:ino tuples from the stat builtin and declare "same file" if they match. I have no idea if this also works on Windows or even if the problem exists on Windows — how well does Windows handle symlinks anyway and does it even support hardlinks at all?
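The POSIX approach described above can be sketched roughly as follows. This is a minimal illustration, not a tested portable solution; `same_dev_ino` is a hypothetical helper name, and the fields are compared as strings on the assumption that inode numbers may not always fit a native integer.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both names resolve to the same dev:ino pair.
# stat() follows symlinks, so a symlink compares equal to its target,
# and hard links compare equal to each other.
sub same_dev_ino {
    my ($name1, $name2) = @_;
    my @st1 = stat($name1) or die "Cannot stat '$name1': $!";
    my @st2 = stat($name2) or die "Cannot stat '$name2': $!";
    # Fields 0 and 1 of the stat list are the device and inode numbers.
    return "$st1[0]:$st1[1]" eq "$st2[0]:$st2[1]";
}
```

Whether the values returned on non-POSIX platforms are meaningful is exactly the open question here.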

And what of the less-common platforms?

Cross-posted in Categorized Questions and Answers at How do I portably determine if two filenames refer to the same file? as a place to collect answers for future reference.

Edited 2019-08-16 by jcb: Clarify that stat refers to the Perl builtin. I had forgotten about the shell command with the same name.


Replies are listed 'Best First'.
Re: Portable way to determine if two names refer to the same file? by LanX (Saint) on Aug 16, 2019 at 07:03 UTC
Did you try Perl's own `stat`? You seem to be talking about a shell command.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice

Update: OK, Windows doesn't support inodes; see perlport#stat.
Re^2: Portable way to determine if two names refer to the same file? by soonix (Canon) on Aug 16, 2019 at 08:48 UTC
"Ok win doesn't support i node"

It doesn't, but for the prevailing NTFS there is a FileID, which seems to be reliable; see the discussion in "Anyway to determine path being monitored with Win32::ChangeNotify?". On the other hand, I'd assume the inode reported by Linux's stat to be of no value for FAT file systems…
Re^3: Portable way to determine if two names refer to the same file? by jcb (Parson) on Aug 17, 2019 at 05:55 UTC
"for the prevailing NTFS, there's a FileID, which seems to be reliable"

In theory, NTFS Master File Table (`$MFT`) record numbers *are* NTFS inode numbers, although I suspect that defragmentation tools might be able to sort the MFT, which would make them unstable, but still usable for comparing two files.

Of course, Microsoft being Microsoft, a bit of research quickly uncovered at least two different API calls for handling this, one of which is new (!shiny!) in Windows Server 2012, and apparently is only found in Windows Server and may or may not actually work on all files or may only work on files in ReFS volumes, whatever the hell those are. The two are the general 64-bit FileID and the ReFS special 128-bit FileID.

While the 64-bit FileID is not guaranteed to be stable on FAT, FAT does not support links of any type, so simply comparing absolute filenames will work.

Microsoft claims that a VSN:FileID tuple uniquely identifies a file. GNU claims that a st_dev:st_ino tuple uniquely identifies a file. On POSIX systems, device numbers are guaranteed to uniquely identify mounted filesystems, since a device number is the "access path identifier" for a mounted filesystem, but are not guaranteed to remain stable over time. On Windows, the analogous value seems to be the "volume serial number", which is stable across time because it is in the volume header, but its uniqueness is simply assumed and it has no role whatsoever in actually mapping I/O to the underlying storage. I wonder what happens if a Windows box is presented with two disks with the same volume serial number and different contents?

Back to the point, how to get that VSN:FileID tuple in Perl?

"On the other side, I'd assume Linux' stat's inode to be of no value for FAT file systems…"

Oddly enough, if I understand the kernel sources correctly, the inode number has no meaning in terms of the actual filesystem, but is consistent with the rule that only the same file has the same inode number. This is a trick the kernel plays by keeping track of every inode that anyone is "looking at" and ensuring that each file in the dcache from a FAT filesystem has a unique inode number within that filesystem (or possibly system-wide: I am not entirely certain whether that table is per-filesystem or global). Since any way of examining a file in Linux creates a dcache entry that persists until either the filesystem is unmounted or the kernel recycles the memory, the kernel is able to maintain the illusion that FAT files have stable inode numbers, provided that userspace refrains from "writing them down" and then checking again after the filesystem in question has been unmounted and remounted.

In short, on Linux, st_dev:st_ino is unique for all immediately accessible disk files, but is not guaranteed to remain stable across reboots or unmounting and remounting a filesystem.

Overall, it looks like the best solution to my problem might be:

1. Load File::Spec and then read `@File::Spec::ISA` to find which implementation it selected, or directly ask perl with `File::Spec->isa('File::Spec::Unix')`.
2. If `File::Spec::Unix` was chosen, use the `stat` builtin and the "file tag" is `join(':',(stat($filename))[0,1])`, otherwise assume no links and the "file tag" is `Cwd::abs_path($filename)`.
3. Document the caveat and wait for a bug report from someone that actually managed to cause confusion by making links on a non-Unix-like system.

(Preceding code is untested.)
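The "file tag" plan above could look roughly like the sketch below. It is an illustration under assumptions, not a verified portable solution: `file_tag` and `same_file` are hypothetical names, and the sketch reads the first entry of `@File::Spec::ISA` rather than calling `isa`, because to my knowledge the other File::Spec implementations (such as `File::Spec::Win32`) inherit from `File::Spec::Unix`, so the `isa` test would likely be true everywhere.

```perl
use strict;
use warnings;
use Cwd ();
use File::Spec;

# Hypothetical helper: returns a string that should be equal for two
# names whenever they refer to the same file.
sub file_tag {
    my ($filename) = @_;
    # File::Spec records the implementation it selected in @ISA.
    # Assumption: only the plain Unix implementation implies reliable dev:ino.
    if ($File::Spec::ISA[0] eq 'File::Spec::Unix') {
        my @st = stat($filename) or die "Cannot stat '$filename': $!";
        return join ':', @st[0, 1];    # dev:ino
    }
    # Elsewhere, assume no links exist and fall back to the absolute path.
    my $abs = Cwd::abs_path($filename);
    die "Cannot resolve '$filename'" unless defined $abs;
    return $abs;
}

# Hypothetical convenience wrapper around file_tag().
sub same_file {
    my ($name1, $name2) = @_;
    return file_tag($name1) eq file_tag($name2);
}
```

The fallback branch inherits the caveat from step 3: links made on a non-Unix-like system would go undetected.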
Re^2: Portable way to determine if two names refer to the same file? by jcb (Parson) on Aug 17, 2019 at 04:25 UTC
The question was intended to refer to the Perl `stat` builtin and has been edited to clarify.
Re: Portable way to determine if two names refer to the same file? by Anonymous Monk on Aug 19, 2019 at 00:14 UTC
If one considers some of the filesystems that are now available in support of containers, and maybe even network filesystems such as NFS, a test of inodes will not work. I am not sure that a 100%-certain way to do this exists that will work everywhere.
Re^2: Portable way to determine if two names refer to the same file? by jcb (Parson) on Aug 19, 2019 at 00:55 UTC
If the system claims POSIX conformance and testing dev:ino is unreliable, the system is defective:

"The `st_ino` and `st_dev` fields taken together uniquely identify the file within the system."

"Note that `st_dev` must be unique within a Local Area Network (LAN) in a "system" made up of multiple computers' file systems connected by a LAN. Networked implementations of a POSIX-conforming system must guarantee that all files visible within the file tree (including parts of the tree that may be remotely mounted from other machines on the network) on each individual processor are uniquely identified by the combination of the `st_ino` and `st_dev` fields."

(Above quotes from https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html)

I would be very surprised if traditional Unix did not provide this guarantee, so I am fairly sure that all current "unix" platforms will meet it. If some container tool causes this to be violated, that tool is defective, end of story. There is a very strong expectation that modern "*nix" means POSIX.
Re^2: Portable way to determine if two names refer to the same file? by afoken (Chancellor) on Aug 19, 2019 at 06:14 UTC
See also "Detecting whether two pathes refer to the same file".

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: Portable way to determine if two names refer to the same file? by Anonymous Monk on Aug 20, 2019 at 14:45 UTC
You know this based on personal experience? You wrote a program which attempted to test equality of files using inodes and found that the test doesn't work when one or more of the files are remote? Didn't think so. So you're basing it on other research you've read, where someone else did that test? Can you link us to at least one such research report? Didn't think so. As usual... you are full of hot air.