http://qs321.pair.com?node_id=11102811

This is the long version of Re^2: open file error and Re^5: Unify windows filenames (from ten years ago), triggered by Re^2: perl script to compare two directories, plus a little bit of bean counting on the code for non-Unix systems.


File::Spec has a severe design problem that will cause wrong results on modern Unix systems (*BSD, Linux, MacOS X). It assumes that all filesystems are equal on any Unix operating system. And this assumtion is plain wrong since decades:

To make things worse, the code for at least Windows and cygwin is broken, too.

And here is how File::Spec handles all of that:

package File::Spec; use strict; our $VERSION = '3.75'; $VERSION =~ tr/_//d; my %module = ( MSWin32 => 'Win32', os2 => 'OS2', VMS => 'VMS', NetWare => 'Win32', # Yes, File::Spec::Win32 works on Ne +tWare. symbian => 'Win32', # Yes, File::Spec::Win32 works on sy +mbian. dos => 'OS2', # Yes, File::Spec::OS2 works on DJGP +P. cygwin => 'Cygwin', amigaos => 'AmigaOS'); my $module = $module{$^O} || 'Unix'; require "File/Spec/$module.pm"; our @ISA = ("File::Spec::$module"); 1;

For any modern Unix, the %module hash has no overriding key, and so File::Spec::Unix will implement all methods of File::Spec.

File::Spec::Unix

package File::Spec::Unix; use strict; use Cwd (); our $VERSION = '3.75'; $VERSION =~ tr/_//d; # ... sub case_tolerant { 0 } use constant _fn_case_tolerant => 0; # ...

File::Spec->case_tolerant() constantly returns false, and that is plain wrong, as explained above.

The constant _fn_case_tolerant, used by File::Spec::Functions, is also wrong. It should not exist at all.

File::Spec::Win32

Strangely, the Windows implementation, which optionally allows passing a drive letter to the case_tolerant() method, is better, but still wrong:

package File::Spec::Win32; use strict; use Cwd (); require File::Spec::Unix; our $VERSION = '3.75'; $VERSION =~ tr/_//d; our @ISA = qw(File::Spec::Unix); # ... sub case_tolerant { eval { local @INC = @INC; pop @INC if $INC[-1] eq '.'; require Win32API::File; } or return 1; my $drive = shift || "C:"; my $osFsType = "\0"x256; my $osVolName = "\0"x256; my $ouFsFlags = 0; Win32API::File::GetVolumeInformation($drive, $osVolName, 256, [], [] +, $ouFsFlags, $osFsType, 256 ); if ($ouFsFlags & Win32API::File::FS_CASE_SENSITIVE()) { return 0; } else { return 1; } } # ...
  • Failing to load Win32API::File silently and magically makes all filesystems case insensitive. Plain wrong. If all filesystems were case insensitive, messing with the Win32 API function GetVolumeInformation() would not be needed and the entire function body could be reduced to { 1 }. Right for the common case, wrong for edge cases.
  • The return value of GetVolumeInformation(), which may fail, is not checked at all. Microsoft does not specify what happens to the variable passed as lpFileSystemFlags, so the contents of $ouFsFlags may be junk in that case. If you are lucky, $ouFsFlags is not touched, stays 0, and the following test for FS_CASE_SENSITIVE ends by returning true, which is right by accident in the common case. If you test an edge case, or if $ouFsFlags has been modified so that its FS_CASE_SENSITIVE bit is set, the returned result is wrong.
  • Omitting the drive letter tests the C: drive, probably to be compatible with the File::Spec::Unix implementation, which has no documented parameters. Again, right for the common case, wrong for edge cases.
  • Undocumented feature: GetVolumeInformation() also accepts UNC paths ('\\server\share'), and case_tolerant() does not prevent you from testing UNC paths.
  • NTFS volume mount points (that allow mounting any filesystem supported by Windows in an empty subdirectory of a NTFS volume) are not documented, neither in File::Spec::Win32, nor in the documentation of GetVolumeInformation(). Probably, some more code is required to handle NTFS volume mount points.

File::Spec::Cygwin

package File::Spec::Cygwin; use strict; require File::Spec::Unix; our $VERSION = '3.75'; $VERSION =~ tr/_//d; our @ISA = qw(File::Spec::Unix); # ... sub case_tolerant { return 1 unless $^O eq 'cygwin' and defined &Cygwin::mount_flags; my $drive = shift; if (! $drive) { my @flags = split(/,/, Cygwin::mount_flags('/cygwin')); my $prefix = pop(@flags); if (! $prefix || $prefix eq 'cygdrive') { $drive = '/cygdrive/c'; } elsif ($prefix eq '/') { $drive = '/c'; } else { $drive = "$prefix/c"; } } my $mntopts = Cygwin::mount_flags($drive); if ($mntopts and ($mntopts =~ /,managed/)) { return 0; } eval { local @INC = @INC; pop @INC if $INC[-1] eq '.'; require Win32API::File; } or return 1; my $osFsType = "\0"x256; my $osVolName = "\0"x256; my $ouFsFlags = 0; Win32API::File::GetVolumeInformation($drive, $osVolName, 256, [], [] +, $ouFsFlags, $osFsType, 256 ); if ($ouFsFlags & Win32API::File::FS_CASE_SENSITIVE()) { return 0; } else { return 1; } } # ...
  • This code inherits all problems of File::Spec::Win32 by copying the broken code from File::Spec::Win32::case_tolerant(), and extending it with its own code.
  • Missing Cygwin::mount_flags() silently and magically makes all filesystems case insensitive, even before all problems copied from File::Spec::Win32::case_tolerant() can appear.
  • Any false value (undef, empty string, 0) passed as argumnent, including passing no argument at all, triggers the default logic that creates a Unix-style, cygwin-specific path in $drive. If that path is not detected as "managed" (by cygwin), this Unix-style path will be passed as a drive letter to GetVolumeInformation(). I expect GetVolumeInformation() to fail in that case. As in File::Spec::Win32::case_tolerant(), there is no error check, see there.

File::Spec::*

File::Spec::AmigaOS inherits case_tolerant() from File::Spec::Unix, so it returns false. I have no clue how AmigaOS handles file names. AmigaOS states that device names are case insensitive, so case_tolerant() should return true for device names. IIRC, old Amigas could at least read DOS floppies. Again, case_tolerant() should return true for DOS floppies. So, it looks like File::Spec::AmigaOS::case_tolerant() is broken.

File::Spec::Epoc implements case_tolerant() to constantly return true. Again, I have no clue how EPOC a.k.a. Symbian OS handles file names.

File::Spec::Mac implements case_tolerant() to constantly return true. This should be ok, the filesystems of classic MacOS (MFS, HFS, HFS+) are case insensitive, as are FAT-formatted floppies from DOS. I don't know if old Macs could access any other filesystems.

File::Spec::VMS implements case_tolerant() to constantly return true. Again, no clue how VMS handles file names, or if it can mount case-sensitive filesystems. Files-11 suggest case insensitive behaviour. If case-sensitive filesystems can be mounted, this implementation is broken.

File::Spec::OS2, used also for DOS, implements case_tolerant() to constantly return true. This should be ok for DOS and OS/2 native filesystems (FAT, HPFS). I don't know if later versions of OS/2 support case-sensitive filesystems. If they do, this implementation is broken.

File::Spec::Functions

package File::Spec::Functions; # ... our $VERSION = '3.75'; $VERSION =~ tr/_//d; #... my %udeps = ( # ... case_tolerant => [], # ... ); foreach my $meth (@EXPORT, @EXPORT_OK) { my $sub = File::Spec->can($meth); no strict 'refs'; if (exists($udeps{$meth}) && $sub == File::Spec::Unix->can($meth) +&& !(grep { File::Spec->can($_) != File::Spec::Unix->can($_) } @{$udeps{$meth}}) && defined(&{"File::Spec::Unix::_fn_$meth"})) { *{$meth} = \&{"File::Spec::Unix::_fn_$meth"}; } else { *{$meth} = sub {&$sub('File::Spec', @_)}; } } # ...
  • File::Spec::Functions generates function wrappers for the methods of File::Spec.
  • The generic way is a simple function that injects 'File::Spec' as first argument for the method call.
  • For methods that are implemented by File::Spec::Unix, are listed in %udeps, don't have dependencies (from %udeps) implemented in other classes, and have a function named "_fn_$meth" defined in File::Spec::Unix, that function is used instead. The intention is clear: All of those functions in File::Spec::Unix are implemented via constant, and using them saves several CPU cycles compared to wrapping a method. case_tolerant() is one of those methods, and File::Spec::Functions will choose File::Spec::Unix::_fn_case_tolerant() instead of generating a wrapper if the O/S-specific class does not implement its own case_tolerant() method.
  • File::Spec::Functions makes no attempt to speed up constant methods in O/S-specific classes.
  • The return value of File::Spec::Unix::_fn_case_tolerant() is wrong, as explained above.
  • Repairing File::Spec::Unix->case_tolerant() implies that File::Spec::Unix::_fn_case_tolerant() has to be removed, see below.

Fixing case_tolerant() for Unix

There seems to be no easy fix for Unix. Returning false is the wrong answer for all of the edge case shown above. Returning true is the wrong answer for all of the common cases (native filesystems).

At least, case_tolerant() needs a path to work on, as there is no generic answer on Unix. The path should be the equivalent of a drive letter on windows, i.e. a mount point.

For convenience, passing a filename should be treated like passing the directory containing it. (This should happen for all operating systems)

Also for convenience, passing a directory that is not a mount point should be treated like passing the next mount point upwards the directory tree. (This should also happen for all operating systems.)

Detecting a mount point depends on the operating system, but a general solution for Unix exists: compare the dev fields of stat($dir) and lstat("$dir/.."), if they differ, you found a mount point. Note that this method fails to detect bind mounts on Linux (according to mountpoint.c from util-linux).

Knowing the mount point for a filesystem, you "only" have to find out if that filesystem is mounted case-sensitive or case-insensitive. And that depends on the operating system.

For bug-compatibility to the existing File::Spec::Unix, calling case_tolerant() without arguments could continue to return false.

Fixing case_tolerant() for Windows

case_tolerant() should also accept directories and files deep inside a volume, as required for the fixed File::Spec::Unix->case_tolerant()

As far as GetVolumeInformation() is concerned, mount points are either drive letters or server shares. That can easily be handled by a regexp.

Detecting NTFS volume mount points needs more work. See https://docs.microsoft.com/en-us/windows/win32/fileio/volume-mount-points.

GetVolumeInformationByHandleW() looks promising, but requires at least Vista / Server 2008, and it requires that you pass a file handle, not just a path.

Fixing case_tolerant() for Cygwin

case_tolerant() should also accept directories and files deep inside a volume, as required for the fixed File::Spec::Unix->case_tolerant()

  1. Check if the argument looks like a cygwin path
  2. If it does, find the cygwin mount point for the argument and check the mount options.
  3. Else, load File::Spec::Win32, pass the argument to File::Spec::Win32->case_tolerant() and return whatever that method returns

Especially, do not check for Cygwin or Cycwin functions before you know you have to work with a cygwin path.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)