How does one compare the relationship between two file system paths in a portable way?

ELISHEVA has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to find a portable way to compare two paths and determine if one path is inside, outside, equal to, or along side of another. Assume that the paths in question are 'real paths' (not symbolic links).

If the only systems running Perl were Unix, Dos, or Mac this would be easy: all I would need to do is canonize the two paths and then compare them as strings. For example, if A is a substring of B, then A is the parent of B. If A and B have a common start, but a different end, then they are "along side of" each other.

However, the string comparison algorithm makes a crucial assumption that path segments to the right are always nested inside of path segments to the left. I am not certain this holds for all of the operating systems supported by Perl. There is no law of nature that says more nested items should be to the right (although it is consistent with the way we write numbers). perlport doesn't answer this question and my google foo is lacking. I would love it if someone could point me to documentation that would confirm or disprove this assumption.

If I can't assume left to right nesting, then I need to find another platform independent way of capturing which directories and files are nested within each other. One obvious solution is to use File::Spec to parse the path. This should allow me to produce a system independent version of the path. I can then use the same logic to compare two parsed paths regardless of operating system.

I would have expected that using File::Spec would work, but my attempt fails the following tests:

#   Failed test 'comparePaths(/a/b, /a)'
#   in Monks/Snippet.pm at line 132.
#          got: '-2'
#     expected: '-1'

Monks/Snippet....NOK 24
#   Failed test 'comparePaths(/a/b, /a/)'
#   in Monks/Snippet.pm at line 133.
#          got: '-2'
#     expected: '-1'

Monks/Snippet....NOK 25
#   Failed test 'comparePaths(/a/b/, /a)'
#   in Monks/Snippet.pm at line 134.
#          got: '-2'
#     expected: '-1'

Monks/Snippet....NOK 26
#   Failed test 'comparePaths(/a/b/, /a/)'
#   in Monks/Snippet.pm at line 135.
#          got: '-2'
#     expected: '-1'
# Looks like you failed 4 tests of 36.
[download]

The heart of the problem seems to be that File::Spec converts "/" (File::Spec->rootdir() on Unix) to directory path components ('','') but "/a" to ('','a'). Thus if we rely on the output of File::Spec alone, "/a" does not appear to be within "/"! Both / and /a are canonical forms according to File::Spec, so canonization isn't going to make this problem go away.

There are several possible solutions but all of them seem to me less than ideal except one - a bug in my own work that I didn't see. Failing that, my options are:

discard empty directory components. According to File::Spec empty directory components have meaning on some file systems. They can not be safely discarded without storing a list containing the OS's where they are significant. I'd happily store that list, but I don't know where to find it. From the documentation on splitdir:
Unlike just splitting the directories on the separator, empty directory names ('') can be returned, because these are significant on some OSes.

use the "$no_file" parameter on splitpath. This gets the four tests that failed above to work. However, in their place four other tests fail:

#   Failed test 'comparePaths(/, /a)'
#   in Monks/Snippet.pm at line 108.
#          got: '-2'
#     expected: '1'

#   Failed test 'comparePaths(/, /a/)'
#   in Monks/Snippet.pm at line 109.
#          got: '-2'
#     expected: '1'

#   Failed test 'comparePaths(/a, /)'
#   in Monks/Snippet.pm at line 137.
#          got: '-2'
#     expected: '-1'

#   Failed test 'comparePaths(/a/, /)'
#   in Monks/Snippet.pm at line 138.
#          got: '-2'
#     expected: '-1'
# Looks like you failed 4 tests of 36.
[download]

In parsePath below,
```
    #replace
    my @aDirs = File::Spec->splitdir($sDirPart);
    #with
    my @aDirs = $sDirPart eq File::Spec->rootdir()
      ? ('') : File::Spec->splitdir($sDirPart);
[download]
```
With this change all tests pass both for $no_flag set to true and $no_flag set to false, at least on Unix, but it feels like a kludge and I'm not at all certain of its portability.
Just assume that all operating systems put subdirs and files to the right of the directories that contain them. Then I could just ignore File::Spec and call it a day. This assumption may well be true, but I'd feel a lot more comfortable relying on it if I had some documentation, which I don't.

The code and tests are posted below:

use strict;
use warnings;
use File::Spec;
use Test::More tests => 36;

#===============================================================
# comparePaths() and supporting functions 
#===============================================================

# returns
# 1  if $sPath1 owns/contains $sPath2
# 0  if $sPath1 equals $sPath2
# -1 if $sPath1 is owned *by* $sPath2
# -2 if $sPath1 is along side of $sPath2

sub comparePaths {
  my ($sPath1, $sPath2) = @_;
  my ($sVol1, $aDirs1, $sFile1) = parsePath($sPath1);
  my ($sVol2, $aDirs2, $sFile2) = parsePath($sPath2);

  # paths on two different volumes can't own one another
  return -2 if ($sVol1 ne $sVol2);

  # assume the most deeply nested path components are at the
  # end of the directory array.
  # files are "inside" directories, so just push them onto the
  # directory path
  push @$aDirs1, $sFile1 if $sFile1;
  push @$aDirs2, $sFile2 if $sFile2;

  # $"='|'; #to make leading and trailing '' more visible
  # print STDERR "dirs1=<@$aDirs1> <@$aDirs2>\n";

  # decide if we are inside or outside by comparing directory
  # components

  my $iSegments1 = scalar @$aDirs1;
  my $iSegments2 = scalar @$aDirs2;

  if ($iSegments1 <= $iSegments2) {
    for (my $i=0; $i < $iSegments1; $i++) {
      return -2 if $aDirs1->[$i] ne $aDirs2->[$i];
    }
    return $iSegments1 == $iSegments2 ? 0 : 1;
  } else {
    for (my $i=0; $i < $iSegments2; $i++) {
      return -2 if $aDirs1->[$i] ne $aDirs2->[$i];
    }
    return -1;
  }
}

sub parsePath {
  my $sPath = shift;

  # parse the canonical path
  $sPath = File::Spec->canonpath($sPath);

  # parse the canonical path
  $sPath = File::Spec->canonpath($sPath);

  # split the path into components
  my ($sVolume, $sDirPart, $sFilePart)
    = File::Spec->splitpath($sPath, 0);

  # maybe the nesting order of directories in $sDirPart
  # is right to left instead of left to right
  # (as in Unix,MsWin)?

  # If so, further split the directory portion into
  # components in the hope that splitdir produces
  # an array with most nested directory components at
  # the end... BUT this is an assumption. There is no
  # documentation guarenteeing it.

  # Also, canonize the directory part before splitting
  # it.  File::Spec::Unix sets the directory part to '.../'
  # but splitdir doesn't strip empty directories from UNIX.
  # this is explained in File::Spec's documentation for splitdir:
  #
  #   Unlike just splitting the directories on the separator,
  #   empty directory names ('') can be returned, because these
  #   are significant on some OSes.

  $sDirPart = File::Spec->canonpath($sDirPart);
  my @aDirs = File::Spec->splitdir($sDirPart);

  # return parsed path
  return ($sVolume, \@aDirs, $sFilePart);
}

#===============================================================
# TESTS
#===============================================================

  #inside root, relpath
is(comparePaths('a/b', 'a/b/c'), 1, "comparePaths(a/b, a/b/c)");
is(comparePaths('a/b', 'a/b/c/'), 1, "comparePaths(a/b, a/b/c)");
is(comparePaths('a/b/', 'a/b/c'), 1 , "comparePaths(a/b, a/b/c)");
is(comparePaths('a/b/', 'a/b/c/'), 1, "comparePaths(a/b, a/b/c)");

#inside root, abspath
is(comparePaths('/a/b', '/a/b/c'), 1, "comparePaths(/a/b, /a/b/c)");
is(comparePaths('/a/b', '/a/b/c/'), 1, "comparePaths(/a/b, /a/b/c)");
is(comparePaths('/a/b/', '/a/b/c'), 1, "comparePaths(/a/b, /a/b/c)");
is(comparePaths('/a/b/', '/a/b/c/'), 1, "comparePaths(/a/b, /a/b/c)");

is(comparePaths('/', '/a'), -1, "comparePaths(/, /a)");
is(comparePaths('/', '/a/'), -1, "comparePaths(/, /a/)");


#equal to root, relpath
is(comparePaths('a/b', 'a/b'), 0, "comparePaths(a/b, a/b)");
is(comparePaths('a/b', 'a/b/'), 0, "comparePaths(a/b, a/b/)");
is(comparePaths('a/b/', 'a/b'), 0, "comparePaths(a/b/, a/b)");
is(comparePaths('a/b/', 'a/b/'), 0, "comparePaths(a/b/, a/b/)");

#equal to root, abspath
is(comparePaths('/a/b', '/a/b'), 0, "comparePaths(/a/b, /a/b)");
is(comparePaths('/a/b', '/a/b/'), 0, "comparePaths(/a/b, /a/b/)");
is(comparePaths('/a/b/', '/a/b'), 0, "comparePaths(/a/b/, /a/b)");
is(comparePaths('/a/b/', '/a/b/'), 0, "comparePaths(/a/b/, /a/b/)");


#parent to root, relpath
is(comparePaths('a/b', 'a'), -1, "comparePaths(a/b, a)");
is(comparePaths('a/b', 'a/'), -1, "comparePaths(a/b, a/)");
is(comparePaths('a/b/', 'a'), -1, "comparePaths(a/b/, a)");
is(comparePaths('a/b/', 'a/'), -1, "comparePaths(a/b/, a/)");

#parent to root, abspath
is(comparePaths('/a/b', '/a'), -1, "comparePaths(/a/b, /a)");
is(comparePaths('/a/b', '/a/'), -1, "comparePaths(/a/b, /a/)");
is(comparePaths('/a/b/', '/a'), -1, "comparePaths(/a/b/, /a)");
is(comparePaths('/a/b/', '/a/'), -1, "comparePaths(/a/b/, /a/)");

is(comparePaths('/a', '/'), -1, "comparePaths(/a, /)");
is(comparePaths('/a/', '/'), -1, "comparePaths(/a/, /)");

#outside root, relpath
is(comparePaths('a/b', 'a/x'), -2, "comparePaths(a/b, a/x)");
is(comparePaths('a/b', 'a/x/'), -2, "comparePaths(a/b, a/x/)");
is(comparePaths('a/b/', 'a/x'), -2, "comparePaths(a/b/, a/x)");
is(comparePaths('a/b/', 'a/x/'), -2, "comparePaths(a/b/, a/x/)");

#outside root, abspath
is(comparePaths('/a/b', '/a/x'), -2, "comparePaths(/a/b, /a/x)");
is(comparePaths('/a/b', '/a/x/'), -2, "comparePaths(/a/b, /a/x/)");
is(comparePaths('/a/b/', '/a/x'), -2, "comparePaths(/a/b/, /a/x)");
is(comparePaths('/a/b/', '/a/x/'), -2, "comparePaths(/a/b/, /a/x/)");
[download]

As always many thanks in advance. And also a special thanks to the monks in the cb tonight (tye, ikegami, ssandv, belg4mit and derby) who all helped me clarify my problem and identify a specific use case that was causing problems. Without their help this would have been yet another XY Problem.

Best, beth

Update: narrowed scope of question to real paths in response to JavaFan's post below.

Update: added another possible solution, it works, but feels like a kludge.

Comment on How does one compare the relationship between two file system paths in a portable way? Select or Download Code

Replies are listed 'Best First'.
Re: How does compare the relationship between two file system paths in a portable way? by JavaFan (Canon) on Aug 17, 2009 at 22:55 UTC
If the only systems running Perl were Unix, Dos, or Mac this would be easy: all I would need to do is canonize the two paths and then compare them as strings. Considering that Unix allows symbolic links, and mounts anywhere on the current filesystem (not to mention the fact that things can be mounted more than once), this is not as easy as you may think. For example, if A is a substring of B, then A is the parent of B. Yes, but the reverse isn't true. If A isn't a substring of B, then that doesn't mean A isn't the parent of B.	[reply]
Re^2: How does compare the relationship between two file system paths in a portable way? by ELISHEVA (Prior) on Aug 17, 2009 at 23:14 UTC
Agreed. I should have been more precise in my scoping of the problem (and have updated my original post accordingly). My particular use cases for this routine involves paths that can be presumed to be real file names. But others interested in a "comparePath" function might have different goals. Even in cases where that assumption can't be made, Perl offers, what I presume is, a portable way to identify symbolic links (-l, for example) so those could easily be filtered out and removed from consideration. Or alternatively one could use a function like readlink to convert the link to its real name. Do you have experiences where `readlink` and `-l` have caused problems? Come to think of it, I don't know how to detect mounts. Is `-l` also used? Best, beth	[reply] [d/l] [select]
Re^3: How does compare the relationship between two file system paths in a portable way? by JavaFan (Canon) on Aug 17, 2009 at 23:43 UTC
As long as you remember to do a readlink on all segments of the path, then it may work. As long as there aren't any loops. Loops can make two paths be parents of each other. Note however that it's not forbidden for a Unix filesystem to allow directory hardlinks. Not sure if Perl runs on any system that has a filesystem where that is allowed. As for detecting mounts, `stat` tells you the device number of a file or directory. I don't know how to get, from Perl and in a portable way, a list of mount points.	[reply]
Re^4: How does compare the relationship between two file system paths in a portable way? by quester (Vicar) on Aug 18, 2009 at 04:29 UTC
Re^5: How does compare the relationship between two file system paths in a portable way? by JavaFan (Canon) on Aug 18, 2009 at 09:25 UTC


We don't bite newbies here... much
	PerlMonks