If the only systems running Perl were Unix, Dos, or Mac this would be easy: all I would need to do is canonize the two paths and then compare them as strings. For example, if A is a substring of B, then A is the parent of B. If A and B have a common start, but a different end, then they are "along side of" each other.
However, the string comparison algorithm makes a crucial assumption that path segments to the right are always nested inside of path segments to the left. I am not certain
this holds for all of the operating systems supported by
Perl. There is no law of nature that says more nested items
should be to the right (although it is consistent with the
way we write numbers). perlport doesn't answer this question and my google foo is lacking. I would love it if someone could point me to documentation that would confirm or disprove this assumption.
If I can't assume left to right nesting, then I need to find another platform independent way of capturing which directories and files are nested within each other. One obvious solution is to use File::Spec to parse the path. This should allow me to produce a system independent version of the path. I can then use the same logic to compare two parsed paths regardless of operating system.
# Failed test 'comparePaths(/a/b, /a)'
# in Monks/Snippet.pm at line 132.
# got: '-2'
# expected: '-1'
Monks/Snippet....NOK 24
# Failed test 'comparePaths(/a/b, /a/)'
# in Monks/Snippet.pm at line 133.
# got: '-2'
# expected: '-1'
Monks/Snippet....NOK 25
# Failed test 'comparePaths(/a/b/, /a)'
# in Monks/Snippet.pm at line 134.
# got: '-2'
# expected: '-1'
Monks/Snippet....NOK 26
# Failed test 'comparePaths(/a/b/, /a/)'
# in Monks/Snippet.pm at line 135.
# got: '-2'
# expected: '-1'
# Looks like you failed 4 tests of 36.
The heart of the problem seems to be that File::Spec converts "/" (File::Spec->rootdir() on Unix) to directory path components ('','') but "/a" to ('','a'). Thus if we rely on the
output of File::Spec alone, "/a" does not appear to be
within "/"! Both / and /a are canonical forms according to File::Spec, so canonization isn't going to make this problem go away.
There are several possible solutions but all of them seem to me less than ideal except one - a bug in my own work that I didn't see. Failing that, my options are:
- discard empty directory components. According to File::Spec empty directory components have meaning on some file systems. They can not be safely discarded without storing a list containing the OS's where they are significant. I'd happily store that list, but I don't know where to find it. From the documentation on splitdir:
Unlike just splitting the directories on the separator, empty directory names ('') can be returned, because these are significant on some OSes.
- use the "$no_file" parameter on splitpath. This gets the four tests that failed above to work. However, in their place four other tests fail:
# Failed test 'comparePaths(/, /a)'
# in Monks/Snippet.pm at line 108.
# got: '-2'
# expected: '1'
# Failed test 'comparePaths(/, /a/)'
# in Monks/Snippet.pm at line 109.
# got: '-2'
# expected: '1'
# Failed test 'comparePaths(/a, /)'
# in Monks/Snippet.pm at line 137.
# got: '-2'
# expected: '-1'
# Failed test 'comparePaths(/a/, /)'
# in Monks/Snippet.pm at line 138.
# got: '-2'
# expected: '-1'
# Looks like you failed 4 tests of 36.
- In parsePath below,
#replace
my @aDirs = File::Spec->splitdir($sDirPart);
#with
my @aDirs = $sDirPart eq File::Spec->rootdir()
? ('') : File::Spec->splitdir($sDirPart);
With this change all tests pass both for $no_flag set to true and $no_flag set to false, at least on Unix, but it feels like a kludge and I'm not at all certain of its portability.
- Just assume that all operating systems put subdirs and files to the right of the directories that contain them. Then I could just ignore File::Spec and call it a day. This assumption may well be true, but I'd feel a lot more comfortable relying on it if I had some documentation, which I don't.
The code and tests are posted below:
use strict;
use warnings;
use File::Spec;
use Test::More tests => 36;
#===============================================================
# comparePaths() and supporting functions
#===============================================================
# returns
# 1 if $sPath1 owns/contains $sPath2
# 0 if $sPath1 equals $sPath2
# -1 if $sPath1 is owned *by* $sPath2
# -2 if $sPath1 is along side of $sPath2
sub comparePaths {
my ($sPath1, $sPath2) = @_;
my ($sVol1, $aDirs1, $sFile1) = parsePath($sPath1);
my ($sVol2, $aDirs2, $sFile2) = parsePath($sPath2);
# paths on two different volumes can't own one another
return -2 if ($sVol1 ne $sVol2);
# assume the most deeply nested path components are at the
# end of the directory array.
# files are "inside" directories, so just push them onto the
# directory path
push @$aDirs1, $sFile1 if $sFile1;
push @$aDirs2, $sFile2 if $sFile2;
# $"='|'; #to make leading and trailing '' more visible
# print STDERR "dirs1=<@$aDirs1> <@$aDirs2>\n";
# decide if we are inside or outside by comparing directory
# components
my $iSegments1 = scalar @$aDirs1;
my $iSegments2 = scalar @$aDirs2;
if ($iSegments1 <= $iSegments2) {
for (my $i=0; $i < $iSegments1; $i++) {
return -2 if $aDirs1->[$i] ne $aDirs2->[$i];
}
return $iSegments1 == $iSegments2 ? 0 : 1;
} else {
for (my $i=0; $i < $iSegments2; $i++) {
return -2 if $aDirs1->[$i] ne $aDirs2->[$i];
}
return -1;
}
}
sub parsePath {
my $sPath = shift;
# parse the canonical path
$sPath = File::Spec->canonpath($sPath);
# parse the canonical path
$sPath = File::Spec->canonpath($sPath);
# split the path into components
my ($sVolume, $sDirPart, $sFilePart)
= File::Spec->splitpath($sPath, 0);
# maybe the nesting order of directories in $sDirPart
# is right to left instead of left to right
# (as in Unix,MsWin)?
# If so, further split the directory portion into
# components in the hope that splitdir produces
# an array with most nested directory components at
# the end... BUT this is an assumption. There is no
# documentation guarenteeing it.
# Also, canonize the directory part before splitting
# it. File::Spec::Unix sets the directory part to '.../'
# but splitdir doesn't strip empty directories from UNIX.
# this is explained in File::Spec's documentation for splitdir:
#
# Unlike just splitting the directories on the separator,
# empty directory names ('') can be returned, because these
# are significant on some OSes.
$sDirPart = File::Spec->canonpath($sDirPart);
my @aDirs = File::Spec->splitdir($sDirPart);
# return parsed path
return ($sVolume, \@aDirs, $sFilePart);
}
#===============================================================
# TESTS
#===============================================================
#inside root, relpath
is(comparePaths('a/b', 'a/b/c'), 1, "comparePaths(a/b, a/b/c)");
is(comparePaths('a/b', 'a/b/c/'), 1, "comparePaths(a/b, a/b/c)");
is(comparePaths('a/b/', 'a/b/c'), 1 , "comparePaths(a/b, a/b/c)");
is(comparePaths('a/b/', 'a/b/c/'), 1, "comparePaths(a/b, a/b/c)");
#inside root, abspath
is(comparePaths('/a/b', '/a/b/c'), 1, "comparePaths(/a/b, /a/b/c)");
is(comparePaths('/a/b', '/a/b/c/'), 1, "comparePaths(/a/b, /a/b/c)");
is(comparePaths('/a/b/', '/a/b/c'), 1, "comparePaths(/a/b, /a/b/c)");
is(comparePaths('/a/b/', '/a/b/c/'), 1, "comparePaths(/a/b, /a/b/c)");
is(comparePaths('/', '/a'), -1, "comparePaths(/, /a)");
is(comparePaths('/', '/a/'), -1, "comparePaths(/, /a/)");
#equal to root, relpath
is(comparePaths('a/b', 'a/b'), 0, "comparePaths(a/b, a/b)");
is(comparePaths('a/b', 'a/b/'), 0, "comparePaths(a/b, a/b/)");
is(comparePaths('a/b/', 'a/b'), 0, "comparePaths(a/b/, a/b)");
is(comparePaths('a/b/', 'a/b/'), 0, "comparePaths(a/b/, a/b/)");
#equal to root, abspath
is(comparePaths('/a/b', '/a/b'), 0, "comparePaths(/a/b, /a/b)");
is(comparePaths('/a/b', '/a/b/'), 0, "comparePaths(/a/b, /a/b/)");
is(comparePaths('/a/b/', '/a/b'), 0, "comparePaths(/a/b/, /a/b)");
is(comparePaths('/a/b/', '/a/b/'), 0, "comparePaths(/a/b/, /a/b/)");
#parent to root, relpath
is(comparePaths('a/b', 'a'), -1, "comparePaths(a/b, a)");
is(comparePaths('a/b', 'a/'), -1, "comparePaths(a/b, a/)");
is(comparePaths('a/b/', 'a'), -1, "comparePaths(a/b/, a)");
is(comparePaths('a/b/', 'a/'), -1, "comparePaths(a/b/, a/)");
#parent to root, abspath
is(comparePaths('/a/b', '/a'), -1, "comparePaths(/a/b, /a)");
is(comparePaths('/a/b', '/a/'), -1, "comparePaths(/a/b, /a/)");
is(comparePaths('/a/b/', '/a'), -1, "comparePaths(/a/b/, /a)");
is(comparePaths('/a/b/', '/a/'), -1, "comparePaths(/a/b/, /a/)");
is(comparePaths('/a', '/'), -1, "comparePaths(/a, /)");
is(comparePaths('/a/', '/'), -1, "comparePaths(/a/, /)");
#outside root, relpath
is(comparePaths('a/b', 'a/x'), -2, "comparePaths(a/b, a/x)");
is(comparePaths('a/b', 'a/x/'), -2, "comparePaths(a/b, a/x/)");
is(comparePaths('a/b/', 'a/x'), -2, "comparePaths(a/b/, a/x)");
is(comparePaths('a/b/', 'a/x/'), -2, "comparePaths(a/b/, a/x/)");
#outside root, abspath
is(comparePaths('/a/b', '/a/x'), -2, "comparePaths(/a/b, /a/x)");
is(comparePaths('/a/b', '/a/x/'), -2, "comparePaths(/a/b, /a/x/)");
is(comparePaths('/a/b/', '/a/x'), -2, "comparePaths(/a/b/, /a/x)");
is(comparePaths('/a/b/', '/a/x/'), -2, "comparePaths(/a/b/, /a/x/)");
As always many thanks in advance. And also a special thanks to the monks in the cb tonight (tye, ikegami, ssandv, belg4mit and derby) who all helped me clarify my problem and identify a specific use case that was causing problems. Without their help this would have been yet another XY Problem.