Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I'd like to find a portable way to compare two paths and determine if one path is inside, outside, equal to, or along side of another. Assume that the paths in question are 'real paths' (not symbolic links).

If the only systems running Perl were Unix, Dos, or Mac this would be easy: all I would need to do is canonize the two paths and then compare them as strings. For example, if A is a substring of B, then A is the parent of B. If A and B have a common start, but a different end, then they are "along side of" each other.

However, the string comparison algorithm makes a crucial assumption that path segments to the right are always nested inside of path segments to the left. I am not certain this holds for all of the operating systems supported by Perl. There is no law of nature that says more nested items should be to the right (although it is consistent with the way we write numbers). perlport doesn't answer this question and my google foo is lacking. I would love it if someone could point me to documentation that would confirm or disprove this assumption.

If I can't assume left to right nesting, then I need to find another platform independent way of capturing which directories and files are nested within each other. One obvious solution is to use File::Spec to parse the path. This should allow me to produce a system independent version of the path. I can then use the same logic to compare two parsed paths regardless of operating system.

I would have expected that using File::Spec would work, but my attempt fails the following tests:

# Failed test 'comparePaths(/a/b, /a)' # in Monks/Snippet.pm at line 132. # got: '-2' # expected: '-1' Monks/Snippet....NOK 24 # Failed test 'comparePaths(/a/b, /a/)' # in Monks/Snippet.pm at line 133. # got: '-2' # expected: '-1' Monks/Snippet....NOK 25 # Failed test 'comparePaths(/a/b/, /a)' # in Monks/Snippet.pm at line 134. # got: '-2' # expected: '-1' Monks/Snippet....NOK 26 # Failed test 'comparePaths(/a/b/, /a/)' # in Monks/Snippet.pm at line 135. # got: '-2' # expected: '-1' # Looks like you failed 4 tests of 36.

The heart of the problem seems to be that File::Spec converts "/" (File::Spec->rootdir() on Unix) to directory path components ('','') but "/a" to ('','a'). Thus if we rely on the output of File::Spec alone, "/a" does not appear to be within "/"! Both / and /a are canonical forms according to File::Spec, so canonization isn't going to make this problem go away.

There are several possible solutions but all of them seem to me less than ideal except one - a bug in my own work that I didn't see. Failing that, my options are:

  • discard empty directory components. According to File::Spec empty directory components have meaning on some file systems. They can not be safely discarded without storing a list containing the OS's where they are significant. I'd happily store that list, but I don't know where to find it. From the documentation on splitdir:
    Unlike just splitting the directories on the separator, empty directory names ('') can be returned, because these are significant on some OSes.
  • use the "$no_file" parameter on splitpath. This gets the four tests that failed above to work. However, in their place four other tests fail:
    # Failed test 'comparePaths(/, /a)' # in Monks/Snippet.pm at line 108. # got: '-2' # expected: '1' # Failed test 'comparePaths(/, /a/)' # in Monks/Snippet.pm at line 109. # got: '-2' # expected: '1' # Failed test 'comparePaths(/a, /)' # in Monks/Snippet.pm at line 137. # got: '-2' # expected: '-1' # Failed test 'comparePaths(/a/, /)' # in Monks/Snippet.pm at line 138. # got: '-2' # expected: '-1' # Looks like you failed 4 tests of 36.
  • In parsePath below,
    #replace my @aDirs = File::Spec->splitdir($sDirPart); #with my @aDirs = $sDirPart eq File::Spec->rootdir() ? ('') : File::Spec->splitdir($sDirPart);

    With this change all tests pass both for $no_flag set to true and $no_flag set to false, at least on Unix, but it feels like a kludge and I'm not at all certain of its portability.

  • Just assume that all operating systems put subdirs and files to the right of the directories that contain them. Then I could just ignore File::Spec and call it a day. This assumption may well be true, but I'd feel a lot more comfortable relying on it if I had some documentation, which I don't.

The code and tests are posted below:

use strict; use warnings; use File::Spec; use Test::More tests => 36; #=============================================================== # comparePaths() and supporting functions #=============================================================== # returns # 1 if $sPath1 owns/contains $sPath2 # 0 if $sPath1 equals $sPath2 # -1 if $sPath1 is owned *by* $sPath2 # -2 if $sPath1 is along side of $sPath2 sub comparePaths { my ($sPath1, $sPath2) = @_; my ($sVol1, $aDirs1, $sFile1) = parsePath($sPath1); my ($sVol2, $aDirs2, $sFile2) = parsePath($sPath2); # paths on two different volumes can't own one another return -2 if ($sVol1 ne $sVol2); # assume the most deeply nested path components are at the # end of the directory array. # files are "inside" directories, so just push them onto the # directory path push @$aDirs1, $sFile1 if $sFile1; push @$aDirs2, $sFile2 if $sFile2; # $"='|'; #to make leading and trailing '' more visible # print STDERR "dirs1=<@$aDirs1> <@$aDirs2>\n"; # decide if we are inside or outside by comparing directory # components my $iSegments1 = scalar @$aDirs1; my $iSegments2 = scalar @$aDirs2; if ($iSegments1 <= $iSegments2) { for (my $i=0; $i < $iSegments1; $i++) { return -2 if $aDirs1->[$i] ne $aDirs2->[$i]; } return $iSegments1 == $iSegments2 ? 0 : 1; } else { for (my $i=0; $i < $iSegments2; $i++) { return -2 if $aDirs1->[$i] ne $aDirs2->[$i]; } return -1; } } sub parsePath { my $sPath = shift; # parse the canonical path $sPath = File::Spec->canonpath($sPath); # parse the canonical path $sPath = File::Spec->canonpath($sPath); # split the path into components my ($sVolume, $sDirPart, $sFilePart) = File::Spec->splitpath($sPath, 0); # maybe the nesting order of directories in $sDirPart # is right to left instead of left to right # (as in Unix,MsWin)? # If so, further split the directory portion into # components in the hope that splitdir produces # an array with most nested directory components at # the end... BUT this is an assumption. There is no # documentation guarenteeing it. # Also, canonize the directory part before splitting # it. File::Spec::Unix sets the directory part to '.../' # but splitdir doesn't strip empty directories from UNIX. # this is explained in File::Spec's documentation for splitdir: # # Unlike just splitting the directories on the separator, # empty directory names ('') can be returned, because these # are significant on some OSes. $sDirPart = File::Spec->canonpath($sDirPart); my @aDirs = File::Spec->splitdir($sDirPart); # return parsed path return ($sVolume, \@aDirs, $sFilePart); } #=============================================================== # TESTS #=============================================================== #inside root, relpath is(comparePaths('a/b', 'a/b/c'), 1, "comparePaths(a/b, a/b/c)"); is(comparePaths('a/b', 'a/b/c/'), 1, "comparePaths(a/b, a/b/c)"); is(comparePaths('a/b/', 'a/b/c'), 1 , "comparePaths(a/b, a/b/c)"); is(comparePaths('a/b/', 'a/b/c/'), 1, "comparePaths(a/b, a/b/c)"); #inside root, abspath is(comparePaths('/a/b', '/a/b/c'), 1, "comparePaths(/a/b, /a/b/c)"); is(comparePaths('/a/b', '/a/b/c/'), 1, "comparePaths(/a/b, /a/b/c)"); is(comparePaths('/a/b/', '/a/b/c'), 1, "comparePaths(/a/b, /a/b/c)"); is(comparePaths('/a/b/', '/a/b/c/'), 1, "comparePaths(/a/b, /a/b/c)"); is(comparePaths('/', '/a'), -1, "comparePaths(/, /a)"); is(comparePaths('/', '/a/'), -1, "comparePaths(/, /a/)"); #equal to root, relpath is(comparePaths('a/b', 'a/b'), 0, "comparePaths(a/b, a/b)"); is(comparePaths('a/b', 'a/b/'), 0, "comparePaths(a/b, a/b/)"); is(comparePaths('a/b/', 'a/b'), 0, "comparePaths(a/b/, a/b)"); is(comparePaths('a/b/', 'a/b/'), 0, "comparePaths(a/b/, a/b/)"); #equal to root, abspath is(comparePaths('/a/b', '/a/b'), 0, "comparePaths(/a/b, /a/b)"); is(comparePaths('/a/b', '/a/b/'), 0, "comparePaths(/a/b, /a/b/)"); is(comparePaths('/a/b/', '/a/b'), 0, "comparePaths(/a/b/, /a/b)"); is(comparePaths('/a/b/', '/a/b/'), 0, "comparePaths(/a/b/, /a/b/)"); #parent to root, relpath is(comparePaths('a/b', 'a'), -1, "comparePaths(a/b, a)"); is(comparePaths('a/b', 'a/'), -1, "comparePaths(a/b, a/)"); is(comparePaths('a/b/', 'a'), -1, "comparePaths(a/b/, a)"); is(comparePaths('a/b/', 'a/'), -1, "comparePaths(a/b/, a/)"); #parent to root, abspath is(comparePaths('/a/b', '/a'), -1, "comparePaths(/a/b, /a)"); is(comparePaths('/a/b', '/a/'), -1, "comparePaths(/a/b, /a/)"); is(comparePaths('/a/b/', '/a'), -1, "comparePaths(/a/b/, /a)"); is(comparePaths('/a/b/', '/a/'), -1, "comparePaths(/a/b/, /a/)"); is(comparePaths('/a', '/'), -1, "comparePaths(/a, /)"); is(comparePaths('/a/', '/'), -1, "comparePaths(/a/, /)"); #outside root, relpath is(comparePaths('a/b', 'a/x'), -2, "comparePaths(a/b, a/x)"); is(comparePaths('a/b', 'a/x/'), -2, "comparePaths(a/b, a/x/)"); is(comparePaths('a/b/', 'a/x'), -2, "comparePaths(a/b/, a/x)"); is(comparePaths('a/b/', 'a/x/'), -2, "comparePaths(a/b/, a/x/)"); #outside root, abspath is(comparePaths('/a/b', '/a/x'), -2, "comparePaths(/a/b, /a/x)"); is(comparePaths('/a/b', '/a/x/'), -2, "comparePaths(/a/b, /a/x/)"); is(comparePaths('/a/b/', '/a/x'), -2, "comparePaths(/a/b/, /a/x)"); is(comparePaths('/a/b/', '/a/x/'), -2, "comparePaths(/a/b/, /a/x/)");

As always many thanks in advance. And also a special thanks to the monks in the cb tonight (tye, ikegami, ssandv, belg4mit and derby) who all helped me clarify my problem and identify a specific use case that was causing problems. Without their help this would have been yet another XY Problem.

Best, beth

Update: narrowed scope of question to real paths in response to JavaFan's post below.

Update: added another possible solution, it works, but feels like a kludge.


In reply to How does one compare the relationship between two file system paths in a portable way? by ELISHEVA

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (3)
As of 2024-04-25 19:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found