Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re^5: Getting all subpaths from a path

by haukex (Archbishop)
on Apr 02, 2021 at 18:33 UTC ( [id://11130730]=note: print w/replies, xml ) Need Help??


in reply to Re^4: Getting all subpaths from a path
in thread Getting all subpaths from a path

Thanks for the clarification. It's important to know because it tells us the restrictions you're working under, i.e. why the "just use rsync/tar/shar" suggestions weren't what you were looking for (though they could still be used...). I think that's led to some confusion in this thread so far. Anyway:

User gives me all the paths that he thinks are needed to run the tool inside the container (he gives a file that contains those paths and I read them into an array).

An important question here is: Would it be correct to assume you have access to the filesystem where these files are located? In other words, the Perl script, the input list of files, and the files themselves are all on the same machine? It's also still unclear to me if you want to mirror the files exactly as they are on the host machine, or if you want to manipulate the paths in any way?

Again, showing us with code is best, like choroba did in his Makefile. You also still haven't shown what format this list provided by the user looks like. Note that these things will also significantly benefit you in your development since they are at the same time test cases. In other words, the IMHO better way to ask the questions you asked is if you add them as test cases to the SSCCE that choroba provided.

I think your main concern seems to be this: if a user provides a path that includes a symlink, you want to make sure not to copy only that symlink, but also the target it points to, so that the symlink doesn't end up broken in the container. Another thing that is still unclear to me in this context is whether it is acceptable to you to rewrite any of the symlinks you encounter - for example, potential solutions could rewrite all relative symlinks to absolute ones, or symlink chains could be reduced by simply creating copies.

Anyway, what I've done here is construct an example that I think demonstrates what you're asking about.

mkdir -p /tmp/bar /tmp/foo touch /tmp/foo/one ln -fns /tmp/bar /tmp/foo/quz ln -fns ../foo/quz /tmp/bar/baz ln -fns ../foo/one /tmp/bar/two

In this example, the issue is that if the user were to specify only the path /tmp/bar/baz/two, then you need to figure out that all of /tmp/foo/{one,quz} and /tmp/bar/{two,baz} need to be reconstructed in order for the link to be valid. Here's my attempt at solving this; the tricky bit turned out to be figuring out the dependency chain for the symlinks. sub resolvesymlink is extracted from my script that I linked you to earlier (that includes tests so I'm fairly confident it's decent code, keeping in mind what I said). Note how this essentially does what I said above: round-trip the commands needed to recreate a directory structure.

Disclaimer: I've so far only tested it for the above test case plus a few variations. Use at your own risk. Though I do hope it's a starting point.

#!/usr/bin/env perl use warnings; use strict; use File::Basename 'fileparse'; use Cwd qw/getcwd abs_path/; use File::Spec::Functions qw/ splitdir catdir catfile file_name_is_absolute rel2abs rootdir /; use String::ShellQuote 'shell_quote'; use Graph; my @queue = @ARGV; # gather all dirs, files, and links my (%dirs,%files,%links); while ( my $targ = shift @queue ) { $targ = rel2abs($targ); die "does not exist: $targ" unless -e $targ; my @path = splitdir($targ); for my $i (1..$#path) { my $cur = catdir(@path[0..$i]); if ( -l $cur ) { defined( $links{$cur} = readlink($cur) ) or die "readlink $cur: $!"; # enqueue everything in the link chain # (excluding already seen symlinks) push @queue, grep { !$links{$_} } resolvesymlink($cur); } elsif ( -f $cur ) { $files{abs_path($cur)}++ } elsif ( -d $cur ) { $dirs{abs_path($cur)}++ } else { warn "skipping $cur, unknown type" } } } # simplify the dirs to shorten the mkdir command my $dg = Graph->new; for my $d (keys %dirs) { my @s = splitdir($d); $dg->add_edge(catdir(@s[0..$_]), catdir(@s[0..$_-1])) for 1..$#s; } # exterior vertices = leaves of the tree my @dirs = grep { $_ ne rootdir } sort $dg->exterior_vertices; print "mkdir -p ",shell_quote(@dirs),"\n" if @dirs; # output the files print "touch ",shell_quote(sort keys %files),"\n" if %files; # determine dependencies in symlinks via a topological sort my $lg = Graph->new; for my $l (keys %links) { my @res = resolvesymlink($l); die "unexpected resolvesymlink($l)" if @res<2; $lg->add_edge($l, $res[1]); # link depends on its target my @s = splitdir($l); for my $i (reverse 1..$#s-1) { my $d = catdir(@s[0..$i]); # if there's a link in the paths, this link depends on it too $lg->add_edge($l, $d) if defined $links{$d}; } } my @links = reverse grep { defined $links{$_} } $lg->topological_sort; print "ln -snf ",shell_quote($links{$_}, $_),"\n" for @links; # from https://bitbucket.org/haukex/htools/src/master/relink (a500e09) sub resolvesymlink { my $file = shift; die "not absolute: $file" unless file_name_is_absolute($file); my @files; my $origwd = getcwd; my $rv = eval { # in eval so orig working dir is always restored my $f = $file; while (1) { my $dir; ($f,$dir) = fileparse($f); last unless -d $dir; chdir $dir or die "chdir $dir: $!"; push @files, catfile(getcwd,$f); last unless -l $f; defined( $f = readlink $f ) or die "readlink $f (cwd=".getcwd."): $!"; } 1 }; my $err = $@||'unknown error'; chdir $origwd or die "chdir $origwd: $!"; die $err unless $rv; return @files ? @files : ($file); } __END__ mkdir -p /tmp/bar /tmp/foo touch /tmp/foo/one ln -snf /tmp/bar /tmp/foo/quz ln -snf ../foo/one /tmp/bar/two ln -snf ../foo/quz /tmp/bar/baz ln -snf ../foo/one /tmp/bar/baz/two

Update: Added the if @dirs and if %files to the two prints.

Replies are listed 'Best First'.
Re^6: Getting all subpaths from a path
by ovedpo15 (Pilgrim) on Apr 02, 2021 at 20:27 UTC
    Would it be correct to assume you have access to the filesystem where these files are located? In other words, the Perl script, the input list of files, and the files themselves are all on the same machine?

    Yes.

    It's also still unclear to me if you want to mirror the files exactly as they are on the host machine, or if you want to manipulate the paths in any way?

    Exactly the same.

    Anyway, your code is exactly what I need, especially the first part ("gather all dirs, files, and links"). It's a good start. I have only one question for now. For that I will use the example I presented in the first post:
    I have a path: /usr/vsa/pkgs/python3/3.6.3a/bin/python3.6. There are two links in the path:
    /usr/vsa -> /root/site/tools/gauv /usr/vsa/pkgs/python3/3.6.3a -> 3.6.3
    With your code I get three links:
    ln -snf /root/site/tools/gauv /usr/vsa ln -snf 3.6.3 /usr/vsa/pkgs/python3/3.6.3a ln -snf 3.6.3 /root/site/tools/gauv/pkgs/python3/3.6.3a
    For the second link I get an error:
    ln: failed to create symbolic link '/usr/vsa/pkgs/python3/3.6.3a': No +such file or directory which is correct
    Which makes sense since '/usr/vsa/pkgs/python3/3.6.3a' is not a realpath (even though I can access this path since it has a link on the way). Uncommenting this line works to build the recipe since this link is unnecessary because the third one is the correct one (it does what the second one wants to do). How can I "ignore"/"remove" those kind of links from %links?
      For the second link I get an error: ln: failed to create symbolic link '/usr/vsa/pkgs/python3/3.6.3a': No such file or directory

      I can't reproduce this. To create a test directory structure I tried:

      mkdir -p /tmp/site/tools/gauv/pkgs/python3/3.6.3/bin/python3.6 ln -fns /tmp/site/tools/gauv /tmp/vsa ln -fns 3.6.3 /tmp/site/tools/gauv/pkgs/python3/3.6.3a

      And running my script with /tmp/vsa/pkgs/python3/3.6.3a/bin/python3.6 as the input produces this output:

      mkdir -p /tmp/site/tools/gauv/pkgs/python3/3.6.3/bin/python3.6 ln -snf /tmp/site/tools/gauv /tmp/vsa ln -snf 3.6.3 /tmp/vsa/pkgs/python3/3.6.3a ln -snf 3.6.3 /tmp/site/tools/gauv/pkgs/python3/3.6.3a

      Which looks like the output you've shown (minus the mkdir command, which you are running first, right?). And if I delete /tmp/site and /tmp/vsa, the above set of commands work fine. I can only guess that perhaps there's a permissions issue (/root often has restrictive permissions set). I also noticed an inconsistency in the example you give in the root node: your example array of subpaths ends on /root/site/tools/gauv/python3/3.6.3/bin/python3.6, which is missing the pkgs from its path. I did not test my code in the presence of broken symlinks.

      Something doesn't add up. I've asked several times now for code, input, and output to reproduce the questions you have. Please read and understand all of the following, and provide actual, runnable code that reproduces the issue you're having, along with representative sample input, the expected output for that input, and the complete actual output, including any error messages.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11130730]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (5)
As of 2024-03-29 07:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found