http://qs321.pair.com?node_id=1072313

steve has asked for the wisdom of the Perl Monks concerning the following question:

I have a directory in a filesystem (specifically linux/ext3 but the intent of the question is to be more generic) that potentially has a number of subdirectories and files at each level of the tree.

I would like to essentially flatten this kind of structure and place all of the files in the topmost directory in the tree and remove all remaining (now empty) subdirectories.

Aside: At some point I also want to add functionality to handle potential name collisions, but that seems much simpler.

As is almost always the case, TIMTOWTDI, but I am seeking some wisdom on better practices from anyone who has done similar operations before. It seems that system, exec, etc. are wasteful as far as resources are concerned. Potentially I could write some recursive readdir-based function and use that to move any nodes of type file to the parent. There is also a case to be made for using something like File::Find to make this happen.

Please share what you know about a great way to address this problem.


Replies are listed 'Best First'.
Re: Flattening a directory tree in a filesystem
by atcroft (Abbot) on Jan 28, 2014 at 05:04 UTC

    As you mentioned, I would use File::Find to search for files, File::Spec to deal with paths (or possibly File::Basename to separate the path and file name), and File::Copy to move the files. (I thought about suggesting Parallel::ForkManager and spawning off multiple children to do the copying, but realized you would likely not see a performance improvement as a result.)

    Regarding collisions, I would suggest scanning for files in one pass, then doing all of the file moves in a second loop.
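
    The two-pass idea above can be sketched roughly as follows. This is only a minimal sketch, not anyone's tested solution: the helper name flatten_tree and the numeric-suffix collision rule are assumptions made for illustration, symlinks are skipped, and $root is assumed to be given without a trailing slash. It uses only core modules (File::Find, File::Basename, File::Copy, File::Spec::Functions).

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);
use File::Copy qw(move);
use File::Spec::Functions qw(catfile);

# Hypothetical helper: flatten everything below $root into $root itself.
sub flatten_tree {
    my ($root) = @_;

    # Pass 1: collect plain files below the root, and all subdirectories.
    my (@files, @dirs);
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return if -l $path;    # assumption: leave symlinks alone
                if (-d $path) {
                    push @dirs, $path unless $path eq $root;
                }
                elsif (-f $path) {
                    # Files already in the root do not need to move.
                    push @files, $path unless $File::Find::dir eq $root;
                }
            },
        },
        $root
    );

    # Names already present in the root count as taken.
    opendir(my $dh, $root) or die "Cannot open $root: $!";
    my %taken = map { $_ => 1 } grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    # Pass 2: move each file, adding a numeric suffix on a name collision.
    for my $path (sort @files) {
        my $name = basename($path);
        my $new  = $name;
        my $n    = 0;
        $new = $name . '.' . ++$n while $taken{$new};
        $taken{$new} = 1;
        move($path, catfile($root, $new)) or warn "Cannot move $path: $!";
    }

    # Remove the now-empty subdirectories, deepest (longest path) first.
    for my $dir (sort { length($b) <=> length($a) } @dirs) {
        rmdir $dir or warn "Cannot remove $dir: $!";
    }
    return;
}

# Run only when a directory is given on the command line.
flatten_tree($ARGV[0]) if @ARGV;
```

    Because all names are known before anything moves, the collision policy can be swapped out (suffix, skip, abort) without touching the scan. rmdir refuses to remove a non-empty directory, so a file that failed to move leaves its directory in place rather than being lost.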

    Hope that helps.

Re: Flattening a directory tree in a filesystem
by NetWallah (Canon) on Jan 28, 2014 at 06:17 UTC
    Depending on the volatility of your file system (files being added/deleted/moved), you may need to prevent changes while you are scanning and moving files.

    I'd suggest taking ownership of everything and/or making it all read-only prior to reading, possibly even dismounting the file system (after killing processes that have files open on it) and re-mounting it such that only you have modify access.

            If your eyes hurt after you drink coffee, you have to take the spoon out of the cup.
                  -Norm Crosby

Re: Flattening a directory tree in a filesystem
by karlgoethebier (Abbot) on Jan 28, 2014 at 11:58 UTC
    "...functionality to handle potential namespace collisions, but that seems much more simple."

    Sure, maybe it's simple. I wonder how I would do that. Perhaps by adding a GUID to each file with the same name?

    foo.pl.e891346b-a18e-43eb-bd6c-fa6267c568e3
    foo.pl.6b97a88c-e742-43c3-9f71-a004685be89e
    foo.pl.1b7b0031-981f-4ac0-a0c5-5582735528c8
    foo.pl.d82d0bcc-26b5-4b8d-9fa1-950f2b27d2b3
    foo.pl.bb388fbf-f8a6-4511-93ce-f6d2b737cf00
    ...

    Perhaps there are many thousands more with the same name but different content, who knows...

    And now? Not very convenient or practical if one needs to do further processing on these files. Looks a bit like LOST+FOUND ;-)

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Flattening a directory tree in a filesystem
by jellisii2 (Hermit) on Jan 28, 2014 at 17:26 UTC
    Definitely File::Find will be a good friend to make here. The &wanted sub can deal with collisions on the fly. The following is untested.
    use warnings;
    use strict;
    use File::Find;
    use File::Spec::Functions;    # exports catfile
    use Time::HiRes;
    use File::Copy;

    my $root = "/path/to/root";   # no trailing slash

    find(\&wanted, $root);

    sub wanted {
        return unless -f $_;      # only move plain files, skip directories
        if ($File::Find::dir eq $root) {
            print "$_ is already in the root folder $root\n";
        }
        else {
            my $target = catfile($root, $_);
            if (-e $target) {
                # Name already taken in the root: append a timestamp suffix.
                my $microtime = join '.', Time::HiRes::gettimeofday();
                print "Filename collision, appending $microtime to $_ before moving\n";
                move($File::Find::name, "$target.$microtime");
            }
            else {
                print "Moving $File::Find::name to $target\n";
                move($File::Find::name, $target);
            }
        }
    }
    There are probably saner ways to deal with collisions; being able to know where a file came from might be very nice.
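
    One way to keep track of where a file came from is to fold its original relative path into the new name. The following is a minimal sketch under stated assumptions: the helper name origin_name and the "__" separator are hypothetical choices, not something proposed in the thread, and the result is only unique as long as no file or directory name already contains the separator.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build a flat file name that records the original
# location, so "a/b/foo.pl" below the root becomes "a__b__foo.pl".
sub origin_name {
    my ($root, $path, $sep) = @_;
    $sep = '__' unless defined $sep;    # assumed separator

    # Path of the file relative to the root of the tree.
    my $rel = File::Spec->abs2rel($path, $root);

    # Split into directory components and the file name itself.
    my (undef, $dirs, $file) = File::Spec->splitpath($rel);
    my @parts = grep { length } File::Spec->splitdir($dirs);

    return join $sep, @parts, $file;
}
```

    A name produced this way could be used as the move target whenever the plain file name is already taken in the root, instead of a timestamp or GUID suffix.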