Re^4: Directory Tree Structure

by Lady_Aleena (Priest)
on Oct 05, 2009 at 02:12 UTC


in reply to Re^3: Directory Tree Structure
in thread Directory Tree Structure

graff,

Here is what I have so far. Unfortunately, I haven't worked out a few things, like having directory_contents read only the files in a directory and exclude its subdirectories. I am getting duplicates.

I have been shown a few other things, but haven't quite gotten the kinks worked out of those either.

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

use File::Find;
use File::Spec;

my $rootdir = "C:/Documents and Settings/<my name>/My Documents/fantasy";
my @files;

sub wanted {
  my $text = $File::Find::name;
  push @files, $text;
  return;
}

sub directory_contents {
  my ($contents) = @_;
  my $file_tab = $contents;
  $file_tab =~ s{$rootdir}{};
  my $tab = "\t" x ($file_tab =~ tr{/}{});
  my @list = (map("$tab<li>".$_."</li>\n", grep {m/^($contents)\//} @files));
  # the grep is where I think I need to put the stop on going lower.
  return join("", @list);
}

sub print_directory {
  my ($file) = @_;
  my $file_tab = $file;
  $file_tab =~ s{$rootdir}{};
  my $tab = "\t" x ($file_tab =~ tr{/}{});
  my $directory = qq{$tab<li>$file\n$tab<ul>\n}
                . directory_contents($file)
                . qq{$tab</ul>\n$tab</li>\n};
  print $directory;
}

find(\&wanted, $rootdir);

foreach my $entry (File::Spec->no_upwards(@files)) {
  print_directory($entry) if -d $entry;
}
Have a nice day!
Lady Aleena

Re^5: Directory Tree Structure
by graff (Chancellor) on Oct 06, 2009 at 09:12 UTC
    Unfortunately, I haven't worked out a few things, like having directory_contents read only the files in a directory and exclude its subdirectories. I am getting duplicates.

    Well, you're doing something a little different from the OP in this thread, listing all the data files along with the directories (instead of just the directories), so that does complicate things. To see where you're getting duplication and trouble, I think you'll want to use Data::Dumper 'Dumper'; and step through it with "perl -d". In the debugger, you can do p Dumper($some_variable_or_reference) to see what's happening with the data.
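
    For example (a small self-contained sketch, assuming the script is saved as file_crawl.pl and the data ends up in something like your @files array or the %tree hash described below), you can dump the structure either directly or from the debugger:

use strict;
use warnings;
use Data::Dumper 'Dumper';

my %tree = ( 'fantasy/' => { './' => [ 'index.html' ] } );   # stand-in data for illustration
print Dumper(\%tree);

# inside the debugger ("perl -d file_crawl.pl") the equivalent is:
#   p Dumper(\%tree)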

    I think the right way to make sure you get the correct structure in your output is to have that structure created by the "wanted" sub you pass to File::Find. That structure would be a hash of hashes (HoH), built recursively so that nested hash keys represent the nested directories. A special hash key "./" can be used to hold a reference to an array that contains the actual data files (if any) in a given directory.
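
    As a rough illustration (hypothetical path and file names, not taken from your tree), a root directory containing index.html plus a maps/ subdirectory holding world.html would come out shaped something like this:

my %tree = (
  '/path/to/fantasy/' => {
    './'    => [ 'index.html' ],    # data files directly under the root
    'maps/' => {
      './' => [ 'world.html' ],     # data files inside maps/
    },
  },
);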

    Then for output, you just walk through the structure, indenting and including suitable html tags as you go. (And if you're really going to be outputting html, you should probably use HTML::Entities on the path strings.)
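
    For instance (a tiny self-contained example with a made-up file name), encode_entities escapes the characters that would otherwise break the markup:

use HTML::Entities;

print encode_entities("Dungeons & Dragons <notes>.txt"), "\n";
# prints: Dungeons &amp; Dragons &lt;notes&gt;.txt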

    I cobbled this together based on another old post of mine that had to do with building a recursive hash structure (Re: seeking in hash data structure), and I'll confess that this took me longer than I'd like to admit. (It's tricky business to do recursion right, esp. if you don't do it very often.)

    I'm sure there are one or more modules that could be used to simplify (or replace) this code. You might need to tweak it -- change "/" to "\", etc. -- to make it work properly on windows (it works for me on macosx/unix/linux):

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

use File::Find;
use File::Basename;
use HTML::Entities;

@ARGV == 1 and -d $ARGV[0] or die "Usage: $0 path_name\n";
( my $rootdir = shift ) =~ s{(?<!/)$}{/};   # make sure path ends with "/"

my %tree;
$tree{$rootdir} = {};

find(\&wanted, $rootdir);

print "<html>\n";
print_tree( \%tree, 0 );
print "</html>\n";

sub wanted {
    local $_ = $File::Find::name;
    if ( -f ) {            # only work on data files (skip directories)
        s{\Q$rootdir\E}{}; # remove the rootdir string from the path name
        load_tree( $tree{$rootdir}, fileparse( $_ ));
    }
}

# recursively load the hash structure
# (first call gets top-level hashref, and
#  file name, path from File::Basename::fileparse)
sub load_tree {
    my ( $href, $name, $path ) = @_;
    my @dirs = split /\//, $path;
    push @dirs, '.' if ( $dirs[$#dirs] ne '.' );
    my $key = shift @dirs;
    while ( @dirs and $key ne '.' and exists( $$href{"$key/"} )) {
        $href = $$href{"$key/"};
        $key  = shift @dirs;
    }
    if ( $key ne '.' and ! exists( $$href{"$key/"} )) {
        $$href{"$key/"} = {};
        load_tree( $$href{"$key/"}, $name, join( '/', @dirs, '' ));
    }
    elsif ( $key eq '.' ) {
        push @{$$href{"$key/"}}, $name;
    }
}

# recursively print embedded lists
sub print_tree {
    my ( $href, $indent ) = @_;
    printf( "%s<ul>\n", ' ' x $indent );
    $indent++;
    if ( exists( $$href{'./'} )) {
        printf( "%s<li>%s</li>\n", ' ' x $indent, encode_entities( $_ ))
            for ( @{$$href{'./'}} );
        delete $$href{'./'};
    }
    if ( keys %$href ) {
        for my $subdir ( sort keys %$href ) {
            printf( "%s<li>%s\n", ' ' x $indent, encode_entities( $subdir ));
            $indent++;
            print_tree( $$href{$subdir}, $indent );
            $indent--;
            printf( "%s</li>\n", ' ' x $indent );
        }
    }
    $indent--;
    printf( "%s</ul>\n", ' ' x $indent );
}
    (Updated to use HTML::Entities instead of URI::Escape -- the latter was a wrong choice)
      graff,

      What is the line @ARGV == 1 and -d $ARGV[0] or die "Usage: $0 path_name\n"; doing? I plug in my $rootdir but the script dies at that point. I can't figure out what is wrong until I know what that is doing. Only after I get this working can I start adding all of the sort rules that will be needed (if I can figure out where those should go).

      Have a nice day!
      Lady Aleena
        Sorry... I changed the "user interface" on the script without telling you. The version you originally posted had a specific path hard-coded for "$rootdir", but in order for me to try it out myself (without a windows system or the specific directory path you used), I made it so that the "$rootdir" comes from a command-line argument (given to the script via @ARGV).

        So my version requires exactly one command-line arg (@ARGV==1) and this arg must represent a reachable directory path (-d $ARGV[0]), or else the script dies and tells you what it needs in order to run (Usage: {name_of_script} path_name). That's just a typical unix-style CLI.

        To go back to your own hard-coded value for $rootdir, just comment out that @ARGV line and the one after it, then add in your single line that assigns a specific path string to $rootdir.
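
        In other words (a sketch based on the path from your earlier post; note the trailing "/", which the commented-out line would otherwise add for you), the top of the script would become:

# comment out the command-line handling:
#@ARGV == 1 and -d $ARGV[0] or die "Usage: $0 path_name\n";
#( my $rootdir = shift ) =~ s{(?<!/)$}{/};

# and hard-code the root directory instead:
my $rootdir = "C:/Documents and Settings/<my name>/My Documents/fantasy/";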

      To do list for the following code, in any order. I have tried doing these, but my efforts have led to bigger messes.

      • Figure out what in push @dirs, '.' if ( $dirs[$#dirs] ne '.' ); is causing the Use of uninitialized value in string ne at file_crawl.pl line 55 (#1) warning. Here is a screenshot of where the problem shows up in the results, maybe.
      • Skip certain directories, file types, and some file names.
      • Create links relative to $rootdir. I need my paths back.
      • Transform the link text into titles (subroutine included).
      • For the first iteration of sub print_tree, put onclick="list_onclick(event)" in the <ul> element.
      • For the link to the home page, the link will be $rootlink, but the link text will be home.
      • In each <li> which is for a directory, put class="closed" in the <li> element with the exception for the active file's directory, then it would be class="open".
      • For the active file, put class="active" in the <a> element.
      • Put in sort rules, such as sorting that ignores the articles a, an, and the; sorting by the last word instead of the first for one directory; creating the list manually, more than likely, for another; and putting favorites last (or first, I haven't decided yet) for yet another. (A rough sketch of an article-ignoring sort appears after the code below.)
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

use Cwd;
use File::Find;
use File::Basename;

my $filename = basename($0);
my $dirname  = getcwd;

my %directories = (
  '/ftp/pub/www/fantasy' => {
    link => 'http://www.xecu.net/fantasy',
    user => 'Fantasy',
    name => "Fantasy's Realm",
  },
  '/home/lady_aleena/var/www' => {
    link => 'http://lady_aleena.perlmonk.org',
    user => 'Lady Aleena',
    name => "Lady Aleena's Home",
  },
  'C:/Documents and Settings/<my name>/My Documents/fantasy' => {
    link => 'file:///C:/Documents and Settings/<my name>/My Documents/fantasy',
    user => '<my name>',
    name => "<my name>'s Place",
  },
);

sub get_rootdir {
  for my $dir (keys %directories) {
    return $dir if $dirname =~ /^\Q$dir/;
  }
}

my $rootdir = get_rootdir;
if (not exists $directories{$rootdir}) { die "You really screwed up." }

my $rootlink = $directories{$rootdir}{link};
my $rootuser = $directories{$rootdir}{user};
my $rootname = $directories{$rootdir}{name};

#@ARGV == 1 and -d $ARGV[0] or die "Usage: $0 path_name\n";
#( $rootdir = shift ) =~ s{(?<!/)$}{/}; # make sure path ends with "/"

my %tree;
$tree{$rootdir} = {};

find(\&wanted, $rootdir);

print_tree( \%tree, 0 );

sub wanted {
  local $_ = $File::Find::name;
  if ( -f ) {            # only work on data files (skip directories)
    s{\Q$rootdir\E}{};   # remove the rootdir string from the path name
    load_tree( $tree{$rootdir}, fileparse( $_ ));
  }
}

# recursively load the hash structure
# (first call gets top-level hashref, and file name, path from File::Basename::fileparse)
sub load_tree {
  my ( $href, $name, $path ) = @_;
  my @dirs = split /\//, $path;
  push @dirs, '.' if ( $dirs[$#dirs] ne '.' );
  my $key = shift @dirs;
  while ( @dirs and $key ne '.' and exists( $$href{"$key/"} )) {
    $href = $$href{"$key/"};
    $key  = shift @dirs;
  }
  if ( $key ne '.' and ! exists( $$href{"$key/"} )) {
    $$href{"$key/"} = {};
    load_tree( $$href{"$key/"}, $name, join( '/', @dirs, '' ));
  }
  elsif ( $key eq '.' ) {
    push @{$$href{"$key/"}}, $name;
  }
}

# recursively print embedded lists
sub print_tree {
  my ( $href, $indent ) = @_;
  printf( "%s<ul>\n", ' ' x $indent );
  $indent++;
  if ( exists( $$href{'./'} )) {
    printf( "%s<li>%s</li>\n", ' ' x $indent, $_ ) for ( @{$$href{'./'}} );
    delete $$href{'./'};
  }
  if ( keys %$href ) {
    for my $subdir ( sort keys %$href ) {
      printf( "%s<li>%s\n", ' ' x $indent, $subdir );
      $indent++;
      print_tree( $$href{$subdir}, $indent );
      $indent--;
      printf( "%s</li>\n", ' ' x $indent );
    }
  }
  $indent--;
  printf( "%s</ul>\n", ' ' x $indent );
}

      sub transform

sub transform {
  my ($text) = @_;
  $text =~ tr/_/ /;
  $text =~ s/.*\/+//;
  $text =~ s/\.[^.]*\z//;
  return $text;
}
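
      Regarding the sort rule that ignores leading articles (the sketch promised in the to-do list above): one possibility, only a rough illustration and not something worked out in this thread, is a small helper that strips a, an, or the before comparing:

# build a comparison key that ignores one leading article (a, an, the)
sub article_free {
  my ($text) = @_;
  $text = lc $text;
  $text =~ s/^(?:a|an|the)\s+//;
  return $text;
}

# hypothetical use inside print_tree, replacing the plain sort:
# for my $subdir ( sort { article_free($a) cmp article_free($b) } keys %$href ) { ... }

      Whether that belongs in print_tree itself or in a separate per-directory rule table is one of the open questions in the list above.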
      Have a nice day!
      Lady Aleena
        Hi -- sorry about the slow response to this. I think you're really close here, but I'm not sure how to explain the issue.

        First off, I gather that a bunch of the stuff at the top of this latest version of yours is not relevant to the problem: your "%directories" hash, the "get_rootdir" sub, and the $root(link|user|name) variables are all no-ops. You are just using File::Find on whatever the current working directory happens to be.

        So, putting all that extra stuff aside, the only change I think you need in the operative code is this:

sub wanted {
    local $_ = $File::Find::name;
    if ( -f ) {                   # only work on data files (skip directories)
        s{\Q$rootdir\E[\\/]}{};   # remove the rootdir string from the path name
                                  # ... ALONG WITH THE SUBSEQUENT "\" OR "/" CHARACTER
        load_tree( $tree{$rootdir}, fileparse( $_ ));
    }
}
        I think the point here is: once you get into the first subdirectory, removing just the "$rootdir" string leaves $_ with the path separator character at the start. That somehow causes fileparse() to not play nice with the recursive "load_tree" function. Anyway, try that out and see if it helps.
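
        To make the difference concrete (a small standalone sketch, not code from the thread), compare what fileparse returns when the leading separator is left in place versus stripped:

use strict;
use warnings;
use File::Basename;

# separator left at the front (what the old substitution produced):
my ($name1, $path1) = fileparse('/maps/world.html');
print "$name1 | $path1\n";   # world.html | /maps/
# split /\//, '/maps/' gives ('', 'maps') -- an empty leading element,
# so load_tree builds a spurious, nameless directory level

# separator stripped along with $rootdir (the fixed substitution):
my ($name2, $path2) = fileparse('maps/world.html');
print "$name2 | $path2\n";   # world.html | maps/
# split /\//, 'maps/' gives just ('maps')

# a file sitting directly under the root reduces to a path of '/',
# which split turns into an empty list; $dirs[$#dirs] is then undef,
# hence the "Use of uninitialized value in string ne" warning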
      graff,

      I am now getting a warning that something in the following line is uninitialized in load_tree.

      push @dirs, '.' if ( $dirs[$#dirs] ne '.' );

      The warning

      Use of uninitialized value in string ne at file_crawl.pl line 56 (#1)

      I am not sure which of those is the problem. Also, would you be willing to help me write this to a file temporarily? Windows command prompt is not the easiest to read when the output is long.


      Addendum

      When I commented out s{\Q$rootdir\E}{}; in wanted, the warning went away. It was an unintended side effect of me trying to figure out how to print the full path on each line.

      Have a nice day!
      Lady Aleena
