http://qs321.pair.com?node_id=809251


in reply to Building a UNIX path from Irritating data

It might help to give us some runnable code. I started by writing this, but the folder IDs don't match up.

my %folders = (
    4464 => 'foldername_1',
    4465 => 'foldername_2',
    4466 => 'foldername_3',
);

my @subfolders = split /\n/, <<'SUBFOLDER_LIST';
Folder 1298 - foldername_ten.
subfolder 1299.
subfolder 1300.
Folder 1299 - foldername_eleven.
No sub folders.
Folder 1300 - foldername_twelve.
No sub folders.
Folder 1311 - foldername_thirteen.
subfolder 1317.
subfolder 1318.
subfolder 1958.
SUBFOLDER_LIST
;

Off the topic of your question, one thing I notice about your code is that build_path is defined with a prototype (which is usually a bad idea), and then you call it with an ampersand sigil which bypasses the prototype anyway. Unless there's more code somewhere that requires the prototype to be in effect, I'd suggest you get rid of it and call build_path normally (without the ampersand).
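A small illustration of the prototype point: the `($)` prototype and the sub name `count_args` below are hypothetical (the original post's prototype isn't reproduced in this thread). The sketch only shows that a normal call honours the prototype while an `&` call ignores it.

```perl
use strict;
use warnings;

# Hypothetical sub with a ($) prototype: a normal call forces
# scalar context on its single argument.
sub count_args ($) { return scalar @_; }

my @list = ( 1, 2, 3 );

# Normal call: the prototype applies, @list is evaluated in scalar
# context (giving 3), so the sub receives exactly one argument.
my $with_proto = count_args(@list);

# Ampersand call: the prototype is bypassed, @list is flattened,
# and the sub receives all three elements.
my $without_proto = &count_args(@list);

print "with prototype: $with_proto\n";          # with prototype: 1
print "without prototype: $without_proto\n";    # without prototype: 3
```

Dropping the prototype from the definition and calling `build_path(...)` without the `&` gives the same behaviour as the second call, without the surprise.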

I'm not entirely clear on the question you're trying to answer. It seems as if you're saying that build_path could return more than one path for a given folder ID. If that's the case, I'd have it return a reference to an array rather than a simple string. Later, when you walk through %folderpaths, you can create the first path listed and make the others symbolic links to it. The code below should give you an idea of what I mean.

use Data::Dumper;

sub mock_build_path {
    my $folderid = shift @_;
    my @dumpff = @_;
    my @paths;
    push @paths, 'example/1';
    push @paths, 'example/2';
    push @paths, 'example/3';
    return \@paths;
}

my %folderpaths;
$folderpaths{ '1337' } = mock_build_path( 1234, 'stuff' );

print Dumper \%folderpaths;

my ( $mkpath, @linkpaths ) = @{ $folderpaths{ '1337' } };
print "make path: $mkpath\n";
print "link paths: ", join( q{, }, @linkpaths ), "\n";

__END__
$VAR1 = {
          '1337' => [
                      'example/1',
                      'example/2',
                      'example/3'
                    ]
        };
make path: example/1
link paths: example/2, example/3

This requires the use of references, which I don't see in the code you've posted. If you're not familiar with references, have a look at the Perl reference tutorial, Perl References, and the References quick reference (links I took from the home node of kennethk, which has a lot of other good information on it).
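For the "create the first path, symlink the rest" step, here is a minimal sketch. It assumes %folderpaths holds array references shaped like the mock output above; the folder IDs are made up, and everything is created under a throwaway temporary directory. It uses File::Path's make_path and Perl's built-in symlink.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical data in the shape mock_build_path produces:
# folder ID => [ first path, other paths... ].
my %folderpaths = (
    '3053' => [ '100/3051/3053', '100/3057/3053', '100/3063/3053' ],
);

# Scratch directory so the sketch doesn't touch real folders;
# it is deleted when the program exits.
my $root = tempdir( CLEANUP => 1 );

for my $id ( sort keys %folderpaths ) {
    my ( $mkpath, @linkpaths ) = @{ $folderpaths{$id} };

    # Create the first path as a real directory; use an absolute
    # path so the links resolve no matter where they live.
    my $real = File::Spec->rel2abs( File::Spec->catdir( $root, $mkpath ) );
    make_path($real);

    for my $link (@linkpaths) {
        my $linkpath = File::Spec->catdir( $root, $link );

        # The link's parent (e.g. 100/3057) must exist before the link.
        make_path( dirname($linkpath) );
        symlink( $real, $linkpath )
            or die "Can't link $linkpath -> $real: $!\n";
    }
}
```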

I hope this helps.

Re^2: Building a UNIX path from Irritating data
by roswell1329 (Acolyte) on Nov 25, 2009 at 22:50 UTC
    My mistake, kyle. My original code pulls the data directly from the command-line utilities provided with the software and formats it on the fly; in my original post, I just made up some sample data from memory. I've now saved static copies of the data those command-line utilities produce (see the files listed below the code), and here's a modified version of the code that reads those files:
    #!/usr/bin/perl -w
    use strict;

    my $foldername_file = shift @ARGV;
    my $folderfolder_file = shift @ARGV;
    my %folders = ();
    my %folderpaths = ();

    open(FNAMES,"$foldername_file") or die "Can't open $foldername_file: $!\n";
    foreach my $folderline (<FNAMES>) {
        chomp $folderline;
        if ($folderline =~ /^Folder (\d{3,5})\s+\-\s+(.*)\.$/) {
            $folders{$1} = $2;
        }
    }
    close(FNAMES);

    open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_file: $!\n";
    my @subfolders = <FFINFO>;

    foreach my $k (sort (keys (%folders))) {
        $folderpaths{$k} = build_path($k,@subfolders);
        print "$k => $folderpaths{$k}\n";
    }
    exit 0;

    sub build_path {
        my $folderid = shift @_;
        my @dumpff = @_;
        my $path = "$folderid";
        my $parentid = "";
        foreach my $line (@dumpff) {
            if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) {
                $parentid = $1;
            }
            elsif ($line =~ /\s+subfolder\s+$folderid\s+\-\s+.*\./) {
                $path = join('/', build_path($parentid,@subfolders),$folderid);
            }
        }
        return $path;
    }
    And you can use the following two data files to make it work:

    foldernames.txt
    subfolders.txt

    Basically, the code above works fine for almost all of the data, but about 60-70 folder IDs appear as subfolders of more than one parent folder ID. Folder ID 3053 is a good example: it is a subfolder of 3051, 3057, 3063, and 3067. If you run my code, however, the only path printed for 3053 is:

    3053 => 100/3051/3053

    To be totally complete, I would need to also print:

    100/3057/3053
    100/3063/3053
    100/3067/3053

    The only job build_path has is to find the parent folder ID of the folder ID it is given; if that parent isn't a root-level node, it calls itself again with the parent ID until a full path has been built. However, since I'm building the path layer by layer, I can't figure out when to check for a duplicate path: no call to build_path knows about any complete paths that have already been found. I suppose I could return all the results from a search, but I'm not sure what that would look like. Are you suggesting something like this?

    3053 => 100/3051:3057:3063:3067/3053

    Thank you for your assistance!

      I replaced everything after the second open (the one for FFINFO) with this:

      my %parents_of;
      my $parentid;
      while ( my $line = <FFINFO> ) {
          if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) {
              $parentid = $1;
          }
          elsif ($line =~ /\s+subfolder\s+(\S+)\s+\-\s+.*\./) {
              push @{ $parents_of{ $1 } }, $parentid;
          }
      }

      sub build_path {
          my $folderid = shift @_;
          my @parents = @{ $parents_of{ $folderid } || [] };
          return @parents
              ? [ map { map { "$_/$folderid" } @{ build_path( $_ ) } } @parents ]
              : [ $folderid ];
      }

      foreach my $k (sort (keys (%folders))) {
          $folderpaths{$k} = build_path($k);
          print "$k => $_\n" for @{ $folderpaths{$k} };
      }

      The highlights of the changes are:

      • I read the subfolders file only once.
      • I store the subfolders data in a %parents_of hash of arrays, which maps each folder ID to the folders that have it as a subfolder.
      • In build_path, I use the hash of arrays instead of iterating over the whole subfolders file. It's still recursive.
      • Since build_path now returns an array reference, the output loop has to iterate over its contents.
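      To see the hash-of-arrays approach in isolation, here is a self-contained sketch of the same parsing loop and recursive build_path run against a few lines of made-up data. The folder names and the exact layout of the sample are assumptions modeled on the regexes above, not the real subfolders.txt, and it reads from an in-memory filehandle instead of FFINFO.

```perl
use strict;
use warnings;

# Hypothetical sample data in the layout the regexes expect:
# "Folder ID - name." lines, each followed by indented
# "subfolder ID - name." lines.
my $sample = <<'DATA';
Folder 100 - root.
    subfolder 3051 - alpha.
    subfolder 3057 - beta.
Folder 3051 - alpha.
    subfolder 3053 - shared.
Folder 3057 - beta.
    subfolder 3053 - shared.
Folder 3053 - shared.
DATA

# Build the hash of arrays: child ID => [ parent IDs ].
my %parents_of;
my $parentid;
open my $fh, '<', \$sample or die "Can't open in-memory data: $!\n";
while ( my $line = <$fh> ) {
    if ( $line =~ /^Folder (\d{3,5})\s+\-\s+.*\./ ) {
        $parentid = $1;    # remember which folder we're inside
    }
    elsif ( $line =~ /\s+subfolder\s+(\S+)\s+\-\s+.*\./ ) {
        push @{ $parents_of{$1} }, $parentid;
    }
}
close $fh;

# Return an array ref holding every path from a root down to $folderid.
sub build_path {
    my $folderid = shift;
    my @parents  = @{ $parents_of{$folderid} || [] };
    return [$folderid] unless @parents;    # no parent: a root folder

    # One path per (parent, parent-path) combination.
    return [
        map {
            my $p = $_;
            map { "$_/$folderid" } @{ build_path($p) }
        } @parents
    ];
}

print "$_\n" for @{ build_path(3053) };
```

      Running it prints `100/3051/3053` and `100/3057/3053`, one path per parent. Like the version above, the recursion assumes the folder data contains no cycles; a folder that is (directly or indirectly) its own parent would recurse without end.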

      For the ID that you pointed out, the new output is:

      3053 => 100/3051/3053
      3053 => 100/3057/3053
      3053 => 100/3063/3053
      3053 => 100/3066/3053

      The new code produces all the output of the old code plus another 1108 lines, and it runs in less than 1 second while the original takes about 220 seconds.

      I hope this helps.

        Hi kyle --

        I've been working on your code for a while, but I still can't see how it builds a path, and I can't get it to work for me. I copied your code exactly as you described and used the data files I provided, but here's the output I get:

        100 => 100
        101 => 101
        1013 => 1013
        1014 => 1014
        1015 => 1015
        1053 => 1053
        1057 => 1057
        1059 => 1059
        1065 => 1065
        119 => 119
        1198 => 1198
        1227 => 1227
        1228 => 1228
        1229 => 1229
        1230 => 1230
        1231 => 1231
        1232 => 1232
        1233 => 1233
        1234 => 1234
        1238 => 1238
        1239 => 1239
        1241 => 1241
        1298 => 1298
        1299 => 1299
        1300 => 1300
        1311 => 1311
        1317 => 1317
        1318 => 1318
        1320 => 1320
        1321 => 1321
        1322 => 1322
        1323 => 1323
        1324 => 1324
        1325 => 1325
        1327 => 1327
        1328 => 1328
        1329 => 1329
        1347 => 1347
        1348 => 1348
        1349 => 1349
        1350 => 1350
        1351 => 1351
        1352 => 1352
        1374 => 1374
        1375 => 1375
        1376 => 1376
        1377 => 1377
        1390 => 1390
        1393 => 1393
        1394 => 1394
        1395 => 1395
        1396 => 1396
        1397 => 1397
        1398 => 1398
        ...
        I think the line that may be wrong in my code is this one:
        return @parents ? [ map { map { "$_/$folderid" } @{ build_path( $_ ) } } @parents ] : [ $folderid ];
        I've never seen a "return" statement used with a conditional this way. It seems to me that for the conditional to be tested at all, control would already have returned to the line that originally called build_path. Does this actually mean "if return @parents would succeed, do X; otherwise, do Y"?
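        On the return question: return takes the whole expression to its right, so `return @parents ? [...] : [...];` is parsed as `return( @parents ? [...] : [...] )`. The ternary is evaluated first, testing whether @parents has any elements, and return then hands back whichever array reference the ternary produced. The hypothetical describe subs below show the one-line form next to an equivalent if/else:

```perl
use strict;
use warnings;

# One-line form: parsed as return( @parents ? [...] : [...] ).
sub describe {
    my @parents = @_;
    return @parents ? [ map { "parent:$_" } @parents ] : ['no parents'];
}

# Equivalent long-hand form.
sub describe_longhand {
    my @parents = @_;
    if (@parents) {    # an array in boolean context is true if non-empty
        return [ map { "parent:$_" } @parents ];
    }
    else {
        return ['no parents'];
    }
}

print join( ', ', @{ describe( 3051, 3057 ) } ), "\n";    # parent:3051, parent:3057
print join( ', ', @{ describe() } ), "\n";                # no parents
```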

        I really appreciate your help. I think this is very close to what I need, but I'm missing some key piece. To make sure I copied your code correctly, here's the exact script I'm running (in Cygwin on a Windows system, and on an HP-UX system):

        #!/usr/bin/perl -w
        use strict;

        my $foldername_file = shift @ARGV;
        my $folderfolder_file = shift @ARGV;
        my %folders = ();
        my %folderpaths = ();

        open(FNAMES,"$foldername_file") or die "Can't open $foldername_file: $!\n";
        foreach my $folderline (<FNAMES>) {
            chomp $folderline;
            if ($folderline =~ /^Folder (\d{3,5})\s+\-\s+(.*)\.$/) {
                $folders{$1} = $2;
            }
        }
        close(FNAMES);

        open(FFINFO,"$folderfolder_file") or die "Can't open $folderfolder_file: $!\n";
        my @subfolders = <FFINFO>;

        my %parents_of;
        my $parentid;
        while ( my $line = <FFINFO> ) {
            if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) {
                $parentid = $1;
            }
            elsif ($line =~ /\s+subfolder\s+(\S+)\s+\-\s+.*\./) {
                push @{ $parents_of{ $1 } }, $parentid;
            }
        }

        sub build_path {
            my $folderid = shift @_;
            my @parents = @{ $parents_of{ $folderid } || [] };
            return @parents
                ? [ map { map { "$_/$folderid" } @{ build_path( $_ ) } } @parents ]
                : [ $folderid ];
        }

        foreach my $k (sort (keys (%folders))) {
            $folderpaths{$k} = build_path($k);
            print "$k => $_\n" for @{ $folderpaths{$k} };
        }
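        One thing that may explain the `100 => 100` output: this script still contains `my @subfolders = <FFINFO>;` just before the new while loop. Reading a filehandle in list context consumes every remaining line, so `while ( my $line = <FFINFO> )` would find the handle already at end-of-file, %parents_of would stay empty, and build_path would return each ID as its own path. The minimal sketch below demonstrates the effect with a hypothetical in-memory file:

```perl
use strict;
use warnings;

# Hypothetical two-line input held in memory instead of a real file.
my $data = "Folder 3051 - alpha.\n    subfolder 3053 - shared.\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";

# Reading in list context slurps every remaining line...
my @slurped = <$fh>;

# ...so the handle is now at end-of-file and this loop body never runs.
my $lines_seen = 0;
while ( my $line = <$fh> ) {
    $lines_seen++;
}
close $fh;

print "slurped: ", scalar(@slurped), ", seen by while: $lines_seen\n";
# prints "slurped: 2, seen by while: 0"
```

        If that is the cause, deleting the slurp line (or rewinding with `seek(FFINFO, 0, 0)` before the loop) should let the while loop see the data.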