
Unwanted splitting of File Name

by pimperator (Acolyte)
on May 23, 2014 at 06:04 UTC

pimperator has asked for the wisdom of the Perl Monks concerning the following question:

I want to know the files within a directory tree. Something like this:

TOP -> SECOND -> FOLDERS 1-3 -> NAME -> Files

I did not generate these files, but they are formatted as such:

1234 ACS. (Description).txt

Yes, that is white space in there. My issue is when I'm reading .txt files from the "NAME" folder:
opendir(DIR, $top.$second) or die "CANNOT OPEN SECOND DIRECTORY\n";
@nameFolders = grep { !/^\.|\.\.$/ } readdir(DIR);
closedir(DIR);
foreach(@nameFolders){
    $folder = $_;
    if($_ =~ /\.txt$/){ next; } #sometimes .txt files are here but I took care of them earlier in the code and that works just fine
    #print $_."\n"; #Output is 100% perfect here
    opendir FIL, $top.$second."/".$folder or die "CANNOT OPEN NAME DIRECTORY\n";
    @files = grep { /\.txt$/ } readdir(FIL);
    closedir(FIL);
    foreach(@files){
        $fileName = $_;
        print $fileName."\n"; #HERE IS MY PROBLEM OUTPUT IS BELOW
        @fileName = split / /, $fileName;
        $numID = $fileName[0];
        $goodFiles{$fileName}=$numID;
    }
}
OUTPUT: 1234 ACS. (STUFF).txt ACS. 1235 ACS. (STUFF).txt ACS. ...
What is going on here? I'm not splitting the file name until after I print, AND it's in the @files array. I'm at a loss. Thanks for any input.

Replies are listed 'Best First'.
Re: Unwanted splitting of File Name
by choroba (Cardinal) on May 23, 2014 at 07:14 UTC
    I created files as follows:
    ./a/1234 ACS. (STUFF).txt
    ./b/1234 ACS. (STUFF).txt
    ./c/1234 ACS. (STUFF).txt

    Then I ran your script with empty $top and $second = '.'. The output was:

    1234 ACS. (STUFF).txt
    1234 ACS. (STUFF).txt
    1234 ACS. (STUFF).txt

    I can't reproduce your problem.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Unwanted splitting of File Name
by Anonymous Monk on May 23, 2014 at 07:13 UTC

    What is going on here ?

    Maybe the filename has newlines in it?

    To be sure, instead of merely printing, dump your data (Data::Dumper, or dd from Data::Dump) to visualize it (lesson courtesy of Basic debugging checklist and brian's Guide to Solving Any Perl Problem).
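A minimal sketch of that kind of dump, using only the core Data::Dumper module. The file name here is made up for illustration and deliberately contains a newline; the point is only that with Useqq set, control characters show up as visible escapes instead of silently breaking the line the way a plain print does.

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq prints control characters as escapes such as \n, \r and \t.
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Terse  = 1;    # no "$VAR1 =" prefix
$Data::Dumper::Indent = 0;    # keep the dump on one line

# Hypothetical file name with an embedded newline (an assumption, not the OP's data).
my $fileName = "1234 ACS.\n(STUFF).txt";

my $dumped = Dumper($fileName);
print "$dumped\n";
```

If the dumped name shows an escape where you expected a space, the split on / / is not the culprit; the name itself is.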

    Also, you're not taking free help :) (use strict and warnings; see "Read this if you want to cut your development time in half!")

    You're using readdir, which is like making your own hats, pimperator. Use find/rule (File::Find::Rule):

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;
    use Path::Tiny qw/ path cwd /;
    use File::Find::Rule qw/ find rule /;

    my @names = find( directory => maxdepth => 1, in => $top.$second );
    my @files = find( file => name => qr/\.txt$/i, in => \@names );
    for my $fp ( @files ){
        my $name = path( $fp )->basename;
        dd( $fp, $name );
    }

    Update: a test :) Simpler find/rule usage; mindepth means start testing rules at this depth.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;
    use File::Find::Rule qw/ find rule /;

    my $startdir = 'file-find-rule-mindepth-maxdepth';
    my @files = find(
        file => name => qr/\.txt$/i,
        mindepth => 2,
        maxdepth => 2,
        in => $startdir,
    );
    dd( \@files );
    __END__
    $ findrule file-find-rule-mindepth-maxdepth
    file-find-rule-mindepth-maxdepth
    file-find-rule-mindepth-maxdepth/6.txt
    file-find-rule-mindepth-maxdepth/a
    file-find-rule-mindepth-maxdepth/q
    file-find-rule-mindepth-maxdepth/q/6.txt
    file-find-rule-mindepth-maxdepth/q/r
    file-find-rule-mindepth-maxdepth/q/r/5.txt
    file-find-rule-mindepth-maxdepth/q/s
    file-find-rule-mindepth-maxdepth/q/s/7.txt
    file-find-rule-mindepth-maxdepth/x
    $ perl
    ["file-find-rule-mindepth-maxdepth/q/6.txt"]
      To reiterate what I feel is the important point of the parent: Use File::Find. It's core. It will prevent self-hosing, which will happen at some point when recursing directories. :)
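For anyone who wants the core-only route, here is a rough sketch with File::Find. It is not the OP's script: the temporary tree, the NAME folder and the sample file are all invented so the example runs anywhere, and it takes the leading digits with a regex rather than splitting on a space.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway tree so the sketch is self-contained (all names are made up).
my $startdir = tempdir( CLEANUP => 1 );
my $nameDir  = File::Spec->catdir( $startdir, 'NAME' );
mkdir $nameDir or die "Cannot create $nameDir: $!";
my $sample = File::Spec->catfile( $nameDir, '1234 ACS. (STUFF).txt' );
open my $fh, '>', $sample or die "Cannot create $sample: $!";
close $fh;

my %goodFiles;    # full path => leading numeric ID
find(
    sub {
        # $_ is the base name; File::Find has already chdir'ed into its directory.
        return unless -f $_ && /\.txt\z/i;
        return unless /^(\d+)/;
        $goodFiles{$File::Find::name} = $1;
    },
    $startdir,
);

print "$_ => $goodFiles{$_}\n" for sort keys %goodFiles;
```

In real use, replace the temporary directory with the top of your own tree; find() handles the recursion, so there are no nested opendir/readdir loops to get wrong.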
Re: Unwanted splitting of File Name
by AnomalousMonk (Archbishop) on May 23, 2014 at 14:22 UTC
    @nameFolders = grep { !/^\.|\.\.$/ }  readdir(DIR);

    (This reply is parenthetic to your main problem, well addressed by others. (No, not pathetic, parenthetic!)) Because of the low precedence of the | (ordered alternation) operator, the regex /^\.|\.\.$/ is parsed as ^\. or \.\.$, so it matches any string that either begins with a . (period) or ends with two periods. This causes the grep to eliminate the . and .. directories, but potentially much else besides.

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my @dirs = qw(. .. ... .abc xyz.. xyz. abc ab.c a..bc);
     @dirs = grep { !/^\.|\.\.$/ } @dirs;
     dd \@dirs;
    "
    ["xyz.", "abc", "ab.c", "a..bc"]

    Better:  /^\.\.?$/
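A quick check of the tighter pattern against the same list, as a sketch using only core Perl. Only the exact names . and .. are dropped; dot-files and names ending in two periods survive.

```perl
use strict;
use warnings;

my @dirs = qw(. .. ... .abc xyz.. xyz. abc ab.c a..bc);

# Both anchors now apply to the whole pattern: exactly "." or "..".
my @kept = grep { !/^\.\.?$/ } @dirs;

print join( ' ', @kept ), "\n";
# prints: ... .abc xyz.. xyz. abc ab.c a..bc
```

If names with a trailing newline are a concern, \z in place of $ is the stricter end anchor.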