Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Recursive File Substitution

by mt2k (Hermit)
on Jul 12, 2002 at 07:17 UTC ( #181223=snippet: print w/replies, xml ) Need Help??
Description: In this example, carriage returns (\r) are removed from all files from a top-level directory down.
File extension exceptions may be entered to exclude certain file types (such as images).

Could be used for several reasons!

#!/usr/bin/perl -w

# Update: Script should now work without generating
# any warnings concering the %skip hash. Thanks to
# jmcnamara for helping with that :)

# This script will remove carriage returns (\r) from
# all files located below a specified parent directory.
# Could be expanded to allow any modifications.

use strict;
use File::Find;
use File::Basename;

####################################
# Configuration

# Directory where recursive modification will begin
my $dir = "/home/username/directory";

# File extensions to skip (ie: images)
# Each entry has the following look:
#      'file extension' => 1
my %skip = (
   'gif'  => 1,
   'jpg'  => 1,
   'jpeg' => 1,
   'png'  => 1
);


####################################
# Main Program - No Need To Edit :)

my $count;  # Just a counting thing
my @dirs;   # All recursed directories

# This does at least 85% of the work!
# It gets a recursive list of all directories
find(\&{
  sub {
    if (-d "$File::Find::dir/$_") {
      if ($count++ == 0) {
        push @dirs, $File::Find::dir;
      } else {
        push @dirs, $File::Find::dir . "/$_";
      }
    }
  }
}, $dir);


# Loop through all directories
foreach my $curDir (@dirs) {

  # Get list of files from the directory
  opendir DIR, $curDir;
    my @files = readdir DIR;
  closedir DIR;

  # Loop through the files
  foreach my $file (@files) {

    # Skip directories and "." and ".." entries
    # as well as excluded file extensions
    my ($nil,$ext) = $file =~ /^(.*?)\.(.*?)$/gs;
    $ext = '' unless defined $ext;
    next if (-d "$curDir/$file" || $file =~ /^\./ || $skip{$ext});

    # Read in the file
    open FILE, "$curDir/$file";
      my @lines = <FILE>;
    close FILE;

    # Modify and print the file back
    open FILE, ">$curDir/$file";
      print FILE map { s/\r//g; $_; } @lines;
    close FILE;

  }

}
Replies are listed 'Best First'.
Re: Recursive File Substitution
by Aristotle (Chancellor) on Jul 14, 2002 at 21:24 UTC
    Without going into the golfish ways of doing this, but sticking to script form, there's a whole lot of things you can improve.
    my %skip = ( 'gif' => 1, 'jpg' => 1, 'jpeg' => 1, 'png' => 1 );
    I would prefer to write this like so:
    my %skip_for; @skip_for{qw( gif jpg jpeg png )} = ();

    and later test using exists $skip_for{$ext}

    Next note: you can just use $File::Find::name rather than "$File::Find::dir/$_"

    Then we have a case of redundant syntax: in \&{ sub { ... } } the sub{ ... } already gives you a reference. Then your &{} goes and dereferences it, only to feed it back to the \ which makes a reference from the result again. You can drop the surrounding \&{} and simply write sub { ... } here.

    I am a bit puzzled by this:

      my ($nil,$ext) = $file =~ /^(.*?)\.(.*?)$/gs;

    If you throw away the first capture, why capture at all?

      my ($ext) = $file =~ /^.*?\.(.*?)$/gs;

    which is better written as

      my ($ext) = $file =~ /[.]([^.]+)$/gs;

    (In words: I want as many non-dot characters as there are in front of the end of the string, update: but only if there's a dot in the filename.)

    The $ext = '' unless defined $ext; can be avoided if you put the $skip{$ext} inside an if(/match here/)

    Lastly, since you're not interested in the individual lines of your input, but separating the input costs effort, it would be better to unconditionally slurp large chunks of X bytes instead.

    The next point is a maneuvre critique. Why would one first fetch a list of directories and then go and read each directory manually, when the same first search already gives you all the file names on a silver plate? (And why are counting something, when you never use that count? :-))

    And lastly, rather than hardcode the directory in the script, it's preferrable to take them as parameters from the commandline.

    So here's an updated version:

    #!/usr/bin/perl -w use strict; use Fcntl; use File::Find; my %skip_for; @skip_for{qw( gif jpg jpeg png )} = (); find( sub { next if -d or /^[.]/; next if /[.]([^.]+)$/ and exists $skip_for{$1}; my $content = ""; # gobble and mangle 64k chunks at a time sysopen FH, $_, O_RDWR; s/\r//g, $content .= $_ while sysread FH, $_, 65536; # go back to top of file sysseek FH, 0, 0; syswrite FH, $content, length $content; # the file still has its original length, # because we didn't clobber it with an open FH, ">file" # so we need to fix that truncate FH, tell FH; close FH; }, (@ARGV) || "." # NB: parens required );

    Further improvement might be to use some Getopt:: module to allow the user to change the $skip_for rules.

    Update: I must have been asleep as well. Kudos to Zaxo for pointing out my regex would return the whole filename for extensionless files. Also, I need to go flaggelate myself for a while:

    sysopen FH, $_, O_RDWR or (warn "Couldn't open $File::Find::name: $!\n", return); s/\r//g, $content .= $_ while ( defined (sysread FH, $_, 65536) or (warn "Couldn't open $File::Find::name: $!\n", return) );
    and, of course,
    return if -d or /^[.]/; return if /[.]([^.]+)$/ and exists $skip_for{$1};
    since this is a sub, not a for loop. I feel stupid now. Oh well, guess we can feel stupid together. :-)

    Makeshifts last the longest.

      Heheh, apparently I was either sleeping, on some kind of drugs or in just some kind of hurry, throwing that script together with anything that did the job :)

      Thanks for the improvement (okay fine, improvementS) that you made to it. Must look at perldoc -f sysopen and related docs. Once again, thanks! :)

      *goes off, pretending that his code is perfect and cannot be improved upon*

Re: Recursive File Substitution
by Anonymous Monk on Jul 12, 2002 at 08:31 UTC
    $ perl -MFile::Find -e'find(sub{next if -d or /^\.\.?/ or /\.(jpe?g|gi +f|png)$/;print "$File::Find::name "},".")' | xargs perl -pi.bak -e 's +/\r//g'
    8^)
      find . -type f -print | xargs perl -pi -e 's/\r//g'
        ..slightly different:
        perl -pi.orig -e 's/\cM//g' `find . -type f`
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: snippet [id://181223]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (6)
As of 2022-10-07 10:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My preferred way to holiday/vacation is:











    Results (29 votes). Check out past polls.

    Notices?