http://qs321.pair.com?node_id=542361
Category: Text processing
Author/Contact Info: Lev Koszegi (lkoszegi<at>standard.com)
Description:

With much help from answers provided by fellow monks to my previous questions, I have come up with a script (my first real script, actually!) that will read a list of filenames and then apply a series of edits to the files. In the process, it will back the originals up in a tarball.

If any of you are willing to look over my code, I'd appreciate feedback. Any lines stand out as the wrong way to do things? Are there things that could be written more elegantly? Can error handling be improved? I'm quite new at Perl, so I wouldn't be surprised if there are things I could have done better.

So here's the script, which I've called fled for 'file list edit'.

The reason I'm not simply using perl -pi.bak -e 's|foo|bar|g' files is that these files are not necessarily all in the same directory, and besides, the in-place edit feature of Perl changes the owner of the edited file to whoever runs it. This script will maintain the same owners on the original files.
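
To illustrate the ownership point, here is a minimal sketch, not the fled code itself. It assumes a hypothetical helper named rewrite_same_file and a throwaway temp file. Opening the existing file with '>' truncates it and refills it, so the file keeps its identity (and therefore its owner), whereas -i writes a brand-new file under the old name. Unlike fled, the sketch reads the whole file into memory instead of working from a copy.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite $path by truncating and refilling the
# existing file, so no new file (with a new owner) is ever created.
sub rewrite_same_file {
    my ($path, $edit) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @lines = <$in>;
    close $in;

    # Opening with '>' truncates the existing file in place.
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} map { $edit->($_) } @lines;
    close $out or die "Cannot close $path: $!";
}

# Throwaway demo file, removed automatically at exit.
my ($fh, $demo) = tempfile(UNLINK => 1);
print {$fh} "foo one\nfoo two\n";
close $fh;

my $inode_before = (stat $demo)[1];
rewrite_same_file($demo, sub { my $line = shift; $line =~ s/foo/bar/g; $line });
my $inode_after = (stat $demo)[1];

print "same file: ", ($inode_before == $inode_after ? "yes" : "no"), "\n";
```

The trade-off is that a crash between the truncate and the final write loses data, which is why having a backup first matters.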

Since it runs from a list of files, that list can be generated by grep, or tcgrep, or however.

Anyway, here it is:


#!/usr/bin/perl -w
# Script to make text replacements to a list of files
#
# The author of this script makes no warranty,
# express or implied, that this script will
# do anything useful at all, and any use
# of it is entirely at your own risk.
#
# by Lev Koszegi 4/10/2006

use strict;

use File::Copy;
use File::Basename;
use Cwd;

usage() unless $#ARGV == 1; #Require two arguments (list file & pattern file)

my (@files, @patterns);
my ($pattern, $filename);
my $now = time;
my $dir = cwd;

open FILE_LIST, $ARGV[0]
  or die "Cannot open file list: $!";
chomp(@files = <FILE_LIST>);
#Test the lines of FILE_LIST to make sure files exist;
#if not, note which one is bad and die.
foreach $filename (@files) {
  die "Invalid file name ($filename): $!" unless -e $filename;
}
close FILE_LIST;

open PATFILE, $ARGV[1]
  or die "Cannot open pattern file: $!";
#Test the pattern file to make sure it looks okay, or die.
chomp(@patterns = <PATFILE>);
foreach $pattern (@patterns) {
  #Pattern must begin with a forward slash, contain exactly three
  #unescaped forward slashes, and end with same plus optional g and/or i.
  unless ($pattern =~ /^\/.*[^\\]\/.*[^\\]\/(g|i|gi|ig)?$/) {
    #Report the offending line; $@ is only set by eval, not by a failed match.
    die "<PATFILE>: bad pattern: $pattern";
  }
  #Create the substitution commands.
  $pattern = 's' . $pattern;
}
close PATFILE;

#Create backup directory, timestamped to make it unique.
my $bakdir = "bak_${now}";
mkdir $bakdir, 0755 or die "Cannot create archive directory: $!";

my $tarFile = $bakdir . '/' . "bak.tar";
my $logfile = $bakdir . '/' . "logfile";

#Create log file
open LOG, "> $logfile"
  or die "Cannot create logfile: $!";

select LOG;
$| = 1; # Don't keep LOG entries sitting in the buffer.
select STDOUT;

#Tarball the files to archive current state
!(system "tar", "chvlf", $tarFile, "-I", $ARGV[0]) 
  or die "Cannot create backup archive!";
print LOG "Created archive OK.\n";

!(system "gzip", $tarFile)
  or print LOG "Cannot compress archive; proceeding anyway.\n";

#Loop through each file and make the edit(s)
foreach $filename (@files) {
  #Copy contents of file to a temporary work file.
  my $wkfile = $bakdir . '/' . basename $filename;
  copy($filename, $wkfile) or die "Cannot copy ${filename}: $!";
  open(FILE2CHANGE, "<$wkfile") or die "Cannot open ${wkfile} for reading: $!";
  open(UPDATED, ">$filename") or die "Cannot open ${filename} for writing: $!";
  while (<FILE2CHANGE>) {
    #Run each substitution on the line.
    foreach $pattern (@patterns) {
      eval $pattern;
    }
    print UPDATED;
  }
  print LOG "Processed ${filename}\n";
  close FILE2CHANGE;
  close UPDATED;
  unlink $wkfile;
}

close LOG;

###################################

sub usage {
die <<EOF
usage: fled [list file] [pattern file]

List file is a file containing the list of
    files to process, one file name per line.

Pattern file must contain the substitution expressions,
    one per line, using forward slashes as delimiters, but
    omitting the 's' at the beginning (e.g. "/foo/bar/g").
    Each line may optionally end with 'g' and/or 'i' switch(es).
EOF
}
Re: Edit a List of Files
by GrandFather (Saint) on Apr 10, 2006 at 20:54 UTC

    A mixture of, mostly minor, stylistic issues:

    use strict; use warnings;

    That mantra should come before you do anything else! Always! Regardless of project size! No exceptions! Comprende?

    (condition) or print "fail string";

    is cute, but

    print "fail string" if condition;

    is clearer. In cases where the fail condition is non-zero it is even better to make it explicit:

    print "fail string" if 0 != (condition);
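
For system in particular, the explicit form might look like the sketch below. It is only an illustration: the child command is the running perl ($^X) exiting with code 3 on purpose, standing in for tar or gzip.

```perl
use strict;
use warnings;

# $^X is the running perl binary; the child exits with code 3 on purpose.
my $status = system($^X, '-e', 'exit 3');

# system returns the raw wait status: 0 on success, -1 if the command
# could not be started; otherwise the exit code is in the high byte.
if (0 != $status) {
    my $exit_code = $status >> 8;
    print "command failed (exit code $exit_code)\n";
}
```

Spelling out the comparison makes it obvious that zero means success, which the !(system ...) form hides.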

    It is tempting to comment things that you have just learned, but that can lead to over commenting and make it harder to grok the flow of the code.

    $var1 . 'string1' . "string2" is better written as "${var1}string1string2". Note the ${var1} usage to clarify where the variable name ends.
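
A small sketch of the two spellings side by side; the directory name is a made-up value, not one taken from a real run:

```perl
use strict;
use warnings;

my $bakdir = 'bak_1144702440';   # hypothetical timestamped directory name

# Concatenation, as in the posted script.
my $concatenated = $bakdir . '/' . "bak.tar";

# Interpolation; the braces mark where the variable name ends.
my $interpolated = "${bakdir}/bak.tar";

# Without the braces Perl would look for a variable named $bakdir_old.
my $suffixed = "${bakdir}_old";

print "$concatenated\n$interpolated\n$suffixed\n";
```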

    The block for while (<FILE2CHANGE>) { is not indented.


    DWIM is Perl's answer to Gödel

      Thanks!

      I added use strict;, and fixed the indent problem. As for use warnings;, doesn't the "-w" in #!/usr/bin/perl -w do the same thing?

      I don't have time right now, but I'll review the script and consider your other suggestions as well. I do appreciate it!

        Yes, -w does do that. I tend not to notice it because on Windows I don't need (and therefore don't supply) the shebang line.


Re: Edit a List of Files
by chanio (Priest) on Apr 11, 2006 at 04:03 UTC
    Thank you!

    A very useful script.

    I would like to add something to my own version of this script (as soon as I can find the time to do it, sorry ;) ):

    • Log files should be clear and simple. They are tools to work with even if the user does not know what the script is doing. After some time, I might forget what the script was doing. When preparing logs for a large number of processed files, it is easier to read one or two clearly identified lines per file.
    • A more expressive table of regexes could be loaded into a hash keyed by file extension, plus a generic key that applies to all files. That would be easy starting from a standard template with fields to fill in (a kind of ini file where lines starting with # don't count as values, and keys start without any tab character).
    • Modules could make this script useful on any platform!
    Then this script could become a very powerful tool!
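
The regex table idea could be sketched roughly as follows. Everything about the layout is an assumption made for illustration: a key line starts in column 0 and is either a file extension or '*' for all files, the tab-indented lines under it are patterns, and the names parse_pattern_table and patterns_for are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical layout: a key line starts in column 0 (a file extension,
# or '*' for every file); the tab-indented lines under it are patterns.
# Lines starting with '#' are comments.
sub parse_pattern_table {
    my @lines = @_;
    my (%table, $key);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;
        if ($line =~ /^\t+(.*)$/) {
            die "Pattern before any key: $line\n" unless defined $key;
            push @{ $table{$key} }, $1;
        }
        else {
            $key = $line;
            $table{$key} ||= [];
        }
    }
    return \%table;
}

# Generic patterns first, then the ones for the file's extension.
sub patterns_for {
    my ($table, $filename) = @_;
    my ($ext) = $filename =~ /(\.[^.\/]+)$/;
    my @found = @{ $table->{'*'} || [] };
    push @found, @{ $table->{$ext} || [] } if defined $ext;
    return @found;
}

my $table = parse_pattern_table(
    "# sample table\n",
    "*\n",
    "\t/foo/bar/g\n",
    ".txt\n",
    "\t/colour/color/gi\n",
);
print join(', ', patterns_for($table, 'notes/readme.txt')), "\n";
```

The patterns returned here keep the /search/replace/flags shape that fled already expects, so they could be validated the same way before use.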

      Thanks for your comments, Chanio. I'm considering how to make the log file a little more clear and informative.

      Your other suggestions sound good too, though they are more than I have time to figure out right now.

      And you should certainly feel free to take it and modify it for your own use! Especially if you share what you did and why, so I can learn yet more.