Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

mvre - MoVe files based on given Regular Expressions

by parv (Parson)
on Nov 21, 2003 at 03:44 UTC ( [id://308797]=sourcecode: print w/replies, xml ) Need Help??
Category: Utility Scripts
Author/Contact Info parv_@yahoo.com
Description:

UPDATE, Nov 24 2003: Completely updated the code as of version 0.41, fixes check for existing file w/ few new options/features.

This modules requires Parv::Util, that i use quite often. Most current version of "mvre" is located at http://www103.pair.com/parv/comp/src/perl/mvre.

Back to mvre; pod2html has this to say...

In default mode, mvre moves file located in the current directory, matching (Perl) regex, to directory specified by -out-dir. An output directory is required in all cases. See -out-dir and -out-find-path options.

Files to be moved, also referred as ``input files'', are specified either via -in-select option and/or as argument(s) given after all the options.

Order of input selection regex, -in-select or -select option, and output directory specified is important. One-to-one relation exists between each regex to select input files and the output directory, either explicitly specified or found in path specified via -out-file-path option.

#!/usr/local/bin/perl

use warnings;
use strict;

use File::Find;
use File::Copy;
use File::Path;
use File::Basename;

use Pod::Usage;
use Getopt::Long
  qw(:config
      gnu_compat
      bundling_override
      auto_abbrev
      no_ignore_case
      no_debug
    );

use lib '/to/perl/modules';
use Parv::Util
  qw( def_value
      uniq_elements
      filled_arrayref
    );

sub VERSION
{ #  It would be nice to call a sub inside pod so that i do not need t
+o
  #  worry about the version listed at two places: in this sub & in po
+d.
  #  So, do the other way: read through file & search for version.
  #  Slower, but not much to care about.
  while(<DATA>)
  { return $1 if m/^\s*version:\s+(.+)/io;
  }
  return 0;
}

my $noisy = 1;  # relatively quiet operation

sub out_msg  { print $_ , "\n" for @_; }
sub out_exit { print $_ , "\n" for @_ ; exit 1 ; }

sub err_msg       { warn  $_ , "\n" for @_; }
sub info_msg      { $noisy     and &err_msg; }
sub high_info_msg { $noisy > 1 and &err_msg; }

sub pod_err_msg
{ pod2usage( '-msg' => join("\n" , @_)
           , '-exitval' => 1 , '-verbose' => 0
           );
}

sub array_to_regex
{ my ($arr) = @_;
  return undef unless filled_arrayref($arr) ;
  my $re = join '|' , @{ uniq_elements( $arr ) } ;
  return qr/$re/;
}

sub match_file
{ my ($file , $select , $ignore , $type) = @_;

  return
    ( defined $select ? $file =~ m/$select/ : 1 )
    && ( defined $ignore ? $file !~ m/$ignore/ : 1 )
    && ( (defined $type && $type =~ m/^d/i) ? -d $file : -f $file )
    ;
}

sub make_filter
{ return
    ( 'select' => []      #  regex to select files
    , 'ignore' => []      #  regex to ignore files
    , 'type'   => 'file'  #  type of file to look for
    ) ;
}

sub make_find_opt
{ return
    ( 'no_chdir'    => 1
    , 'bydepth'     => 0
    , 'follow'      => 0
    , 'follow_fast' => 0
    , 'follow_skip' => 2
    ) ;
}

my $in_dir =
  { 'filter' => { make_filter() }
  , 'find'   => { make_find_opt() }
  , 'dir'    => [ '.' ]
  , 'move-map' => {}  #  one-to-one map of each selection & destinatio
+n
  } ;
my $out_dir =
  { 'filter'    => { make_filter() }
  , 'find'      => { make_find_opt() }
  , 'overwrite' => 0
  , 'dir'       => []
  , 'find-path' => []  #  paths to look for additional directories in
  } ;
my $opt =
  { 'usage'   => 0
  , 'dry-run' => 0
  , 'version' => 0
  } ;

get_options(\$noisy , $in_dir , $out_dir , $opt);

check_options($in_dir , $out_dir , $opt);

process($in_dir , $out_dir , $opt);


sub process
{
  my ($in , $out , $opt) = @_;

  my ($in_selects , $out_dirs) =
    map ref $_ && scalar @{$_} ? scalar @{$_}  -1  : 0
    , ( $in->{'filter'}{'select'} , $out->{'dir'} )
    ;

  foreach my $in_dir ( @{ $in->{'dir'} } )
  {
    while ( my ($select , $dest) = each %{$in->{'move-map'}} )
    { next unless defined $dest;
      find( { %{ $in->{'find'} }
            , 'wanted' =>
                sub
                { my $name = $File::Find::fullname || $File::Find::nam
+e;
                  match_file( $name
                            , $select
                            , $in->{'filter'}{'ignore'}
                            , $in->{'filter'}{'type'}
                            )
                  and do
                  { info_msg "$name  ->  $dest ..." ;
                    move_file( $opt->{'dry-run'} , $name , $dest
                             , $out->{'overwrite'}
                             );
                  } ;
                }
            }
          , $in_dir
          ) ;
    }
  }
}

sub move_file
{
  my ($dry_run , $full_path , $dest , $overwrite) = @_;

  { my $file = basename($full_path , '');
    if ( !$overwrite && -e "${dest}/${file}" )
    { high_info_msg "*  ${dest}/${file} already exists; skipped";
      return;
    }
  }

  if ( -e $dest && ! -d _ )
  { pod_err_msg "'$dest' is not a directory";
    return;
  }

  unless ( -e $dest )
  { high_info_msg "Creating $dest ..." ;
    make_dir($dry_run , $dest);
  }

  if ( $dry_run )
  { printf "move %s %s\n" , ($full_path, $dest);
    return;
  }

  move($full_path , $dest)
    or out_msg "*  Can't move '$full_path' to '$dest': $!" ;

  return;
}

sub make_dir
{
  my ($dry_run , $dir) = @_;

  if ( $dry_run )
  { printf "mkpath %s\n" , $dir;
    return;
  }

  eval { mkpath($dir , undef); };
  out_exit "*  Cannot create '$dir' directory: $@"
    if $@;

  return;
}

sub find_out_dir
{ my ($out , $move_map) = @_;

  my $dirs =
    find_dirs( $out->{'find-path'} , $out->{'find'} , $out->{'filter'}
+ ) ;

  while ( my ($k , $v ) = each %$dirs ) { $move_map->{$k} = $v; }

  return ;
}

sub find_dirs
{
  my ($start_path , $find_opt , $filter) = @_;
  my %dirs = ( );

  foreach my $selection ( @{$filter->{'select'}} )
  { find( { %{ $find_opt }
          , 'wanted' =>
              sub
              { return if exists $dirs{ $File::Find::dir };

                $dirs{ $selection } = $File::Find::dir
                  if match_file( ( $File::Find::fullname
                                 || $File::Find::name
                                 )
                               , $selection
                               , $filter->{'ignore'}
                               , $filter->{'type'}
                               ) ;
              }
          }
        , @$start_path
        ) ;
  }
  return +{} unless scalar keys %dirs;
  return \%dirs;
}

sub get_options
{ my ($noisy , $in , $out , $opt) = @_;

  my %out_f =
    ( 'ignore' => {}
    ) ;
  my %in_f =
    ( 'ignore' => {}

    #  Ideally, Getopt::Long should be able to *assign* values instead
+ of
    #  pushing them, or at least have some option to specify the
    #  behaviour.  This key-value pair is only to work around that.  I
+ts
    #  value is assigned to, not pushed in, $main::in_dir{'dir'},
    #  overriding the current directory entry.
    #
    , 'dir'    => []

    ) ;
  my %both =
    ( 'select-args' => undef
    , 'select'      => []
    , 'ignore'      => {}
    , 'bydepth'     => undef
    , 'follow'      => undef
    , 'fast_follow' => undef
    ) ;

  GetOptions( 'h|usage|help'     => \$opt->{'usage'}
            , 'V|version'        => \$opt->{'version'}
            , 'n|dry-run'        => \$opt->{'dry-run'}
            , 'v|noisy|verbose+' => $noisy
            , 'q|quiet'          => sub { $$noisy = 0 }

            , 'S|select-args'   => \$both{'select-args'}
            , 's|select=s@'     => $both{'select'}
            , 'i|ignore=s%'     => $both{'ignore'}
            , 't|type=s'        => \$both{'type'}
            , 'D|depth'         => \$both{'bydepth'}
            , 'L|follow'        => \$both{'follow'}
            , 'ff|fast-follow'  => \$both{'fast_follow'}

            , 'id|in-dir=s@'        => $in_f{'dir'}
            , 'iD|in-depth'         => \$in->{'find'}{'bydepth'}
            , 'iL|in-follow'        => \$in->{'find'}{'follow'}
            , 'iF|in-follow_fast'   => \$in->{'find'}{'follow_fast'}
            , 'is|in-select=s@'     => $in->{'filter'}{'select'}
            , 'it|in-type=s'        => \$in->{'filter'}{'type'}
            , 'ii|in-ignore=s%'      => $in_f{'ignore'}

            , 'f|overwrite'           => \$out->{'overwrite'}
            , 'od|out-dir=s@'         => $out->{'dir'}
            , 'ofp|out-find-path=s@'  => $out->{'find-path'}
            , 'oD|out-depth'          => \$out->{'find'}{'bydepth'}
            , 'oL|out-follow'         => \$out->{'find'}{'follow'}
            , 'oF|out-follow_fast'    => \$out->{'find'}{'follow_fast'
+}
            , 'os|out-select=s@'      => $out->{'filter'}{'select'}
            , 'ot|out-type=s'         => \$out->{'filter'}{'type'}
            , 'oi|out-ignore=s%'     => $out_f{'ignore'}
            )
    or die pod2usage('-exitval'  => 2 , '-verbose'  => 0);

  pod2usage('-exitval' => 0 , '-verbose' => 3) if $opt->{'usage'};

  out_msg( VERSION() ) , exit 0 if $opt->{'version'} ;

  #  save remaining arguments as regexen to select movable files
  if ( scalar @ARGV )
  { push @{ $in->{'filter'}{'select'} } , @ARGV;

    push @{ $out->{'filter'}{'select'} } , @ARGV
      if $both{'select-args'} ;

    undef @ARGV;
  }

  #  Ignore current directory, use the given directory(ies)
  $in->{'dir'} = $in_f{'dir'} if scalar @{$in_f{'dir'}};

  collect_ignore( 'ignore' , \%in_f  , $in->{'filter'}  );
  collect_ignore( 'ignore' , \%out_f , $out->{'filter'} );

  my $simple_set =
    sub
    { my ($key , $hash) = @_;
      $hash->{$key} = $both{$key}
        if !$hash->{$key} && defined $both{$key} ;
    } ;

  foreach my $io ($in , $out)
  { map $simple_set->($_ , $io->{'find'})
    , qw(bydepth follow fast_follow) ;

    $simple_set->('type' , $io->{'filter'});
    collect_ignore( 'ignore' , \%both  , $io->{'filter'} );
    collect_select( 'select' , \%both  , $io->{'filter'} );
  }

  return;
}

sub check_options
{ my ($in , $out ) = @_;

  foreach my $d (@{$in->{'dir'} })
  {
    pod_err_msg "'$d' is not a directory or does not exist."
      unless -d $d;
  }

  pod_err_msg 'No patterns given'
    unless $in->{'filter'}{'select'} ;

  pod_err_msg "No output directories found or given"
    unless filled_arrayref( $out->{'dir'} )
    or filled_arrayref( $out->{'find-path'} ) ;

  #  find output directories if possible
  $out->{'find-path'} = uniq_elements( $out->{'find-path'} );
  fill_move_map($in , $out);

  return;
}

sub fill_move_map
{ my ($in , $out) = @_;

  #  Add regex<->destination path mappings
  unless ( filled_arrayref($out->{'find-path'}) )
  { foreach my $re ( @{$in->{'filter'}{'select'}} )
    { $in->{'move-map'}{$re} = $_ foreach @{$out->{'dir'}} ;
    }
  }
  else
  { find_out_dir($out , $in->{'move-map'});

    #  Add /additional/ regex<->destination path mappings
    if ( scalar @{$in->{'filter'}{'select'}}
          > scalar keys %{$in->{'move-map'}}
       )
    { unless ( filled_arrayref($out->{'dir'}) )
      { err_msg
        ( "No output directory explicitly specified."
        , "Some of the patterns may be skipped."
        ) ;
      }
      else
      { foreach my $re ( @{$in->{'filter'}{'select'}} )
        { foreach my $key ( keys %{$in->{'move-map'}} )
          { next if $key eq $re;
            $in->{'move-map'}{$re} = $out->{'dir'}[-1];
          }
        }
      }
    }
  }

  return;
}

sub collect_select
{ my ($key , $in , $save) = @_;

  my @pattern = @{ $in->{$key} };

  return unless scalar @pattern;

  $save->{$key} =
    [ map qr/$_/
        , ( filled_arrayref( $save->{$key} ) ? @{$save->{$key}} : ()
          , @pattern
          )
    ] ;

  return;
}

sub collect_ignore
{ my ($key , $in , $save) = @_;

  my @pattern = keys %{ $in->{$key} };

  return unless scalar @pattern;

  $save->{$key} =
    array_to_regex([ sort {$a cmp $b or $a <=> $b} @pattern ]) ;
  return;
}

__DATA__

=head1 NAME

mvre - MoVe files based on given Regular Expressions

=head1 SYNOPSIS

B<mvre>  B<-usage>

B<mvre>  B<-out-dir> I<directory> B<regex>

B<mvre> [ I<options> ] [ B<-dry-run> ] [ B<-in-dir> F<directory> ]
  < B<-out-dir> | B<-od> > F<directory>
  B<regex> [ B<regex 2> ] [ B<regex 3> ] ...

=head1 DESCRIPTION

In default mode, B<mvre> moves file located in the current directory,
matching (Perl) regex, to directory specified by B<-out-dir>.  An outp
+ut
directory is required in all cases.  See B<-out-dir> and
B<-out-find-path> options.

Files to be moved, also referred as "input files", are specified eithe
+r
via B<-in-select> option and/or as argument(s) given after all the
options.

Order of input selection regex, I<-in-select> or I<-select> option, an
+d
output directory specified is important.  One-to-one relation exists
between each regex to select input files and the output directory, eit
+her
explicitly specified or found in path specified via I<-out-file-path>
option.

=head1 OPTIONS

=head2 Program Control

=over 2

=item B<-dry-run> | B<-n>

Show which files will be moved to where, instead of actually moving an
+ything.


=item B<-help> | B<-usage> | B<-h>

Show this message.

=item B<-quiet> | B<-q>

See messages related to only grave error conditions.

=item B<-verbose> | B<-v>

Control verbosity.  Specify B<-v> or B<-verbose> multiple times to
increase verbosity.  Default is 1; all, grave or minor, error conditio
+ns
are signaled.

=item B<-version> | B<-V>

Show version.

=back

=head2 Combined Input & Output Options

This set of options is shared between both I<Input Options> and I<Outp
+ut
Options>; do see the individual sections.

=over 2

=item B<-depth> | B<-D>

Process all the files in a directory before going in any other directo
+ry
or doing anything with the directory.

=item B<-follow> | B<-L>

Follow symbolic link.  Default is to skip.

=item B<-follow-fast> | B<-ff>

See I<File::Find> documentation as i, the author, fail to understand h
+ow
it is different than just I<-follow>.  This option is provided for tho
+se
who understand its description.  To quote...

  This is similar to "follow" except that it may report some files
  more than once.  It does detect cycles, however.  Since only
  symbolic links have to be hashed, this is much cheaper both in space
  and time.  If processing a file more than once (by the user's
  "wanted()" function) is worse than just taking time, the option
  "follow" should be used.

=item < B<-ignore> | B<-i> > I<regex>

Ignore I<files> matching the regex.

Multiple regexen can be specified as C<-i regex-1 -i regex-2>.

=item < B<-select> | B<-s> > I<regex>

Select I<files> matching the regex.

See I<-ignore> option to specify multiple patterns.

=item B<-select-args> | B<-S>

Use arguments, after the options, as patterns to select both input and
output files.  This option allows to skip repetition of I<-select>,
I<-in-select>, and/or I<-out-select> for each pattern given.

=item < B<-type> | B<-t> > I<dir> | I<file>

Type of file to select.  Default file type is I<file>.

THIS SETTING NEEDS TO ACCOMMODATE OTHER FILE TYPES.  Perhaps some time
+ in
future...

=back

=head2 Input Options

=over 2

=item < B<-in-dir> | B<-id> > F<directory>

Specifies directory to start looking for input files matching given
regex(en).  Default is current directory.

Multiple directories can be specified C<-id dir-1 -id dir-2>.  Each
directory is assumed to go with each input file pattern.

=item B<-in-depth> | B<-iD>

Process all the files in a input directory before going in any other
directory or doing anything with the directory.

=item B<-in-follow> | B<-iL>

Follow symbolic link related to input files.  Default is to skip.

=item B<-in-follow-fast> | B<-iff>

See I<Combined Input & Output Options>; applies only to input files.

=item < B<-in-ignore> | B<-ii> > I<regex>

Ignore input I<files> matching the regex.

Multiple regexen can be specified as C<-i regex-1 -i regex-2>.

=item < B<-in-select> | B<-is> > I<regex>

Select input files matching the regex.

See I<-ignore> option to specify multiple patterns.

=item < B<-in-type> | B<-it> > I<dir> | I<file>

Type of file to select.  Default file type is I<file>.

=back

=head2 Output Options

One of B<-out-dir>, B<-out-find-path> options is B<required in all
cases>.

=over 2

=item < B<-out-dir> | B<-od> > F<directory>

Specifies the name of directory to be created to move input files.  If
only one directory is specified, then all the input files will be move
+d to
that directory.

If the given output directory does not exist, it will be created.

See I<-in-dir> option syntax to specify multiple directories.

=over 4

=item < B<-out-find-path> | B<-ofp> > F<path>

Specify the directory path in which to find output directories based o
+n
B<-out-ignore> and/or B<-out-select>.

See I<-in-dir> option syntax to specify multiple directories.

=back

=item B<-out-follow> | B<-oL>

Follow symbolic link related to output files.  Default is to skip.

=item B<-out-follow-fast> | B<-off>

See I<Combined Input & Output Options>; applies only to output files.

=item B<-out-depth> | B<-oD>

Process all the files in a output directory before going in any other
directory or doing anything with the directory.

=item B<-out-follow> | B<-oL>

Follow symbolic link.  Default is to skip.

=item B<-out-follow-fast> | B<-off>

See I<Combined Input & Output Options>; applies only to output files.

=item < B<-out-ignore> | B<-oi> > I<regex>

Ignore output files matching the regex.

Multiple regexen can be specified as C<-o regex-1 -o regex-2>.

=item < B<-out-select> | B<-os> > I<regex>

Select output files matching the regex.

See I<-ignore> option to specify multiple patterns.

=item < B<-out-type> | B<-ot> > I<dir> | I<file>

Type of file to select.  Default file type is I<file>.

=item B<-overwrite> | B<-f>

Specify B<-overwrite> to write over the preexisting files.  Default is
+ not
to overwrite.

=back

=head1  SEE ALSO

=over 2

=item find(1)

=item File::Find(3)

=back

=head1 AUTHOR, LICENSE, and such

Parv, parv_@yahoo.com

B<Disclaimer:>  This software is free to be used in any form only if
proper credit is given.  I am not liable for any kind of harm or loss;
use it at your own risk.

Version: 0.41 - Nov 23 2003

=cut
Replies are listed 'Best First'.
Re: mvre - updated to 0.41
by parv (Parson) on Nov 24, 2003 at 06:01 UTC

    I have completely updated the code presented in the parent thread as of version 0.41. This fixes check for existing files and includes some new features, important one being creation of all the intermediary directories, not just the last one.

Re: "Not-overwriting-existing-files" Bug Fix
by parv (Parson) on Nov 23, 2003 at 21:50 UTC

    If anybody has downloaded the program (versions < 0.40), please get a new copy from http://www103.pair.com/parv/comp/src/perl/mvre. There was a major bug that did not take care of not-overwriting-existing-files properly. It is fixed in version 0.40. There are some minor grammatical corrections & few new features also.

    The code presented in this thread has been fixed for this bug.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://308797]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (3)
As of 2024-04-18 23:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found