mvre - MoVe files based on given Regular Expressions

Category:	Utility Scripts
Author/Contact Info	parv_@yahoo.com
Description:	UPDATE, Nov 24 2003: Completely updated the code as of version 0.41, fixes check for existing file w/ few new options/features. This modules requires Parv::Util, that i use quite often. Most current version of "mvre" is located at http://www103.pair.com/parv/comp/src/perl/mvre. Back to `mvre`; `pod2html` has this to say... In default mode, mvre moves file located in the current directory, matching (Perl) regex, to directory specified by -out-dir. An output directory is required in all cases. See -out-dir and -out-find-path options. Files to be moved, also referred as ``input files'', are specified either via -in-select option and/or as `argument(s)` given after all the options. Order of input selection regex, -in-select or -select option, and output directory specified is important. One-to-one relation exists between each regex to select input files and the output directory, either explicitly specified or found in path specified via -out-file-path option.
#!/usr/local/bin/perl use warnings; use strict; use File::Find; use File::Copy; use File::Path; use File::Basename; use Pod::Usage; use Getopt::Long qw(:config gnu_compat bundling_override auto_abbrev no_ignore_case no_debug ); use lib '/to/perl/modules'; use Parv::Util qw( def_value uniq_elements filled_arrayref ); sub VERSION { # It would be nice to call a sub inside pod so that i do not need t +o # worry about the version listed at two places: in this sub & in po +d. # So, do the other way: read through file & search for version. # Slower, but not much to care about. while(<DATA>) { return $1 if m/^\sversion:\s+(.+)/io; } return 0; } my $noisy = 1; # relatively quiet operation sub out_msg { print $_ , "\n" for @_; } sub out_exit { print $_ , "\n" for @_ ; exit 1 ; } sub err_msg { warn $_ , "\n" for @_; } sub info_msg { $noisy and &err_msg; } sub high_info_msg { $noisy > 1 and &err_msg; } sub pod_err_msg { pod2usage( '-msg' => join("\n" , @_) , '-exitval' => 1 , '-verbose' => 0 ); } sub array_to_regex { my ($arr) = @_; return undef unless filled_arrayref($arr) ; my $re = join '\|' , @{ uniq_elements( $arr ) } ; return qr/$re/; } sub match_file { my ($file , $select , $ignore , $type) = @_; return ( defined $select ? $file =~ m/$select/ : 1 ) && ( defined $ignore ? $file !~ m/$ignore/ : 1 ) && ( (defined $type && $type =~ m/^d/i) ? -d $file : -f $file ) ; } sub make_filter { return ( 'select' => [] # regex to select files , 'ignore' => [] # regex to ignore files , 'type' => 'file' # type of file to look for ) ; } sub make_find_opt { return ( 'no_chdir' => 1 , 'bydepth' => 0 , 'follow' => 0 , 'follow_fast' => 0 , 'follow_skip' => 2 ) ; } my $in_dir = { 'filter' => { make_filter() } , 'find' => { make_find_opt() } , 'dir' => [ '.' ] , 'move-map' => {} # one-to-one map of each selection & destinatio +n } ; my $out_dir = { 'filter' => { make_filter() } , 'find' => { make_find_opt() } , 'overwrite' => 0 , 'dir' => [] , 'find-path' => [] # paths to look for additional directories in } ; my $opt = { 'usage' => 0 , 'dry-run' => 0 , 'version' => 0 } ; get_options(\$noisy , $in_dir , $out_dir , $opt); check_options($in_dir , $out_dir , $opt); process($in_dir , $out_dir , $opt); sub process { my ($in , $out , $opt) = @_; my ($in_selects , $out_dirs) = map ref $_ && scalar @{$_} ? scalar @{$_} -1 : 0 , ( $in->{'filter'}{'select'} , $out->{'dir'} ) ; foreach my $in_dir ( @{ $in->{'dir'} } ) { while ( my ($select , $dest) = each %{$in->{'move-map'}} ) { next unless defined $dest; find( { %{ $in->{'find'} } , 'wanted' => sub { my $name = $File::Find::fullname \|\| $File::Find::nam +e; match_file( $name , $select , $in->{'filter'}{'ignore'} , $in->{'filter'}{'type'} ) and do { info_msg "$name -> $dest ..." ; move_file( $opt->{'dry-run'} , $name , $dest , $out->{'overwrite'} ); } ; } } , $in_dir ) ; } } } sub move_file { my ($dry_run , $full_path , $dest , $overwrite) = @_; { my $file = basename($full_path , ''); if ( !$overwrite && -e "${dest}/${file}" ) { high_info_msg " ${dest}/${file} already exists; skipped"; return; } } if ( -e $dest && ! -d _ ) { pod_err_msg "'$dest' is not a directory"; return; } unless ( -e $dest ) { high_info_msg "Creating $dest ..." ; make_dir($dry_run , $dest); } if ( $dry_run ) { printf "move %s %s\n" , ($full_path, $dest); return; } move($full_path , $dest) or out_msg "* Can't move '$full_path' to '$dest': $!" ; return; } sub make_dir { my ($dry_run , $dir) = @_; if ( $dry_run ) { printf "mkpath %s\n" , $dir; return; } eval { mkpath($dir , undef); }; out_exit "* Cannot create '$dir' directory: $@" if $@; return; } sub find_out_dir { my ($out , $move_map) = @_; my $dirs = find_dirs( $out->{'find-path'} , $out->{'find'} , $out->{'filter'} + ) ; while ( my ($k , $v ) = each %$dirs ) { $move_map->{$k} = $v; } return ; } sub find_dirs { my ($start_path , $find_opt , $filter) = @_; my %dirs = ( ); foreach my $selection ( @{$filter->{'select'}} ) { find( { %{ $find_opt } , 'wanted' => sub { return if exists $dirs{ $File::Find::dir }; $dirs{ $selection } = $File::Find::dir if match_file( ( $File::Find::fullname \|\| $File::Find::name ) , $selection , $filter->{'ignore'} , $filter->{'type'} ) ; } } , @$start_path ) ; } return +{} unless scalar keys %dirs; return \%dirs; } sub get_options { my ($noisy , $in , $out , $opt) = @_; my %out_f = ( 'ignore' => {} ) ; my %in_f = ( 'ignore' => {} # Ideally, Getopt::Long should be able to assign values instead + of # pushing them, or at least have some option to specify the # behaviour. This key-value pair is only to work around that. I +ts # value is assigned to, not pushed in, $main::in_dir{'dir'}, # overriding the current directory entry. # , 'dir' => [] ) ; my %both = ( 'select-args' => undef , 'select' => [] , 'ignore' => {} , 'bydepth' => undef , 'follow' => undef , 'fast_follow' => undef ) ; GetOptions( 'h\|usage\|help' => \$opt->{'usage'} , 'V\|version' => \$opt->{'version'} , 'n\|dry-run' => \$opt->{'dry-run'} , 'v\|noisy\|verbose+' => $noisy , 'q\|quiet' => sub { $$noisy = 0 } , 'S\|select-args' => \$both{'select-args'} , 's\|select=s@' => $both{'select'} , 'i\|ignore=s%' => $both{'ignore'} , 't\|type=s' => \$both{'type'} , 'D\|depth' => \$both{'bydepth'} , 'L\|follow' => \$both{'follow'} , 'ff\|fast-follow' => \$both{'fast_follow'} , 'id\|in-dir=s@' => $in_f{'dir'} , 'iD\|in-depth' => \$in->{'find'}{'bydepth'} , 'iL\|in-follow' => \$in->{'find'}{'follow'} , 'iF\|in-follow_fast' => \$in->{'find'}{'follow_fast'} , 'is\|in-select=s@' => $in->{'filter'}{'select'} , 'it\|in-type=s' => \$in->{'filter'}{'type'} , 'ii\|in-ignore=s%' => $in_f{'ignore'} , 'f\|overwrite' => \$out->{'overwrite'} , 'od\|out-dir=s@' => $out->{'dir'} , 'ofp\|out-find-path=s@' => $out->{'find-path'} , 'oD\|out-depth' => \$out->{'find'}{'bydepth'} , 'oL\|out-follow' => \$out->{'find'}{'follow'} , 'oF\|out-follow_fast' => \$out->{'find'}{'follow_fast' +} , 'os\|out-select=s@' => $out->{'filter'}{'select'} , 'ot\|out-type=s' => \$out->{'filter'}{'type'} , 'oi\|out-ignore=s%' => $out_f{'ignore'} ) or die pod2usage('-exitval' => 2 , '-verbose' => 0); pod2usage('-exitval' => 0 , '-verbose' => 3) if $opt->{'usage'}; out_msg( VERSION() ) , exit 0 if $opt->{'version'} ; # save remaining arguments as regexen to select movable files if ( scalar @ARGV ) { push @{ $in->{'filter'}{'select'} } , @ARGV; push @{ $out->{'filter'}{'select'} } , @ARGV if $both{'select-args'} ; undef @ARGV; } # Ignore current directory, use the given directory(ies) $in->{'dir'} = $in_f{'dir'} if scalar @{$in_f{'dir'}}; collect_ignore( 'ignore' , \%in_f , $in->{'filter'} ); collect_ignore( 'ignore' , \%out_f , $out->{'filter'} ); my $simple_set = sub { my ($key , $hash) = @_; $hash->{$key} = $both{$key} if !$hash->{$key} && defined $both{$key} ; } ; foreach my $io ($in , $out) { map $simple_set->($_ , $io->{'find'}) , qw(bydepth follow fast_follow) ; $simple_set->('type' , $io->{'filter'}); collect_ignore( 'ignore' , \%both , $io->{'filter'} ); collect_select( 'select' , \%both , $io->{'filter'} ); } return; } sub check_options { my ($in , $out ) = @_; foreach my $d (@{$in->{'dir'} }) { pod_err_msg "'$d' is not a directory or does not exist." unless -d $d; } pod_err_msg 'No patterns given' unless $in->{'filter'}{'select'} ; pod_err_msg "No output directories found or given" unless filled_arrayref( $out->{'dir'} ) or filled_arrayref( $out->{'find-path'} ) ; # find output directories if possible $out->{'find-path'} = uniq_elements( $out->{'find-path'} ); fill_move_map($in , $out); return; } sub fill_move_map { my ($in , $out) = @_; # Add regex<->destination path mappings unless ( filled_arrayref($out->{'find-path'}) ) { foreach my $re ( @{$in->{'filter'}{'select'}} ) { $in->{'move-map'}{$re} = $_ foreach @{$out->{'dir'}} ; } } else { find_out_dir($out , $in->{'move-map'}); # Add /additional/ regex<->destination path mappings if ( scalar @{$in->{'filter'}{'select'}} > scalar keys %{$in->{'move-map'}} ) { unless ( filled_arrayref($out->{'dir'}) ) { err_msg ( "No output directory explicitly specified." , "Some of the patterns may be skipped." ) ; } else { foreach my $re ( @{$in->{'filter'}{'select'}} ) { foreach my $key ( keys %{$in->{'move-map'}} ) { next if $key eq $re; $in->{'move-map'}{$re} = $out->{'dir'}[-1]; } } } } } return; } sub collect_select { my ($key , $in , $save) = @_; my @pattern = @{ $in->{$key} }; return unless scalar @pattern; $save->{$key} = [ map qr/$_/ , ( filled_arrayref( $save->{$key} ) ? @{$save->{$key}} : () , @pattern ) ] ; return; } sub collect_ignore { my ($key , $in , $save) = @_; my @pattern = keys %{ $in->{$key} }; return unless scalar @pattern; $save->{$key} = array_to_regex([ sort {$a cmp $b or $a <=> $b} @pattern ]) ; return; } __DATA__ =head1 NAME mvre - MoVe files based on given Regular Expressions =head1 SYNOPSIS B<mvre> B<-usage> B<mvre> B<-out-dir> I<directory> B<regex> B<mvre> [ I<options> ] [ B<-dry-run> ] [ B<-in-dir> F<directory> ] < B<-out-dir> \| B<-od> > F<directory> B<regex> [ B<regex 2> ] [ B<regex 3> ] ... =head1 DESCRIPTION In default mode, B<mvre> moves file located in the current directory, matching (Perl) regex, to directory specified by B<-out-dir>. An outp +ut directory is required in all cases. See B<-out-dir> and B<-out-find-path> options. Files to be moved, also referred as "input files", are specified eithe +r via B<-in-select> option and/or as argument(s) given after all the options. Order of input selection regex, I<-in-select> or I<-select> option, an +d output directory specified is important. One-to-one relation exists between each regex to select input files and the output directory, eit +her explicitly specified or found in path specified via I<-out-file-path> option. =head1 OPTIONS =head2 Program Control =over 2 =item B<-dry-run> \| B<-n> Show which files will be moved to where, instead of actually moving an +ything. =item B<-help> \| B<-usage> \| B<-h> Show this message. =item B<-quiet> \| B<-q> See messages related to only grave error conditions. =item B<-verbose> \| B<-v> Control verbosity. Specify B<-v> or B<-verbose> multiple times to increase verbosity. Default is 1; all, grave or minor, error conditio +ns are signaled. =item B<-version> \| B<-V> Show version. =back =head2 Combined Input & Output Options This set of options is shared between both I<Input Options> and I<Outp +ut Options>; do see the individual sections. =over 2 =item B<-depth> \| B<-D> Process all the files in a directory before going in any other directo +ry or doing anything with the directory. =item B<-follow> \| B<-L> Follow symbolic link. Default is to skip. =item B<-follow-fast> \| B<-ff> See I<File::Find> documentation as i, the author, fail to understand h +ow it is different than just I<-follow>. This option is provided for tho +se who understand its description. To quote... This is similar to "follow" except that it may report some files more than once. It does detect cycles, however. Since only symbolic links have to be hashed, this is much cheaper both in space and time. If processing a file more than once (by the user's "wanted()" function) is worse than just taking time, the option "follow" should be used. =item < B<-ignore> \| B<-i> > I<regex> Ignore I<files> matching the regex. Multiple regexen can be specified as C<-i regex-1 -i regex-2>. =item < B<-select> \| B<-s> > I<regex> Select I<files> matching the regex. See I<-ignore> option to specify multiple patterns. =item B<-select-args> \| B<-S> Use arguments, after the options, as patterns to select both input and output files. This option allows to skip repetition of I<-select>, I<-in-select>, and/or I<-out-select> for each pattern given. =item < B<-type> \| B<-t> > I<dir> \| I<file> Type of file to select. Default file type is I<file>. THIS SETTING NEEDS TO ACCOMMODATE OTHER FILE TYPES. Perhaps some time + in future... =back =head2 Input Options =over 2 =item < B<-in-dir> \| B<-id> > F<directory> Specifies directory to start looking for input files matching given regex(en). Default is current directory. Multiple directories can be specified C<-id dir-1 -id dir-2>. Each directory is assumed to go with each input file pattern. =item B<-in-depth> \| B<-iD> Process all the files in a input directory before going in any other directory or doing anything with the directory. =item B<-in-follow> \| B<-iL> Follow symbolic link related to input files. Default is to skip. =item B<-in-follow-fast> \| B<-iff> See I<Combined Input & Output Options>; applies only to input files. =item < B<-in-ignore> \| B<-ii> > I<regex> Ignore input I<files> matching the regex. Multiple regexen can be specified as C<-i regex-1 -i regex-2>. =item < B<-in-select> \| B<-is> > I<regex> Select input files matching the regex. See I<-ignore> option to specify multiple patterns. =item < B<-in-type> \| B<-it> > I<dir> \| I<file> Type of file to select. Default file type is I<file>. =back =head2 Output Options One of B<-out-dir>, B<-out-find-path> options is B<required in all cases>. =over 2 =item < B<-out-dir> \| B<-od> > F<directory> Specifies the name of directory to be created to move input files. If only one directory is specified, then all the input files will be move +d to that directory. If the given output directory does not exist, it will be created. See I<-in-dir> option syntax to specify multiple directories. =over 4 =item < B<-out-find-path> \| B<-ofp> > F<path> Specify the directory path in which to find output directories based o +n B<-out-ignore> and/or B<-out-select>. See I<-in-dir> option syntax to specify multiple directories. =back =item B<-out-follow> \| B<-oL> Follow symbolic link related to output files. Default is to skip. =item B<-out-follow-fast> \| B<-off> See I<Combined Input & Output Options>; applies only to output files. =item B<-out-depth> \| B<-oD> Process all the files in a output directory before going in any other directory or doing anything with the directory. =item B<-out-follow> \| B<-oL> Follow symbolic link. Default is to skip. =item B<-out-follow-fast> \| B<-off> See I<Combined Input & Output Options>; applies only to output files. =item < B<-out-ignore> \| B<-oi> > I<regex> Ignore output files matching the regex. Multiple regexen can be specified as C<-o regex-1 -o regex-2>. =item < B<-out-select> \| B<-os> > I<regex> Select output files matching the regex. See I<-ignore> option to specify multiple patterns. =item < B<-out-type> \| B<-ot> > I<dir> \| I<file> Type of file to select. Default file type is I<file>. =item B<-overwrite> \| B<-f> Specify B<-overwrite> to write over the preexisting files. Default is + not to overwrite. =back =head1 SEE ALSO =over 2 =item find(1) =item File::Find(3) =back =head1 AUTHOR, LICENSE, and such Parv, parv_@yahoo.com B<Disclaimer:> This software is free to be used in any form only if proper credit is given. I am not liable for any kind of harm or loss; use it at your own risk. Version: 0.41 - Nov 23 2003 =cut

Comment on mvre - MoVe files based on given Regular Expressions Download Code


good chemistry is complicated, and a little bit messy -LW
	PerlMonks