PerlMonks  

Generate rsync filter from list of wanted sub-directories

by mscharrer (Hermit)
on Sep 28, 2011 at 15:55 UTC ([id://928357], CUFP)

I recently needed to set up an automated synchronization of a large rsync module (CTAN, yes 'T' not 'P') in which only certain sub-directories should be synchronized. rsync supports include/exclude filters, but the scheme is a little counter-intuitive. A direct path like /some/dir/I/want/ doesn't work on its own, because rsync never reaches the last directory, want, unless all of its parent directories are included as well. This forces you to include every parent and then exclude all other files and directories inside each of them, e.g.:

+ /some/
+ /some/dir/
+ /some/dir/I/
+ /some/dir/I/want/***
- /some/dir/I/*
- /some/dir/*
- /some/*
- /*

The /*** here means "that directory plus everything in it, including sub-dirs". Older rsync versions need two rules instead: one ending in / and one ending in /**.
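The parent-chain rule can be sketched for a single path as follows. This is only a minimal illustration under the assumption that the rsync in use understands the /*** pattern; the helper name single_path_rules is made up for this example and is not part of the script in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: build the rsync filter rules needed to
# include exactly one directory tree and nothing else.
sub single_path_rules {
    my ($path) = @_;
    my @parts = grep { length } split m{/}, $path;
    my $last  = pop @parts;

    my (@includes, @excludes);
    my $prefix = '';
    foreach my $dir (@parts) {
        $prefix .= "/$dir";
        push @includes, "+ $prefix/";        # let rsync descend into this parent
        unshift @excludes, "- $prefix/*";    # drop its other entries, deepest first
    }
    push @includes, "+ $prefix/$last/***";   # the wanted directory and its contents
    return (@includes, @excludes, '- /*');   # finally exclude the rest of the top level
}

print "$_\n" for single_path_rules('/some/dir/I/want');
```

For /some/dir/I/want this prints the eight rules listed above, in the same order: includes from the top down, then excludes from the deepest level back up.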

I wrote a Perl script for this which builds a hash-of-hash(-of-hashes)* to hold the directory structure. (Data::Diver would actually be well suited for this, but I wrote my own loop because it was simple enough and that module is not installed on the target server.) The hash structure is then processed recursively: each directory is included first, then its wanted sub-directories, and finally everything else in it is excluded. The keys are sorted so that the resulting filter list is sorted and easy to proof-read. There are also two warnings for the case where a directory is requested both fully and partially; in that case it is always included fully.
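To make the data structure concrete, here is a small stand-alone sketch of the nested hash for a few sample paths. It is a simplified illustration rather than the script itself: the name build_tree is hypothetical, the warnings are left out, and Data::Dumper (a core module) is used only to display the result.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: turn a list of paths into a nested hash.
# Inner nodes are hash refs; fully included directories hold the string '1'.
sub build_tree {
    my @paths = @_;
    my %tree;
    PATH: foreach my $path (@paths) {
        my @dirs = grep { length } split m{/}, $path;
        my $last = pop @dirs;
        next PATH unless defined $last;            # skip empty paths
        my $node = \%tree;
        foreach my $dir (@dirs) {
            $node->{$dir} = {} unless exists $node->{$dir};
            next PATH unless ref $node->{$dir};    # a parent is already fully included
            $node = $node->{$dir};
        }
        $node->{$last} = '1';                      # full inclusion overrides partial
    }
    return \%tree;
}

$Data::Dumper::Sortkeys = 1;   # stable, readable output
print Dumper(build_tree(qw(/a/b /a/c /a/d/a /a/d/b)));
```

For these four paths the top-level key a maps to a hash whose keys b and c hold '1', while d maps to another hash whose keys a and b hold '1'.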

Here is the code:

#!/usr/bin/perl
################################################################################
# Copyright (c) 2011 Martin Scharrer <martin@scharrer-online.de>
# This is open source software under the GPL v3 or later.
################################################################################
use strict;
use warnings;

my $include = {};

sub add_include {
    INCLUDE_PATH:
    foreach my $path (@_) {
        chomp $path;
        my @dirs = split (/\//, $path);
        shift @dirs if @dirs and $dirs[0] eq '';
        my $lastdir = pop @dirs;
        my $ref = $include;
        foreach my $dir (@dirs) {
            my $dirref = $ref->{$dir};
            if (defined $dirref) {
                if (ref $dirref ne 'HASH') {
                    warn "Warning: directory '$dir' of '$path' already fully included!\n";
                    next INCLUDE_PATH;
                }
                $ref = $dirref;
            }
            else {
                my $newdir = {};
                $ref = $ref->{$dir} = $newdir;
            }
        }
        if (exists $ref->{$lastdir} && $ref->{$lastdir} ne '1') {
            warn "Warning: '$path' now fully included!\n";
        }
        $ref->{$lastdir} = '1';
    }
}

sub print_include {
    my $pdir = shift;
    my $h = shift;
    print "+ $pdir/\n";
    foreach my $dir (sort keys %$h) {
        my $value = $h->{$dir};
        if (ref $value ne 'HASH') {
            print "+ $pdir/$dir/***\n";
            ## For older rsync versions use the following instead:
            #print "+ $pdir/$dir/\n";
            #print "+ $pdir/$dir/**\n";
        }
        else {
            print_include ("$pdir/$dir", $value);
        }
    }
    print "- $pdir/*\n";
}

if (@ARGV) {
    add_include (@ARGV);
}
else {
    add_include <STDIN>;
}
print_include ('', $include);

__END__

Usage Example
rsyncfilter.pl /a/b /a/c /a/d/a /a/d/b

gives:

+ /
+ /a/
+ /a/b/***
+ /a/c/***
+ /a/d/
+ /a/d/a/***
+ /a/d/b/***
- /a/d/*
- /a/*
- /*
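One possible way to feed such a list to rsync, sketched under assumptions: the output is saved to a file first, and that file is loaded through rsync's "merge" filter rule. The helper name rsync_command, the file name ctan.filter, the host mirror.example.org and the destination /srv/ctan/ are all placeholders invented for this example, not details from the setup described here.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for an rsync run that
# reads its filter rules from a file via the "merge" filter rule.
sub rsync_command {
    my ($filter_file, $source, $dest) = @_;
    return ('rsync', '-a', "--filter=merge $filter_file", $source, $dest);
}

# Placeholder file name, host and destination; adjust to the real setup.
my @cmd = rsync_command('ctan.filter', 'rsync://mirror.example.org/CTAN/', '/srv/ctan/');

# Only show the command here; pass @cmd to system() to actually run it.
print join(' ', @cmd), "\n";
```

Passing the arguments to system() as a list avoids shell quoting problems with the space inside the --filter value. The leading slashes in the rules are anchored at the root of the transfer, so the source should be the directory that contains the top-level entries named in the filter.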
