PerlMonks  

scp cronjob

by Anonymous Monk
on Sep 11, 2002 at 11:13 UTC ( [id://196915] )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

hi perlmonks, I run this script as a cronjob every minute. It seems to work fine, but every so often files are copied across that already exist and shouldn't be copied. This is killing my bandwidth and I can't work out what's causing it. One thing I notice is that if I run the script from the command prompt and hit ctrl-c, files that are already there are sometimes copied across again. Any ideas?
use File::Find;
use File::Basename;

@backupFiles = ();
%chompedList = ();
$list = "/usr/scripts/blist.txt";
if (-e $list) { unlink($list); }

# Grab a listing of what is already on the backup server.
system("ssh user\@backupserver ls -c /usr/local/apache/htdocs >$list");
open(LIST, "$list") or die "$!";
@backupFiles = <LIST>;
foreach $file (@backupFiles) {
    chomp $file;
    $chompedList{$file} = $file;
}

# Copy across any local .htm file that is not already on the server.
find(\&dofile, </usr/myfiles/*.htm>);

sub dofile {
    ($name, $path, $suffix) = fileparse($File::Find::name);
    if (not exists $chompedList{"$name$suffix"}) {
        system("scp -C \"$File::Find::name\" user\@backupserver:/usr/local/apache/htdocs");
    }
} # end find

close(LIST);
unlink($list);

Re: scp cronjob
by ides (Deacon) on Sep 11, 2002 at 14:24 UTC

    Have you looked into using rsync? It only transmits the differences between files and greatly reduces the bandwidth used. I use it to back up hundreds of GBs of data each day. It would negate the need for your script, and it works over SSH.

    To back up and/or mirror from the current directory to your other server, the cron setup would go something like this:

    export RSYNC_RSH="/usr/bin/ssh -C"
    0 0 * * * /usr/bin/rsync -ar * user@backupserver:/usr/local/apache/htdocs
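    If keeping the export line alongside the crontab entry is awkward (cron entries don't pick up a separate shell export), a small Perl wrapper could set RSYNC_RSH itself and then run rsync. This is only a rough sketch, not from the thread; the /usr/myfiles source path is taken from the original question, and the wrapper's own path in the comment is hypothetical.

    #!/usr/bin/perl
    # Hypothetical wrapper, e.g. /usr/scripts/rsync_backup.pl, so the cron entry
    # can simply be:  0 0 * * * /usr/scripts/rsync_backup.pl
    use strict;
    use warnings;

    $ENV{RSYNC_RSH} = '/usr/bin/ssh -C';    # same effect as the export above

    # Trailing slash on the source: copy the directory's contents, not the directory itself.
    system('/usr/bin/rsync', '-ar', '/usr/myfiles/',
           'user@backupserver:/usr/local/apache/htdocs') == 0
        or die "rsync failed: exit status " . ($? >> 8) . "\n";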

    Hope this helps.

    -----------------------------------
    Frank Wiles <frank@wiles.org>
    http://frank.wiles.org
Re: scp cronjob
by mp (Deacon) on Sep 11, 2002 at 16:48 UTC
    You are currently making a separate system call to run scp for each file that needs to be copied. This is inefficient because of the overhead of starting up a new process and the overhead of initiating a new connection to the remote machine. If you are running this every minute, and if the script takes more than a minute to run, you will also have multiple instances of the script running simultaneously.
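    One way to guard against those overlapping runs (a sketch, not part of the original suggestion) is to take a non-blocking exclusive lock at the top of the script and exit if a previous cron invocation still holds it:

    use strict;
    use warnings;
    use Fcntl ':flock';

    # Hypothetical lock file; any writable path will do.
    my $lockfile = '/usr/scripts/backup.lock';
    open my $lock, '>', $lockfile or die "cannot open $lockfile: $!";
    unless (flock $lock, LOCK_EX | LOCK_NB) {
        # An earlier run is still copying files; bail out quietly.
        exit 0;
    }
    # ... the rest of the backup work runs here while the lock is held ...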

    Rsync, as mentioned above, would be a much better option. If you are primarily copying html files, you may want to also turn on compression (-z option).

    /usr/bin/rsync -az -e ssh source_directory/ \
        user@backupserver:destination_directory

    -e ssh tells rsync to use ssh for transport.
    -a is archive mode, which gives recursion and preserves permissions and such.
       -ar would be harmlessly redundant.
    -v will show filenames during the transfer.
    The trailing slash on the source directory is important. See man rsync for more info.
Re: scp cronjob
by kabel (Chaplain) on Sep 11, 2002 at 12:58 UTC
    I suggest putting some running information into a log file. That helped me a lot in the past ;)
    ###############################################################################
    sub do_log {
    ###############################################################################
        my $output_string = "";
        my $log_file      = "/some/where";

        if (not defined $_[0]) {
            $output_string = "\n";
        }
        else {
            $output_string = "$$: " . scalar (gmtime ()) . ": " . join ("", @_) . "\n";
        }

        print STDERR $output_string unless ($opts{n});

        unless ($opts{l}) {
            if (-e $log_file) {
                open (LOGFILE, ">> $log_file") or die "cannot write log [$log_file][$!]";
            }
            else {
                open (LOGFILE, "> $log_file") or die "cannot create log [$log_file][$!]";
            }
            print LOGFILE $output_string;
            close (LOGFILE);
        }
    }
    please let me know if the sub can be improved.
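    As a usage sketch (assuming %opts is filled in elsewhere, e.g. by Getopt::Std, as the sub above expects), the original script's scp call could be wrapped like this:

    our %opts;    # n: don't echo to STDERR, l: don't write the log file

    do_log("copying $File::Find::name");
    my $status = system("scp -C \"$File::Find::name\" user\@backupserver:/usr/local/apache/htdocs");
    do_log("scp failed with status $status") if $status != 0;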
Re: scp cronjob
by Util (Priest) on Sep 11, 2002 at 17:43 UTC

    My best guess is that the $list file is getting over-written by multiple copies of the Perl script, when your bandwidth limits cause one copy to hang long enough for another copy to start.
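    If the intermediate file is kept at all, giving each run its own temporary name would at least stop concurrent runs from clobbering one another. A sketch using File::Temp (the directory is just an assumption, matching the path in the question):

    use File::Temp qw(tempfile);

    # Each run gets a private listing file, removed automatically at exit.
    my ($fh, $list) = tempfile('blist-XXXXXX', DIR => '/usr/scripts', UNLINK => 1);
    system("ssh user\@backupserver ls /usr/local/apache/htdocs > $list");
    seek $fh, 0, 0;              # rewind and read what ssh wrote into the file
    my @backupFiles = <$fh>;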

    Here are some other thoughts:

  • You don't need the $list file. Use backticks instead:
    @backupFiles = `ssh user\@backupserver ls /usr/local/apache/htdocs`;
  • You are using File::Find and "globbing" (</path/*.htm>) together. This is redundant; use one or the other.
  • Do you really want to back up only the new files, but not any changed files? This confuses me.
  • ls -c is an odd command for this program; you are asking ls to sort by ctime, then throwing away the sort order by using a hash. Plain ls would be clearer.
  • Rather than use cron, you might just have the Perl script run as a daemon, never ending until you kill it. This would prevent the multi-copy problem, too.
    while (1) { do_something(); sleep 60; }
  • If bandwidth is a problem, keep a flagfile whose modtime is equal to the time of the last scp transfer. If no local file is newer than the flagfile, you can skip the ssh traffic.
  • You are calling scp once for each file; instead, you could build a list of files and call scp once (a combined sketch along these lines follows this list):
    my @files = map  { "'$_'" }
                grep { not $chompedList{ basename($_) } }
                </usr/myfiles/*.htm>;
    system "scp -C @files user\@backupserver:/usr/local/apache/htdocs";
  • You are using fileparse where basename would be clearer.
  • Finally, listen to ides. Rsync was designed for this kind of job. It has lots of options to fine-tune what gets synced and how; it can even limit its bandwidth use. Rsync should work great by itself as a cron job, but here is a (lightly tested) script to demonstrate my other points:
    #!/usr/bin/perl -W
    use warnings 'all';
    use strict;

    my $rmthost  = 'backupserver';
    my $rmtuser  = 'user';
    my $rmtpath  = '/usr/local/apache/htdocs';
    my $lclpath  = '/usr/myfiles';
    my $lclglob  = '*.htm';
    my $flagfile = '.last_backup';
    my $r_opts   = "-azq --blocking-io -e 'ssh -l $rmtuser'";
    my $lclfiles = "$lclpath/$lclglob";

    sub modtime {
        my $file  = shift;
        my @stats = stat $file or return 0;
        return $stats[9];
    }

    while (1) {
        my $timestamp = modtime("$lclpath/$flagfile");
        my $run = grep { modtime($_) > $timestamp } glob $lclfiles;
        if ($run) {
            system "rsync $r_opts $lclfiles $rmthost:$rmtpath";
            open  FLAG, ">$lclpath/$flagfile" or die;
            close FLAG                        or die;
        }
        sleep 60;
    }
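    Putting the non-rsync suggestions above together (backticks instead of the temporary file, a plain glob instead of File::Find, basename, and a single scp call for the whole batch), the core of the original script might shrink to something like this sketch:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Basename;

    # Filenames already present on the backup server.
    my %on_server = map { chomp; $_ => 1 }
                    `ssh user\@backupserver ls /usr/local/apache/htdocs`;

    # Every local .htm file that is not on the server yet.
    my @to_copy = grep { not $on_server{ basename($_) } } </usr/myfiles/*.htm>;

    if (@to_copy) {
        # One scp invocation for the whole batch.
        system('scp', '-C', @to_copy, 'user@backupserver:/usr/local/apache/htdocs') == 0
            or warn "scp failed: exit status " . ($? >> 8) . "\n";
    }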

Re: scp cronjob
by Anonymous Monk on Sep 14, 2002 at 22:44 UTC
    thanks guys, your comments have been much appreciated. I am going to rewrite the script (to help my Perl learning) and look into implementing rsync instead.
