PerlMonks  

Re^3: Automating Backup of a Google Map

by huck (Prior)
on Jun 18, 2017 at 23:29 UTC (#1193068)


in reply to Re^2: Automating Backup of a Google Map
in thread Automating Backup of a Google Map

Storing too many files in a single directory causes problems: the directory itself gets fragmented and performance starts to suffer. There are indications that this starts at lower counts, but I start to see it when a directory holds more than about 1000 files. If you store one file every hour, you reach that point after roughly 40 days (1000 / 24 ≈ 42).

A second problem with your method is that "duplicate" files are stored. There really is no need to save the newest file if it is the same as the last one.

To solve the first problem I create a folder tree to store the files in, such as /users/NAME/documents/FOLDER/y2017/m06 or /users/NAME/documents/FOLDER/y2017/m06/d18. (I have a program that may store a new file every 3 minutes, which can mean 480 files a day.)
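A minimal sketch of how such a dated path could be built. The helper names `history_path` and `ensure_parent_dir`, and the `$per_day` flag, are made up here for illustration. The sketch uses gmtime, so names sort chronologically and are not affected by daylight saving changes.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Basename qw(dirname);

# Hypothetical helper: build a dated file name under $top for epoch time $when.
# With $per_day true the tree gets one more level (.../y2017/m06/d18/...),
# which keeps directories small when files arrive every few minutes.
sub history_path {
    my ($top, $when, $ext, $per_day) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($when);
    my $fmt = $per_day
        ? '%s/y%04u/m%02u/d%02u/%02u-%02u-%02u-Z.%s'
        : '%s/y%04u/m%02u/d%02u-%02u-%02u-%02u-Z.%s';
    return sprintf($fmt, $top, $year + 1900, $mon + 1, $mday,
                   $hour, $min, $sec, $ext);
}

# Hypothetical helper: create the folder that will hold $file,
# including any missing parent folders.
sub ensure_parent_dir {
    my ($file) = @_;
    my $dir = dirname($file);
    make_path($dir) unless -d $dir;
    return $dir;
}
```

A call such as `history_path('/users/NAME/documents/FOLDER', time, 'zip', 1)` gives the per-day layout; passing 0 as the last argument gives the per-month layout.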

To solve the second problem I compare the last and newest files and do not put the new file into the history tree if they are the same. In your case the files inside the zip get a new datetime on every pull, so you have to extract the relevant 'doc.kml' files and compare those.
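One way to compare just that member, sketched under assumptions: instead of a line-by-line comparison, this hashes the uncompressed bytes of the named member in each archive and compares the digests. The helper names `member_digest` and `same_member` are hypothetical; the modules are core Perl.

```perl
use strict;
use warnings;
use Digest::SHA;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: SHA-256 of one member's uncompressed bytes.
sub member_digest {
    my ($zipfile, $member) = @_;
    my $z = IO::Uncompress::Unzip->new($zipfile, Name => $member)
        or die "cannot read $member from $zipfile: $UnzipError\n";
    my $sha = Digest::SHA->new(256);
    my ($buf, $status);
    # read() returns the number of bytes read, 0 at end of member,
    # and a negative value on error.
    while (($status = $z->read($buf)) > 0) {
        $sha->add($buf);
    }
    die "error reading $zipfile: $UnzipError\n" if $status < 0;
    $z->close;
    return $sha->hexdigest;
}

# True when the named member has identical content in both archives,
# whatever timestamps the zip headers carry.
sub same_member {
    my ($zip_a, $zip_b, $member) = @_;
    return member_digest($zip_a, $member) eq member_digest($zip_b, $member);
}
```

With this, `same_member($lastfn, $nextfn, 'doc.kml')` would answer the "did anything change" question without looking at the zip timestamps.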

The following program should do both of these tasks:

use strict;
use warnings;
use Getopt::Long qw/GetOptions/;
use LWP;
use File::Basename qw/dirname basename/;
use File::Copy qw/copy/;
use IO::Uncompress::Unzip qw(unzip $UnzipError);
use IO::File;

my $url   = 'http://www.google.com/maps/d/u/1/kml?mid=1Oa_egVdStSJBF5C7mpS6MXrkces';
#my $topdir = '/users/NAME/documents/FOLDER';
my $dir   = 'D:/goodies/pdhuck/down1/perl/monks/kmlbackup';
my $ftype = 'zip';
my $debug = 0;
my %optdef = (
    "debug=i" => \$debug,
    "url=s"   => \$url,
    "dir=s"   => \$dir,    # a path is a string, so =s (was =i)
);
GetOptions(%optdef) or die("Error in command line arguments\n");
die $dir.' must exist' unless (-d $dir);

my $lastdir = $dir.'/last';
# unless (-d $lastdir && -w $lastdir)
unless (-d $lastdir) { mustdir($lastdir); }

my $now = time;
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime($now);
my $nicefmt = '/y%04u/m%02u/d%02u-%02u-%02u-%02u-Z';    # last dir is month
# my $nicefmt = '/y%04u/m%02u/d%02u/%02u-%02u-%02u-Z';  # last dir is day
my $nice = sprintf($nicefmt,$year+1900,$mon+1,$mday,$hour,$min,$sec);

my $lastfn = $lastdir.'/lastbackup.'.$ftype;
my $nextfn = $lastdir.'/nextbackup.'.$ftype;

# Fetch the current map into last/nextbackup.zip
my $ua  = LWP::UserAgent->new(agent => "libwww-perl-kmzbackup");
my $req = HTTP::Request->new(GET => $url);
my $request = $ua->request($req);
unless ($request->is_success) {
    die 'get failed for '.$url.' '.$request->status_line;
}

open (my $nextout,'>',$nextfn) or die 'cant open '.$nextfn;
binmode $nextout;
print $nextout $request->decoded_content;
close $nextout;

# Compare doc.kml inside the new download with the one in the last backup
my $aresame  = 1;
my $compfile = 'doc.kml';
if (-f $lastfn) {
    my $nextmember = IO::Uncompress::Unzip->new($nextfn, Name => $compfile)
        or die "IO::Uncompress::unzip failed: $UnzipError\n";
    my $lastmember = IO::Uncompress::Unzip->new($lastfn, Name => $compfile)
        or die "IO::Uncompress::unzip failed: $UnzipError\n";
    # Read both members in step; they match only if every line is equal
    # and both reach end of file together.
    while (1) {
        my $nextline = <$nextmember>;
        my $lastline = <$lastmember>;
        last if !defined $nextline && !defined $lastline;
        unless (defined $nextline && defined $lastline && $nextline eq $lastline) {
            $aresame = 0;
            last;
        }
    }
    close($nextmember);
    close($lastmember);
}
else { $aresame = 0; }

if ($aresame) {
    print "No change to file $compfile \n";
    unlink $nextfn;
    exit;
}

# The content changed: file the download in the history tree
# and remember it as the new "last" backup
my $endfn  = $dir.$nice.'.'.$ftype;
my $endfn0 = basename($endfn);
my $enddir = dirname($endfn);
unless (-d $enddir) { mustdir($enddir); }
copy($nextfn,$endfn) or die "Copy failed: $!";
print 'new backup:'.$endfn."\n";
copy($nextfn,$lastfn) or die "Copy failed: $!";
unlink $nextfn;
exit;

# Create $dir, creating missing parent directories first
sub mustdir {
    my $dir = shift;
    return if (-d $dir);
    my $updir = dirname($dir);
    mustdir($updir);
    mkdir $dir;
    unless (-d $dir) { die 'cant make dir:'.$dir; }
} # mustdir

Replies are listed 'Best First'.
Re^4: Automating Backup of a Google Map
by afoken (Canon) on Jun 19, 2017 at 19:30 UTC
    Storing too many files in a single directory causes problems: the directory itself gets fragmented and performance starts to suffer. There are indications that this starts at lower counts, but I start to see it when a directory holds more than about 1000 files. If you store one file every hour, you reach that point after roughly 40 days (1000 / 24 ≈ 42).

    Of course, the performance of directories full of files depends heavily on the filesystem used. Caching (i.e. free RAM) and tuning of the filesystem and operating system can influence performance drastically, as can disk performance (compare an ancient hard disk with a high-end SSD).
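    Since the numbers vary this much between systems, measuring on the machine in question may be more useful than any rule of thumb. A rough sketch, with assumptions: `time_directory_scan` is a made-up name, the files are empty, and because they were just created the result reflects a warm cache rather than a cold disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

# Hypothetical benchmark: create $count empty files in a scratch directory,
# then time one full listing plus a stat of every entry.
sub time_directory_scan {
    my ($count) = @_;
    my $dir = tempdir(CLEANUP => 1);
    for my $i (1 .. $count) {
        my $file = sprintf('%s/f%06d.dat', $dir, $i);
        open(my $fh, '>', $file) or die "cannot create $file: $!";
        close $fh;
    }
    my $start = time;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @names = grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    for my $name (@names) {
        my @info = stat("$dir/$name");    # one stat call per entry
    }
    return (scalar @names, time - $start);
}

for my $count (100, 1000, 5000) {
    my ($seen, $elapsed) = time_directory_scan($count);
    printf "%6d files: %.4f s\n", $seen, $elapsed;
}
```

    If the time per file stays roughly constant as the count grows, the directory size is probably not the bottleneck on that filesystem.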

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^4: Automating Backup of a Google Map
by jmlynesjr (Deacon) on Jun 19, 2017 at 16:34 UTC

    Huck

    Thank you for sharing this concern and Utility. I will pass it on to Doug.

    James

    There's never enough time to do it right, but always enough time to do it over...
