PerlMonks  

Re: Geo::ShapeFile memory problem

by swl (Parson)
on Apr 17, 2017 at 05:00 UTC ( [id://1188081]=note )


in reply to Geo::ShapeFile memory problem

I've just uploaded a new dev version of Geo::ShapeFile to CPAN which allows users to turn caching off.

https://metacpan.org/release/SLAFFAN/Geo-ShapeFile-2.63_001

Once that's installed, change the object creation line to be:

$shapefile = Geo::ShapeFile->new ("tabblock2010_42_pophu", {no_cache => 1});

It's an all-or-nothing approach, but should be sufficient for this use case. I might in future try to add a maximum cache size option.

Other ideas and pull requests welcome.

Shawn

Replies are listed 'Best First'.
Re^2: Geo::ShapeFile memory problem
by swl (Parson) on Apr 23, 2017 at 00:36 UTC
Re^2: Geo::ShapeFile memory problem
by huck (Prior) on Apr 18, 2017 at 02:23 UTC

    Being able to read right from one of these zip files without needing to unzip it to disk first would be a treat, but I realize it may be a little too much to ask for.

    In that kind of mode, requiring me to read it sequentially would be no problem. I'd expect that kind of processing to be fairly common, e.g.:

    use Geo::ShapeFile;
    use GD;

    # NOTE: $base, $yellow and $black are used below but are set up in code not shown in this post.

    my $dir0='h:/active/tiger_data';
    my $sfips='42'; # pa

    # make $base->{$blockid10}{color} via
    # zipbyline reads a zip member sequentially
    # as in http://search.cpan.org/~phred/Archive-Zip-1.59/lib/Archive/Zip.pm#Low-level_member_data_reading
    # my $sn=$fips2state->{$sfips}.'2010.sf1';
    # my $zf=$dir0.'/sf1/'.$sn.'.zip';
    # my $mf=$fips2state->{$fips}.'geo2010.sf1'; ;
    # my $member=zipbyline_start($zf,$mf);
    # while (my $line=zipbyline_read($member)){
    #   ... pull out datums AREALAND AREAWATR POP100, create density
    # } # line
    # zipbyline_close($member);
    # sort by density, total POP100,
    # break into deciles,
    # assign a decile color to each $base->{$blockid10}{color}

    my $dir=$dir0.'/shapes';
    my $state='tabblock2010_'.$sfips.'_pophu';
    my $shapefn=$dir.'/'.$state.'/'.$state;
    my $imgfn=$dir0.'/gifs/'.$state.'.gif';
    # this points to the unzipped dir now,
    # be nice to just point to $dir.'/'.$state.'.zip' instead
    my $sf = Geo::ShapeFile->new ($shapefn);
    $sf->caching(shp => 0);
    $sf->caching(dbf => 0);
    $sf->caching(shx => 0);
    $sf->caching(shapes_in_area => 0);

    my $x_min=$sf->x_min();
    my $x_max=$sf->x_max();
    my $y_max=90-$sf->y_min(); # need to invert 90 is top 0 is bot
    my $y_min=90-$sf->y_max();
    my $totalblocks = $sf->shapes();
    # $totalblocks=5000;
    my $xsize=$x_max-$x_min;
    my $ysize=$y_max-$y_min;
    my $imgy=5000;
    my $yscale=$imgy/$ysize;
    my $pfx=-0.00923452628555483*($sf->y_min)+ 1.15467278754118; # projection factor
    my $imgx=$yscale*$xsize*$pfx;
    my $xscale=$imgx/($xsize);

    sub xproj { return (($_[0])-($x_min))*$xscale; }
    sub yproj { return ((90-$_[0])-$y_min)*$yscale; }

    # create a new image
    my $im = GD::Image->new($imgx+1,$imgy+1);

    for my $si (1 .. $totalblocks) {
      my %attr = $sf->get_dbf_record($si);
      my $blockid10 = $attr{BLOCKID10};
      my $color=$base->{$blockid10}{color};
      unless ($color) {$color=$yellow;}
      my $polygon = $sf->get_shp_record($si);
      for my $pi (1 .. $polygon->num_parts) {
        my $part = $polygon->get_part($pi);
        my $poly = GD::Polygon->new;
        for my $hash (@$part){
          $poly->addPt(xproj($hash->{X}),yproj($hash->{Y}));
        }
        my $first=$part->[0];
        my $last =$part->[-1];
        # close the ring if the last point does not repeat the first
        if ($first->{X} != $last->{X} || $first->{Y} != $last->{Y} ) {
          $poly->addPt(xproj($first->{X}),yproj($first->{Y}));
        }
        $im->filledPolygon($poly,$color);
      } # pi
    } # si

    outlines ();

    open (my $img,'>',$imgfn) or die "cannot write $imgfn: $!";
    binmode $img;
    print $img $im->gif;
    close $img;
    exit;

    sub outlines {
      my $state='tl_2010_'.$sfips.'_county10';
      my $shapefn=$dir.'/'.$state.'/'.$state;
      # this too points at an unzipped dir,
      # be nice to point at $dir.'/'.$state.'.zip' instead
      my $sf = Geo::ShapeFile->new ($shapefn);
      $sf->caching(shp => 0);
      $sf->caching(dbf => 0);
      $sf->caching(shx => 0);
      $sf->caching(shapes_in_area => 0);
      my $totalblocks = $sf->shapes();
      for my $si (1 .. $totalblocks) {
        my $polygon = $sf->get_shp_record($si);
        for my $pi (1 .. $polygon->num_parts) {
          my $part = $polygon->get_part($pi);
          my $poly = GD::Polygon->new;
          for my $hash (@$part){
            $poly->addPt(xproj($hash->{X}),yproj($hash->{Y}));
          }
          my $first=$part->[0];
          my $last =$part->[-1];
          # close the ring if the last point does not repeat the first
          if ($first->{X} != $last->{X} || $first->{Y} != $last->{Y} ) {
            $poly->addPt(xproj($first->{X}),yproj($first->{Y}));
          }
          $im->openPolygon($poly,$black);
        } # pi
      } # si
    }

    PA output at this sendspace link while it lasts
    MO output at this sendspace link while it lasts

      Thanks Huck,

      Direct extraction from a zip file is potentially useful, but in my experience not a common use case. (Although perhaps that's because there are no tools to do so...).

      Maybe there is a module that supports reading from archives as a file handle? Archive::Zip has readFromFileHandle(), but that would need special handling in Geo::ShapeFile each time data are accessed from file.

      Also, one thing to watch for in any plotting code is holes in the polygons. I don't think the Tiger data have holes, but in the shapefile spec they are implied by vertex order instead of being explicitly flagged. https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
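
A minimal sketch of how ring orientation could be checked, assuming each part is an array ref of points with X and Y keys (as in the plotting code earlier in the thread). The shapefile spec has outer rings running clockwise and holes counter-clockwise, so the sign of the shoelace sum separates the two. The sub names `signed_area2` and `ring_is_hole` are hypothetical and not part of Geo::ShapeFile.

```perl
use strict;
use warnings;

# Twice the signed area of a ring, computed with the shoelace formula.
# With Y increasing upwards (as for lon/lat data), a positive value means
# the vertices run counter-clockwise and a negative value means clockwise.
sub signed_area2 {
    my ($ring) = @_;    # array ref of points, each with X and Y keys
    my $sum = 0;
    for my $i (0 .. $#$ring) {
        my $p = $ring->[$i];
        my $q = $ring->[ ($i + 1) % @$ring ];    # wrap around to close the ring
        $sum += $p->{X} * $q->{Y} - $q->{X} * $p->{Y};
    }
    return $sum;
}

# Per the shapefile spec: clockwise ring = outer boundary,
# counter-clockwise ring = hole.
sub ring_is_hole {
    my ($ring) = @_;
    return signed_area2($ring) > 0 ? 1 : 0;
}
```

A ring that repeats its first point at the end gives the same result, since the wrap-around term is then zero. Degenerate rings with zero area are reported as "not a hole" by this sketch.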

      Shawn.

        Also, one thing to watch for in any plotting code is holes in the polygons. I don't think the Tiger data have holes

        Ooooo they do. I found that PDF about 9 hours before you mentioned it. I'm dealing with them now, at a huge cost; maybe more later.

        I'll live with unzipped dirs for now, and look closer into "seek"ing on a zip file and into the format of a raw shp file, to see if my sequential reader would work for sequential access.
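
For reference, the raw .shp layout is simple enough to walk front to back. The sketch below follows the ESRI shapefile description: a 100-byte main header (big-endian file code 9994 and file length in 16-bit words, little-endian version, shape type and bounding box), then records that each start with a big-endian record number and content length in 16-bit words. The sub names `parse_shp_header` and `read_shp_records` are made up for illustration; this is not how Geo::ShapeFile reads files internally, and the record contents are passed on undecoded.

```perl
use strict;
use warnings;

# Decode the fixed 100-byte main header of a .shp file.
sub parse_shp_header {
    my ($buf) = @_;
    die "header too short\n" if length($buf) < 100;
    my ($file_code, $len_words) = unpack 'N x20 N', $buf;       # big-endian
    die "not a shapefile (file code $file_code)\n" if $file_code != 9994;
    my ($version, $shape_type) = unpack 'x28 V V', $buf;         # little-endian
    my ($x_min, $y_min, $x_max, $y_max) = unpack 'x36 d<4', $buf;
    return {
        file_bytes => $len_words * 2,    # spec stores length in 16-bit words
        version    => $version,
        shape_type => $shape_type,
        x_min => $x_min, y_min => $y_min,
        x_max => $x_max, y_max => $y_max,
    };
}

# Read every record strictly in file order; no seeking is needed.
# $callback receives the record number and the raw record content.
sub read_shp_records {
    my ($fh, $callback) = @_;
    my $got = read($fh, my $head, 100);
    die "cannot read main header\n" unless defined $got && $got == 100;
    my $header = parse_shp_header($head);
    while ((read($fh, my $rec_head, 8) // 0) == 8) {
        my ($rec_num, $content_words) = unpack 'N N', $rec_head;
        my $want = $content_words * 2;
        my $n = read($fh, my $content, $want);
        die "truncated record $rec_num\n" unless defined $n && $n == $want;
        $callback->($rec_num, $content);
    }
    return $header;
}
```

Because only `read` is used, any handle that can be read forwards should do, which is the property a sequential zip reader would need. The attributes live in the separate .dbf file, so a plot that colours by attribute would have to walk both members in step.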

        My cheap zip sequencer

        package cheap::zipbyline;
        use strict;
        use warnings;
        use Exporter;
        use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
        our @ISA= qw( Exporter );
        # these CAN be exported.
        our @EXPORT_OK = qw( zipbyline_start zipbyline_read zipbyline_close );
        # these are exported by default.
        our @EXPORT = qw( );

        my %zbl;   # per-member buffer of data read but not yet returned

        sub zipbyline_start {
          my $zf=shift;
          my $mf=shift;
          my $zip = Archive::Zip->new();
          unless ( $zip->read( $zf ) == AZ_OK ) { die 'read error';}
          my ( $member, $status );
          $member = $zip->memberNamed( $mf );
          die "no member $mf in $zf" unless defined $member;
          $member->desiredCompressionMethod( COMPRESSION_STORED );
          $status = $member->rewindData();
          die "error $status" unless $status == AZ_OK;
          $zbl{$member}='';
          return $member;
        } # zbl start

        sub zipbyline_read {
          my $member=shift;
          my ( $status, $bufferRef );
          my $nl=index($zbl{$member},"\n");
          while ( ( $nl == -1) && ! $member->readIsDone() ) {
            ( $bufferRef, $status ) = $member->readChunk(1000);
            die "error $status" if $status != AZ_OK && $status != AZ_STREAM_END;
            # do something with $bufferRef:
            $zbl{$member}.=$$bufferRef;
            $nl=index($zbl{$member},"\n");
          } # while
          if ($nl == -1 ) {
            # end of data: return a final line that lacks a newline, then undef
            my $rest=$zbl{$member};
            $zbl{$member}='';
            return length($rest) ? $rest : undef;
          }
          my $line=substr($zbl{$member},0,$nl+1);
          $zbl{$member}=substr($zbl{$member},$nl+1);
          return $line;
        } # zbl

        sub zipbyline_close {
          my $member=shift;
          delete $zbl{$member};
        } # zbl close

        1;

        This will probably get improvements so I can read 2 files out of the same zip at the same time without doing two my $zip=new ... $zip->read($zf) sets; I haven't needed it yet.

        Maybe there is a module that supports reading from archives as a file handle?

        I believe the core module IO::Uncompress::Unzip does this, and its objects can be used like filehandles, so I think it'd be fairly transparent.
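
A minimal sketch of that idea, assuming the `Name` option of IO::Uncompress::Unzip is used to pick the member. The helper `each_zip_line` is a hypothetical name. Note that these objects only read forwards, so whether Geo::ShapeFile could work with one in place of its regular file handles is an open question this sketch does not answer.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Call $callback once per line of the named zip member and return the
# number of lines seen. $zip_source may be a file name or a reference
# to a scalar holding the zip data.
sub each_zip_line {
    my ($zip_source, $member_name, $callback) = @_;
    my $z = IO::Uncompress::Unzip->new($zip_source, Name => $member_name)
        or die "cannot open $member_name: $UnzipError\n";
    my $count = 0;
    while (defined(my $line = $z->getline())) {
        $callback->($line);
        $count++;
    }
    $z->close();
    return $count;
}
```

This covers the same ground as the zipbyline_start / zipbyline_read / zipbyline_close trio for line-oriented members such as the sf1 geo files, without the manual chunk buffering.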

Re^2: Geo::ShapeFile memory problem
by rhzhang (Initiate) on Apr 18, 2017 at 01:03 UTC
    Great. Thank you.
