Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Solved! (was: Re^10: Geo Package files)

by Bod (Parson)
on Mar 11, 2022 at 22:39 UTC ( [id://11142014]=note: print w/replies, xml ) Need Help??


in reply to Re^9: Geo Package files
in thread Geo Package files

This will allow you to select and plot each feature and see if your numbers are what are expected

The problem was not with unpacking the data - it was with my understanding!!!

I had erroneously thought that the envelope would be the bounds of the UK. Whereas, it is only the bounds for a single route! The decimal places in the coordinates still confuse me but rounding them gives sufficiently high accuracy.

My test code looks like this:

use DBD::SQLite; use Data::Dumper; use strict; use warnings; my $dbh = DBI->connect("dbi:SQLite:uri=file:osopenusrn_202203.gpkg?mod +e=rwc"); my $tab = $dbh->prepare("SELECT * FROM openUSRN LIMIT 1"); $tab->execute; my $data = $tab->fetchrow_hashref; my $geo = $data->{'geometry'}; my @test = unpack "(H2)2 CCVd6CVVCVV(d<)6CVV(d<)9", $data->{'geometry' +}; print "\n\nUSRN\t\t" . $data->{'usrn'} . "\n"; print "\nMagic bytes\t$test[0] $test[1]\n"; print "version\t\t$test[2]\n"; print "flags\t\t"; printf("%08b\n", $test[3]); print "SRS ID\t\t$test[4]\n\n"; print "minx\t\t$test[5]\n"; print "maxx\t\t$test[6]\n"; print "miny\t\t$test[7]\n"; print "maxy\t\t$test[8]\n"; print "minz\t\t$test[9]\n"; print "maxz\t\t$test[10]\n"; print "-----\n"; print "NDR\t\t$test[11]\n"; print "type\t\t$test[12]\n"; print "number\t\t$test[13]\n\n"; print "byteOrder\t$test[14]\n"; print "type\t\t$test[15]\n"; print "number\t\t$test[16]\n\n"; #print "byteOrder\t$test[17]\n"; #print "type\t\t$test[18]\n"; print "Point1 x\t$test[17]\n"; print "Point1 y\t$test[18]\n"; print "Point1 z\t$test[19]\n"; print "Point2 x\t$test[20]\n"; print "Point2 y\t$test[21]\n"; print "Point2 z\t$test[22]\n\n"; print "byteOrder\t$test[23]\n"; print "type\t\t$test[24]\n"; print "number\t\t$test[25]\n\n"; print "Point1 x\t$test[26]\n"; print "Point1 y\t$test[27]\n"; print "Point1 z\t$test[28]\n"; print "Point2 x\t$test[29]\n"; print "Point2 y\t$test[30]\n"; print "Point2 z\t$test[31]\n"; print "Point3 x\t$test[32]\n"; print "Point3 y\t$test[33]\n"; print "Point3 z\t$test[34]\n";
This shows clearly what is happening and has allowed me to pick the data structure apart a bit at a time. The output I get is:
USRN 4601764 Magic bytes 47 50 version 0 flags 00000101 SRS ID 27700 minx 637590.385 maxx 642426.601 miny 309577.58 maxy 310361.391000001 minz 0 maxz 0 ----- NDR 1 type 1005 number 34 byteOrder 1 type 1002 number 2 Point1 x 640921.438 Point1 y 310210.692 Point1 z 0 Point2 x 641634.2632 Point2 y 309910.6438 Point2 z 0 byteOrder 1 type 1002 number 3 Point1 x 637969.3747 Point1 y 309910.6438 Point1 z 0 Point2 x 638226.29 Point2 y 309978.609999999 Point2 z 0 Point3 x 638404.534 Point3 y 310054.294 Point3 z 0

I've sanity checked the data against online tools. First I looked up the USRN of the first entry in the database. I then plotted the 5 points I got from the first 2 of the 34 LineStrings. All 5 points are on the route shown by the USRN so I take that a success :)

Of course, I still need to generalise the reading of the different geometry data types but at least their structure makes some sense.

The first step is to get the midpoint of each USRN. Just taking the centre of the envelope should provide this and won't require too much delving into the geometries - although that will have to follow in time.

Lots of thanks for your excellent pointers.

Replies are listed 'Best First'.
Re: Solved! (was: Re^10: Geo Package files)
by swl (Parson) on Mar 11, 2022 at 23:21 UTC

    Good to hear it works now. It would be useful to have a generalised WKB reader (and writer) on CPAN.

    WRT the midpoints - the centroid will work as a start. However, if your mid-point must correspond with a street address on the road then it is not too hard to find the mid-point along a series of vertices.

    One approach is to convert the array of vertices to cumulative distances and then use a binary search to find the distance that is half the maximum (which is the last cumulative distance array element). It won't make much difference with small numbers of geometries but if you have many geometries with many vertices then it adds up as the average case will be of the order of (n+log(n)) calculations compared with 1.5n for a two pass search that exits when the half way point is found. (For 100,000 vertices this is ~100,011 compared with 150,000). There are binary search implementations on CPAN that will avoid the problems of implementing your own, two I have used are in List::MoreUtils and List::BinarySearch.

    Edit - updated the n+log(n) value to use the natural log, and a typo.

      It would be useful to have a generalised WKB reader (and writer) on CPAN.

      As I write this code, I shall bear in mind that a WKB decoder would be a useful addition to CPAN. However, WKB is much more extensive than the limited scope I am looking at so whatever I produce will be a limited version that will be useful to some people. Either to use as-is or to build upon.

      The first requirement I have from this USRN data is to find the nearest USRN to a given point. For that, the centre of the envelope is sufficient. In future, I will be wanting to plot the LineStrings so proper decoding will be needed but that's for another day.

        Even a limited version would be a very useful thing. The seven types supported by GEOS would cover more than 90% of use cases (points, polylines and polygons, their multi-part variants and geometry collections).

      It would be useful to have a generalised WKB reader (and writer) on CPAN

      This has been swirling around in my mind for a little while now...certainly happy to have a stab at creating a WKB reader - not sure about the writer as that seems to be a whole extra layer or three of complexity.

      What I cannot envision is what sort of output a WKB reader would provide. In other words, what sort of Perl data structure would it read a WKB-encoded file into?

      Would an array of GEO::Point objects be the output one might expect?

        I have not checked closely but Geo::Point seems to be for geographic coordinates, so latitude and longitude and not projected coordinates such as UTM. (Edit: actually it does but it uses proj4 which is long deprecated). It also seems not to handle Z and M coordinates, and neither does the parent Geo::Shape class.

        For a single part polygon or polyline one just needs an array of coordinates (vertices), so an array of arrays would do. Maybe each vertex could be blessed into an array based object if that makes other things easier.

        Multipart features are collections of polygons, polylines and points (one type per feature) so would be a different object class.

        Then there are geometry collections which can be a mix of feature types. e.g. http://postgis.net/workshops/postgis-intro/geometries.html

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11142014]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2024-04-16 10:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found