Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Re^9: Geo Package files

by swl (Vicar)
on Mar 07, 2022 at 02:39 UTC ( #11141880=note: print w/replies, xml ) Need Help??

in reply to Re^8: Geo Package files
in thread Geo Package files

Can you clarify what looks wrong?

It appears you have decoded the header information correctly, or at least it makes sense. What remains is to decode the component linestrings, of which there are 34 in this multipart geometry.

A useful debugging process would be to cross-check using a GIS package like QGIS. This will allow you to select and plot each feature and see if your numbers are what are expected.

Replies are listed 'Best First'.
Solved! (was: Re^10: Geo Package files)
by Bod (Priest) on Mar 11, 2022 at 22:39 UTC
    This will allow you to select and plot each feature and see if your numbers are what are expected

    The problem was not with unpacking the data - it was with my understanding!!!

    I had erroneously thought that the envelope would be the bounds of the UK. Whereas, it is only the bounds for a single route! The decimal places in the coordinates still confuse me but rounding them gives sufficiently high accuracy.

    My test code looks like this:

    use DBD::SQLite; use Data::Dumper; use strict; use warnings; my $dbh = DBI->connect("dbi:SQLite:uri=file:osopenusrn_202203.gpkg?mod +e=rwc"); my $tab = $dbh->prepare("SELECT * FROM openUSRN LIMIT 1"); $tab->execute; my $data = $tab->fetchrow_hashref; my $geo = $data->{'geometry'}; my @test = unpack "(H2)2 CCVd6CVVCVV(d<)6CVV(d<)9", $data->{'geometry' +}; print "\n\nUSRN\t\t" . $data->{'usrn'} . "\n"; print "\nMagic bytes\t$test[0] $test[1]\n"; print "version\t\t$test[2]\n"; print "flags\t\t"; printf("%08b\n", $test[3]); print "SRS ID\t\t$test[4]\n\n"; print "minx\t\t$test[5]\n"; print "maxx\t\t$test[6]\n"; print "miny\t\t$test[7]\n"; print "maxy\t\t$test[8]\n"; print "minz\t\t$test[9]\n"; print "maxz\t\t$test[10]\n"; print "-----\n"; print "NDR\t\t$test[11]\n"; print "type\t\t$test[12]\n"; print "number\t\t$test[13]\n\n"; print "byteOrder\t$test[14]\n"; print "type\t\t$test[15]\n"; print "number\t\t$test[16]\n\n"; #print "byteOrder\t$test[17]\n"; #print "type\t\t$test[18]\n"; print "Point1 x\t$test[17]\n"; print "Point1 y\t$test[18]\n"; print "Point1 z\t$test[19]\n"; print "Point2 x\t$test[20]\n"; print "Point2 y\t$test[21]\n"; print "Point2 z\t$test[22]\n\n"; print "byteOrder\t$test[23]\n"; print "type\t\t$test[24]\n"; print "number\t\t$test[25]\n\n"; print "Point1 x\t$test[26]\n"; print "Point1 y\t$test[27]\n"; print "Point1 z\t$test[28]\n"; print "Point2 x\t$test[29]\n"; print "Point2 y\t$test[30]\n"; print "Point2 z\t$test[31]\n"; print "Point3 x\t$test[32]\n"; print "Point3 y\t$test[33]\n"; print "Point3 z\t$test[34]\n";
    This shows clearly what is happening and has allowed me to pick the data structure apart a bit at a time. The output I get is:
    USRN 4601764 Magic bytes 47 50 version 0 flags 00000101 SRS ID 27700 minx 637590.385 maxx 642426.601 miny 309577.58 maxy 310361.391000001 minz 0 maxz 0 ----- NDR 1 type 1005 number 34 byteOrder 1 type 1002 number 2 Point1 x 640921.438 Point1 y 310210.692 Point1 z 0 Point2 x 641634.2632 Point2 y 309910.6438 Point2 z 0 byteOrder 1 type 1002 number 3 Point1 x 637969.3747 Point1 y 309910.6438 Point1 z 0 Point2 x 638226.29 Point2 y 309978.609999999 Point2 z 0 Point3 x 638404.534 Point3 y 310054.294 Point3 z 0

    I've sanity checked the data against online tools. First I looked up the USRN of the first entry in the database. I then plotted the 5 points I got from the first 2 of the 34 LineStrings. All 5 points are on the route shown by the USRN so I take that a success :)

    Of course, I still need to generalise the reading of the different geometry data types but at least their structure makes some sense.

    The first step is to get the midpoint of each USRN. Just taking the centre of the envelope should provide this and won't require too much delving into the geometries - although that will have to follow in time.

    Lots of thanks for your excellent pointers.

      Good to hear it works now. It would be useful to have a generalised WKB reader (and writer) on CPAN.

      WRT the midpoints - the centroid will work as a start. However, if your mid-point must correspond with a street address on the road then it is not too hard to find the mid-point along a series of vertices.

      One approach is to convert the array of vertices to cumulative distances and then use a binary search to find the distance that is half the maximum (which is the last cumulative distance array element). It won't make much difference with small numbers of geometries but if you have many geometries with many vertices then it adds up as the average case will be of the order of (n+log(n)) calculations compared with 1.5n for a two pass search that exits when the half way point is found. (For 100,000 vertices this is ~100,011 compared with 150,000). There are binary search implementations on CPAN that will avoid the problems of implementing your own, two I have used are in List::MoreUtils and List::BinarySearch.

      Edit - updated the n+log(n) value to use the natural log, and a typo.

        It would be useful to have a generalised WKB reader (and writer) on CPAN.

        As I write this code, I shall bear in mind that a WKB decoder would be a useful addition to CPAN. However, WKB is much more extensive than the limited scope I am looking at so whatever I produce will be a limited version that will be useful to some people. Either to use as-is or to build upon.

        The first requirement I have from this USRN data is to find the nearest USRN to a given point. For that, the centre of the envelope is sufficient. In future, I will be wanting to plot the LineStrings so proper decoding will be needed but that's for another day.

Re^10: Geo Package files
by Bod (Priest) on Mar 11, 2022 at 20:27 UTC
    A useful debugging process would be to cross-check using a GIS package...

    Yes - thanks, that is probably the next step. I was planning to check the data against a USRN finder but your suggestion is perhaps better.

    I'm not sure this will help with the envelope so I might need to press on and decode some of the geometry data before I can property sanity check it.

      I've added a few suggestions in 11142012.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11141880]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (8)
As of 2023-02-02 14:21 GMT
Find Nodes?
    Voting Booth?
    I prefer not to run the latest version of Perl because:

    Results (19 votes). Check out past polls.