Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Re^2: searching polygons not merged

by LanX (Cardinal)
on Oct 27, 2018 at 21:23 UTC ( #1224778=note: print w/replies, xml ) Need Help??

in reply to Re: searching polygons not merged
in thread searching polygons not merged

How do you represent polygons in a database, such that searching for overlaps becomes "fast"?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Replies are listed 'Best First'.
Re^3: searching polygons not merged
by erix (Parson) on Oct 27, 2018 at 22:33 UTC

    To illustrate polygon (a standard postgresql data type), polygon-comparison (here: overlap), and polygon-indexing (with which my practical experience is pretty much zero - caveat emptor!), I lifted some sql from the standard regression tests in the postgres source tree ( src/test/regress/sql/polygon.sql ), and messed about with it a bit, and added comments:

    Note: the postgres polygon overlap operator is &&

    #!/bin/bash echo " drop table if exists quad_poly_tbl ; create table quad_poly_tbl (id int, p polygon); insert into quad_poly_tbl select (x - 1) * 100 + y, polygon(circle(point(x * 10, y * 10), 1 + +(x + y) % 10)) from generate_series(1, 100) x, generate_series(1, 100) y ; insert into quad_poly_tbl select i, polygon '((200, 300),(210, 310),(230, 290))' from generate_series(10001, 11000) AS i ; analyze quad_poly_tbl; -- search for overlap with this polygon: select * from quad_poly_tbl where p && '((22,640),(23.0717967697245,64 +4),(26,646.928203230275),(30,648),(34,646.928203230275),(36.928203230 +2755,644),(38,640))'::polygon ; --> Time: 1.382 ms -- Seq Scan on quad_poly_tbl (3 MB) -- now add index: create index quad_poly_tbl_idx ON quad_poly_tbl USING spgist(p); -- search again for overlap with this polygon but now WITH the polygon +-index present: select * from quad_poly_tbl where p && '((22,640),(23.0717967697245,64 +4),(26,646.928203230275),(30,648),(34,646.928203230275),(36.928203230 +2755,644),(38,640))'::polygon ; --> Time: 0.271 ms -- Bitmap Index Scan on quad_poly_tbl_idx (1 MB) " | psql -qa

    So the difference between seqscan and polygon-index in this test (searching for 6 matching rows in a table of 11000 rows) is:

    --> Time: 1.382 ms -- Seq Scan on quad_poly_tbl (3 MB) --> Time: 0.271 ms -- Bitmap Index Scan on quad_poly_tbl_idx (1 MB)

    For the OP's question, of course, loading data into the database, etc., should be taken into account.

        You link to the documentation of an old postgres version (9.2) that is not supported anymore (I know that google often puts these old links to the documentation at the top -- annoying).

        But in this case there is an important distinction: recent postgres has default implementations of sp-gist. So the 'limitations' when implementing your own sp-gist class, while still true, need not worry us here. You will see that newer docs have a page "Built-in Operator Classes". My example uses the built-in sp-gist class. The 'limitations' are just warnings for those who implement their own sp-gist class.

        It would be interesting to have an example dataset to compare the performance of any forthcoming pure-perl solution with what postgres indexing can do. Or graphic libraries (like you mentioned earlier).

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1224778]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (8)
As of 2020-09-29 14:39 GMT
Find Nodes?
    Voting Booth?
    If at first I donít succeed, I Ö

    Results (146 votes). Check out past polls.