http://qs321.pair.com?node_id=1224778


in reply to Re: searching polygons not merged
in thread searching polygons not merged

How do you represent polygons in a database, such that searching for overlaps becomes "fast"?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Replies are listed 'Best First'.
Re^3: searching polygons not merged
by erix (Prior) on Oct 27, 2018 at 22:33 UTC

    To illustrate polygon (a standard postgresql data type), polygon-comparison (here: overlap), and polygon-indexing (with which my practical experience is pretty much zero - caveat emptor!), I lifted some sql from the standard regression tests in the postgres source tree ( src/test/regress/sql/polygon.sql ), and messed about with it a bit, and added comments:

    Note: the postgres polygon overlap operator is &&

    #!/bin/bash echo " drop table if exists quad_poly_tbl ; create table quad_poly_tbl (id int, p polygon); insert into quad_poly_tbl select (x - 1) * 100 + y, polygon(circle(point(x * 10, y * 10), 1 + +(x + y) % 10)) from generate_series(1, 100) x, generate_series(1, 100) y ; insert into quad_poly_tbl select i, polygon '((200, 300),(210, 310),(230, 290))' from generate_series(10001, 11000) AS i ; analyze quad_poly_tbl; -- search for overlap with this polygon: select * from quad_poly_tbl where p && '((22,640),(23.0717967697245,64 +4),(26,646.928203230275),(30,648),(34,646.928203230275),(36.928203230 +2755,644),(38,640))'::polygon ; --> Time: 1.382 ms -- Seq Scan on quad_poly_tbl (3 MB) -- now add index: create index quad_poly_tbl_idx ON quad_poly_tbl USING spgist(p); -- search again for overlap with this polygon but now WITH the polygon +-index present: select * from quad_poly_tbl where p && '((22,640),(23.0717967697245,64 +4),(26,646.928203230275),(30,648),(34,646.928203230275),(36.928203230 +2755,644),(38,640))'::polygon ; --> Time: 0.271 ms -- Bitmap Index Scan on quad_poly_tbl_idx (1 MB) " | psql -qa

    So the difference between seqscan and polygon-index in this test (searching for 6 matching rows in a table of 11000 rows) is:

    --> Time: 1.382 ms -- Seq Scan on quad_poly_tbl (3 MB) --> Time: 0.271 ms -- Bitmap Index Scan on quad_poly_tbl_idx (1 MB)

    For the OP's question, of course, loading data into the database, etc., should be taken into account.

        You link to the documentation of an old postgres version (9.2) that is not supported anymore (I know that google often puts these old links to the documentation at the top -- annoying).

        But in this case there is an important distinction: recent postgres has default implementations of sp-gist. So the 'limitations' when implementing your own sp-gist class, while still true, need not worry us here. You will see that newer docs have a page "Built-in Operator Classes". My example uses the built-in sp-gist class. The 'limitations' are just warnings for those who implement their own sp-gist class.

        It would be interesting to have an example dataset to compare the performance of any forthcoming pure-perl solution with what postgres indexing can do. Or graphic libraries (like you mentioned earlier).