Spatial query on large table with multiple self joins performing slow

刺人心 2021-01-13 15:38

I am working on queries on a large table in Postgres 9.3.9. It is a spatial dataset and it is spatially indexed. Say I need to find 3 types of objects: A, B and C. The

4 answers
  •  一生所求
    2021-01-13 16:01

    This query should go a long way (be much faster):

    WITH school AS (
       SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
       FROM   planet_osm_point s
            , LATERAL (
          SELECT  1 FROM planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'bar'
          LIMIT   1  -- bar exists -- most selective first if possible
          ) b
            , LATERAL (
          SELECT  1 FROM planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'restaurant'
          LIMIT   1  -- restaurant exists
          ) r
       WHERE  s.amenity = 'school'
       )
    SELECT * FROM (
       TABLE school  -- schools
    
       UNION ALL  -- bars
       SELECT s.school_id, 'bar', x.*
       FROM   school s
            , LATERAL (
          SELECT  osm_id, name, way_geo
          FROM    planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'bar'
          ) x
    
       UNION ALL  -- restaurants
       SELECT s.school_id, 'rest.', x.*
       FROM   school s
            , LATERAL (
          SELECT  osm_id, name, way_geo
          FROM    planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'restaurant'
          ) x
       ) sub
    ORDER BY school_id, (type <> 'school'), type, osm_id;
    

    This is not the same as your original query, but rather what you actually want, as per discussion in comments:

    I want a list of schools that have restaurants and bars within 500 meters and I need the coordinates of each school and its corresponding restaurants and bars.

    So this query returns a list of those schools, followed by bars and restaurants nearby. Each set of rows is held together by the osm_id of the school in the column school_id.

    The query now uses LATERAL joins, so each correlated subquery can make use of the spatial GiST index.
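    To make the semantics concrete, here is a brute-force Python sketch of what the query computes, with no index at all. It makes several assumptions. Points are (lon, lat) in degrees. Distance is measured on a sphere with a mean Earth radius of 6,371,008.8 m, as an approximation of ST_DWithin(..., false). Point, sphere_distance_m and schools_with_bars_and_restaurants are hypothetical names. The sketch only shows the shape of the result; the point of the SQL version is that the GiST index avoids this full scan.

```python
import math
from typing import NamedTuple

# Assumed mean Earth radius in metres, standing in for the sphere that
# ST_DWithin(..., use_spheroid => false) measures on.
EARTH_RADIUS_M = 6371008.8


class Point(NamedTuple):
    osm_id: int
    amenity: str
    name: str
    lon: float
    lat: float


def sphere_distance_m(a: Point, b: Point) -> float:
    """Great-circle (haversine) distance between two points, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def schools_with_bars_and_restaurants(points, radius_m=500.0):
    """Rows of (school_id, type, osm_id, name): each school, then its bars and restaurants."""
    def nearby(school, amenity):
        # Plays the role of one LATERAL subquery: matching points within radius_m.
        return [p for p in points
                if p.amenity == amenity and sphere_distance_m(p, school) <= radius_m]

    rows = []
    for s in (p for p in points if p.amenity == 'school'):
        bars, restaurants = nearby(s, 'bar'), nearby(s, 'restaurant')
        if not bars or not restaurants:  # counterpart of the two "LIMIT 1" existence checks
            continue
        rows.append((s.osm_id, 'school', s.osm_id, s.name))
        rows += [(s.osm_id, 'bar', b.osm_id, b.name) for b in bars]
        rows += [(s.osm_id, 'rest.', r.osm_id, r.name) for r in restaurants]
    # Same key as ORDER BY school_id, (type <> 'school'), type, osm_id:
    # False sorts before True, so each school precedes its neighbours.
    return sorted(rows, key=lambda r: (r[0], r[1] != 'school', r[1], r[2]))
```

    A school with a bar but no restaurant within 500 m produces no rows at all, just like the inner joins against the LATERAL subqueries b and r.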

    TABLE school is just shorthand for SELECT * FROM school:

    • Is there a shortcut for SELECT * FROM in psql?

    The expression (type <> 'school') orders the school in each set first, because:

    • SQL select query order by day and month

    The subquery sub in the final SELECT is only needed to order by this expression: in a UNION query, an attached ORDER BY list may only reference output columns, not expressions.
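    Here is a small sketch of the wrapper pattern. It uses SQLite through Python's built-in sqlite3 module as a stand-in for Postgres. The tables school and nearby are hypothetical miniatures of the two halves of the UNION ALL. SQLite evaluates (type <> 'school') to 0 or 1 rather than false or true, but the resulting order matches Postgres, where false sorts before true.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical miniature of the UNION ALL inputs: (school_id, type, osm_id)
conn.execute('CREATE TABLE school (school_id INTEGER, type TEXT, osm_id INTEGER)')
conn.execute('CREATE TABLE nearby (school_id INTEGER, type TEXT, osm_id INTEGER)')
conn.executemany('INSERT INTO school VALUES (?, ?, ?)', [(20, 'school', 20), (10, 'school', 10)])
conn.executemany('INSERT INTO nearby VALUES (?, ?, ?)',
                 [(10, 'rest.', 3), (20, 'bar', 7), (10, 'bar', 2)])

# Wrapping the UNION ALL in a derived table ("sub") lets ORDER BY use the
# expression (type <> 'school'): 0 for the school row, 1 for the others,
# so each school comes first within its school_id group.
rows = conn.execute("""
    SELECT * FROM (
        SELECT school_id, type, osm_id FROM school
        UNION ALL
        SELECT school_id, type, osm_id FROM nearby
    ) sub
    ORDER BY school_id, (type <> 'school'), type, osm_id
""").fetchall()
```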

    For the purpose of this answer, I focus on the query you presented and ignore the extended requirement to filter on any of the other 70 text columns. That's really a design flaw: the search criteria should be concentrated in a few columns. Otherwise you'd have to index all 70 columns, and multicolumn indexes like the ones I am going to propose are hardly an option. Still possible, though ...

    Index

    In addition to the existing index:

    "idx_planet_osm_point_waygeo" gist (way_geo)
    

    If you always filter on the same column (amenity here), you could create a multicolumn index covering the few columns you are interested in, so that index-only scans become possible:

    CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id);
    
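    Index-only scans are easiest to see in a query plan, but that needs a running Postgres server. This sketch therefore uses SQLite through Python's sqlite3 module, purely as an analogy. SQLite reports "USING COVERING INDEX" when a query can be answered from the index alone; Postgres shows an "Index Only Scan" node in EXPLAIN instead. The table is a hypothetical miniature of planet_osm_point, and the tags column is a hypothetical stand-in for the remaining text columns.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE planet_osm_point (osm_id INTEGER, amenity TEXT, name TEXT, tags TEXT)')
conn.execute('CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id)')

# Every column this query touches is in the index, so it can be answered
# from the index alone.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT osm_id, name FROM planet_osm_point WHERE amenity = 'bar'"
).fetchall()
details = [row[-1] for row in plan]  # last field of each plan row is the text detail

# tags is not in the index, so the table itself has to be visited.
plan_wide = conn.execute(
    "EXPLAIN QUERY PLAN SELECT osm_id, tags FROM planet_osm_point WHERE amenity = 'bar'"
).fetchall()
details_wide = [row[-1] for row in plan_wide]
```

    Asking for just one column outside the index (tags) is enough to lose the covering property.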

    Postgres 9.5

    The upcoming Postgres 9.5 introduces major improvements that happen to address your case exactly:

    • Allow queries to perform accurate distance filtering of bounding-box-indexed objects (polygons, circles) using GiST indexes (Alexander Korotkov, Heikki Linnakangas)

      Previously, a common table expression was required to return a large number of rows ordered by bounding-box distance, and then filtered further with a more accurate non-bounding-box distance calculation.

    • Allow GiST indexes to perform index-only scans (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)

    That's of particular interest for you. Now you can have a single multicolumn (covering) GiST index:

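    CREATE EXTENSION IF NOT EXISTS btree_gist;  -- GiST operator classes for scalar types such as text
    CREATE INDEX planet_osm_point_covering_idx ON planet_osm_point
    USING gist (amenity, way_geo, name, osm_id);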
    
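    The first release-note item above contrasts a cheap bounding-box step with a more accurate distance calculation. The Python sketch below shows a simplified, unordered version of that two-step idea. A lon/lat box enclosing the 500 m circle acts as the lossy filter an index could answer, and an exact great-circle distance rechecks the survivors. The names haversine_m, bbox_for_radius and within are hypothetical. Distances use a sphere of radius 6,371,008.8 m, and the box formula assumes the circle neither contains a pole nor crosses the antimeridian. This is not how GiST or PostGIS work internally.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius (sphere model)


def haversine_m(lon1, lat1, lon2, lat2):
    """Exact great-circle distance on the sphere, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def bbox_for_radius(lon, lat, radius_m):
    """Lon/lat box enclosing the circle of radius_m around (lon, lat).

    Assumes the circle neither contains a pole nor crosses the antimeridian.
    """
    ang = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(ang)
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return lon - dlon, lat - dlat, lon + dlon, lat + dlat


def within(points, lon, lat, radius_m):
    xmin, ymin, xmax, ymax = bbox_for_radius(lon, lat, radius_m)
    # Step 1: cheap, lossy filter; this is the part a bounding-box index can answer.
    candidates = [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    # Step 2: exact distance recheck, run only on the candidates.
    return [p for p in candidates if haversine_m(lon, lat, p[0], p[1]) <= radius_m]
```

    A point near a corner of the box passes step 1 but fails step 2, which is why the recheck cannot be skipped.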

    And:

    • Improve bitmap index scan performance (Teodor Sigaev, Tom Lane)

    And:

    • Add GROUP BY analysis functions GROUPING SETS, CUBE and ROLLUP (Andrew Gierth, Atri Sharma)

    Why? Because ROLLUP would simplify the query I suggested. Related answer:

    • Grouping() equivalent in PostgreSQL?

    The first alpha version was released on July 2, 2015. The expected release timeline:

    This is the alpha release of version 9.5, indicating that some changes to features are still possible before release. The PostgreSQL Project will release 9.5 beta 1 in August, and then periodically release additional betas as required for testing until the final release in late 2015.

    Basics

    Of course, be sure not to overlook the basics:

    • Slow Query Questions page on the PostgreSQL Wiki
