Spatial query on large table with multiple self joins performing slow

刺人心 2021-01-13 15:38

I am working on queries against a large table in Postgres 9.3.9. It is a spatial dataset and it is spatially indexed. Say I need to find 3 types of objects: A, B and C. The

4 Answers
  • 2021-01-13 16:01

    This query should go a long way (be much faster):

    WITH school AS (
       SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
       FROM   planet_osm_point s
            , LATERAL (
          SELECT  1 FROM planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'bar'
          LIMIT   1  -- bar exists -- most selective first if possible
          ) b
            , LATERAL (
          SELECT  1 FROM planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'restaurant'
          LIMIT   1  -- restaurant exists
          ) r
       WHERE  s.amenity = 'school'
       )
    SELECT * FROM (
       TABLE school  -- schools
    
       UNION ALL  -- bars
       SELECT s.school_id, 'bar', x.*
       FROM   school s
            , LATERAL (
          SELECT  osm_id, name, way_geo
          FROM    planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'bar'
          ) x
    
       UNION ALL  -- restaurants
       SELECT s.school_id, 'rest.', x.*
       FROM   school s
            , LATERAL (
          SELECT  osm_id, name, way_geo
          FROM    planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'restaurant'
          ) x
       ) sub
    ORDER BY school_id, (type <> 'school'), type, osm_id;
    

    This is not the same as your original query, but rather what you actually want, as per discussion in comments:

    I want a list of schools that have restaurants and bars within 500 meters and I need the coordinates of each school and its corresponding restaurants and bars.

    So this query returns a list of those schools, followed by bars and restaurants nearby. Each set of rows is held together by the osm_id of the school in the column school_id.

    The query now uses LATERAL joins to make use of the spatial GiST index.
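
    The LATERAL subqueries with LIMIT 1 in the school CTE act as existence checks. As a rough sketch of an alternative formulation, the same filter can be written with EXISTS. It reuses the table and column names from the query above; whether it plans better or worse than the LATERAL version is something to measure rather than assume.

    ```sql
    -- Schools that have at least one bar and at least one restaurant within 500 m.
    SELECT s.osm_id AS school_id, s.name, s.way_geo
    FROM   planet_osm_point s
    WHERE  s.amenity = 'school'
    AND    EXISTS (
       SELECT 1 FROM planet_osm_point b
       WHERE  b.amenity = 'bar'
       AND    ST_DWithin(b.way_geo, s.way_geo, 500, false)
       )
    AND    EXISTS (
       SELECT 1 FROM planet_osm_point r
       WHERE  r.amenity = 'restaurant'
       AND    ST_DWithin(r.way_geo, s.way_geo, 500, false)
       );
    ```

    Either way, each school is returned at most once, no matter how many bars or restaurants are nearby.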

    TABLE school is just shorthand for SELECT * FROM school:

    • Is there a shortcut for SELECT * FROM in psql?

    The expression (type <> 'school') orders the school in each set first, because a boolean false sorts before true:

    • SQL select query order by day and month

    The subquery sub in the final SELECT is only needed to order by this expression. A UNION query limits an attached ORDER BY list to only columns, no expressions.
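
    To see both points in isolation, here is a small self-contained sketch that runs without PostGIS. The literal rows are made up for illustration only. The UNION ALL is wrapped in a subquery so that ORDER BY may use an expression, and the boolean expression puts the school row first.

    ```sql
    SELECT *
    FROM  (
       SELECT 1 AS school_id, text 'school' AS type, 10 AS osm_id
       UNION ALL
       SELECT 1, 'rest.', 30
       UNION ALL
       SELECT 1, 'bar', 20
       ) sub
    ORDER  BY school_id, (type <> 'school'), type, osm_id;
    -- Row order: school, bar, rest.
    ```

    Attaching the same ORDER BY directly to the UNION ALL, without the wrapping subquery, is rejected by Postgres because of the expression in the sort list.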

    I focus on the query you presented for the purpose of this answer, ignoring the extended requirement to filter on any of the other 70 text columns. That's really a design flaw. The search criteria should be concentrated in a few columns. Otherwise you would have to index all 70 columns, and multicolumn indexes like the one I am going to propose are hardly an option. Still possible though ...

    Index

    In addition to the existing:

    "idx_planet_osm_point_waygeo" gist (way_geo)
    

    If you always filter on the same column, you could create a multicolumn index covering the few columns you are interested in, so index-only scans become possible:

    CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id);
    

    Postgres 9.5

    The upcoming Postgres 9.5 introduces major improvements that happen to address your case exactly:

    • Allow queries to perform accurate distance filtering of bounding-box-indexed objects (polygons, circles) using GiST indexes (Alexander Korotkov, Heikki Linnakangas)

      Previously, a common table expression was required to return a large number of rows ordered by bounding-box distance, and then filtered further with a more accurate non-bounding-box distance calculation.

    • Allow GiST indexes to perform index-only scans (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)

    That's of particular interest for you. Now you can have a single multicolumn (covering) GiST index:

    CREATE INDEX planet_osm_point_covering_idx ON planet_osm_point
    USING gist (amenity, way_geo, name, osm_id);
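
    One practical caveat for a multicolumn GiST index like this: core Postgres ships no GiST operator classes for plain scalar types such as text or bigint. The btree_gist extension supplies them. A minimal sketch, assuming amenity and name are text and osm_id is bigint (the column types are not shown in the question):

    ```sql
    -- Assumption: amenity/name are text and osm_id is bigint.
    -- btree_gist provides GiST operator classes for such scalar types.
    CREATE EXTENSION IF NOT EXISTS btree_gist;
    ```

    Whether an index-only scan is actually chosen also depends on every operator class in the index supporting it, so it is worth confirming with EXPLAIN instead of taking it for granted.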
    

    And:

    • Improve bitmap index scan performance (Teodor Sigaev, Tom Lane)

    And:

    • Add GROUP BY analysis functions GROUPING SETS, CUBE and ROLLUP (Andrew Gierth, Atri Sharma)

    Why? Because ROLLUP would simplify the query I suggested. Related answer:

    • Grouping() equivalent in PostgreSQL?

    The first alpha version was released on July 2, 2015. The expected timeline for the release:

    This is the alpha release of version 9.5, indicating that some changes to features are still possible before release. The PostgreSQL Project will release 9.5 beta 1 in August, and then periodically release additional betas as required for testing until the final release in late 2015.

    Basics

    Of course, be sure not to overlook the basics:

    • Slow Query Questions page on the PostgreSQL Wiki
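
    A large part of those basics is looking at the actual execution plan. As a sketch, using the table and column names from the query above, this shows whether the spatial index is used for the nearby-bar check:

    ```sql
    -- Runs the query for real and reports timing and buffer usage per plan node.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT s.osm_id
    FROM   planet_osm_point s
         , LATERAL (
       SELECT 1 FROM planet_osm_point
       WHERE  ST_DWithin(way_geo, s.way_geo, 500, false)
       AND    amenity = 'bar'
       LIMIT  1
       ) b
    WHERE  s.amenity = 'school';
    ```

    In the output, look for an index scan on idx_planet_osm_point_waygeo in the inner node. A sequential scan there would suggest the index is not being used for the distance filter.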
  • 2021-01-13 16:01

    Does it make any difference if you use explicit joins?

    SELECT a.id as a_id, a.name as a_name, a.geog as a_geog,
           b.id as b_id, b.name as b_name, b.geog as b_geog,
           c.id as c_id, c.name as c_name, c.geog as c_geog
    FROM table1 a
    JOIN table1 b ON b.type = 'B' AND ST_DWithin(a.geog, b.geog, 100)
    JOIN table1 c ON c.type = 'C' AND ST_DWithin(a.geog, c.geog, 100)
    WHERE a.type = 'A';
    
  • 2021-01-13 16:07

    The 3 sub-selects that you use are very inefficient. Write them as LEFT JOIN clauses and the query should be much more efficient:

    SELECT
      school.osm_id AS school_osm_id, 
      school.name AS school_name, 
      school.way AS school_way, 
      restaurant.osm_id AS restaurant_osm_id, 
      restaurant.name AS restaurant_name, 
      restaurant.way AS restaurant_way, 
      bar.osm_id AS bar_osm_id, 
      bar.name AS bar_name, 
      bar.way AS bar_way 
    FROM planet_osm_point school
    LEFT JOIN planet_osm_point restaurant ON restaurant.amenity = 'restaurant' AND
                                   ST_DWithin(school.way_geo, restaurant.way_geo, 500, false) 
    LEFT JOIN planet_osm_point bar ON bar.amenity = 'bar' AND
                                   ST_DWithin(school.way_geo, bar.way_geo, 500, false)
    WHERE school.amenity = 'school'
      AND (restaurant.osm_id IS NOT NULL OR bar.osm_id IS NOT NULL);

    But this returns one row per restaurant-bar combination, so the result grows multiplicatively if a school has several restaurants and bars nearby. You can simplify the query like this:

    SELECT
      school.osm_id AS school_osm_id, 
      school.name AS school_name, 
      school.way AS school_way, 
      a.osm_id AS amenity_osm_id, 
      a.amenity AS amenity_type,
      a.name AS amenity_name, 
      a.way AS amenity_way
    FROM planet_osm_point school
    JOIN planet_osm_point a ON ST_DWithin(school.way_geo, a.way_geo, 500, false) 
    WHERE school.amenity = 'school'
      AND a.amenity IN ('bar', 'restaurant');

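    This returns every bar and restaurant for each school, one row per amenity. Schools with neither a restaurant nor a bar within 500 m are not listed. Note that a school with only bars or only restaurants nearby is still listed.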
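
    The difference in row counts between the two queries can be sketched without PostGIS. The rows below are made up for illustration and stand for one school with two bars and three restaurants, all assumed to be within range, so the distance condition is left out.

    ```sql
    WITH pt(osm_id, amenity) AS (
       VALUES (1, 'school')
            , (2, 'bar'), (3, 'bar')
            , (4, 'restaurant'), (5, 'restaurant'), (6, 'restaurant')
       )
    SELECT
      (SELECT count(*)
       FROM   pt s
       JOIN   pt r ON r.amenity = 'restaurant'
       JOIN   pt b ON b.amenity = 'bar'
       WHERE  s.amenity = 'school') AS rows_two_joins  -- 3 * 2 = 6
    , (SELECT count(*)
       FROM   pt s
       JOIN   pt a ON a.amenity IN ('bar', 'restaurant')
       WHERE  s.amenity = 'school') AS rows_one_join;  -- 3 + 2 = 5
    ```

    With two separate joins the rows multiply per school, while the single join only adds them up.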

  • 2021-01-13 16:19

    Try this with inner join syntax and compare the results; there should be no duplicates. My guess is that it takes a third of the time of the original query, or less:

    select a.id as a_id, a.name as a_name, a.geog as a_geo,
           b.id as b_id, b.name as b_name, b.geog as b_geo,
           c.id as c_id, c.name as c_name, c.geog as c_geo
    from table1 as a
    INNER JOIN table1 as b on b.type='B'
    INNER JOIN table1 as c on c.type='C'
    WHERE a.type='A' and
         (ST_DWithin(a.geog, b.geog, 100) and ST_DWithin(a.geog, c.geog, 100));
    