I am working on queries against a large table in Postgres 9.3.9. It is a spatial dataset with a spatial index. Say I need to find 3 types of objects: A, B and C. The
This query should go a long way (be much faster):
WITH school AS (
SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
FROM planet_osm_point s
, LATERAL (
SELECT 1 FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'bar'
LIMIT 1 -- bar exists -- most selective first if possible
) b
, LATERAL (
SELECT 1 FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'restaurant'
LIMIT 1 -- restaurant exists
) r
WHERE s.amenity = 'school'
)
SELECT * FROM (
TABLE school -- schools
UNION ALL -- bars
SELECT s.school_id, 'bar', x.*
FROM school s
, LATERAL (
SELECT osm_id, name, way_geo
FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'bar'
) x
UNION ALL -- restaurants
SELECT s.school_id, 'rest.', x.*
FROM school s
, LATERAL (
SELECT osm_id, name, way_geo
FROM planet_osm_point
WHERE ST_DWithin(way_geo, s.way_geo, 500, false)
AND amenity = 'restaurant'
) x
) sub
ORDER BY school_id, (type <> 'school'), type, osm_id;
This is not the same as your original query, but rather what you actually want, as per discussion in comments:
I want a list of schools that have restaurants and bars within 500 meters and I need the coordinates of each school and its corresponding restaurants and bars.
So this query returns a list of those schools, followed by bars and restaurants nearby. Each set of rows is held together by the osm_id of the school in the column school_id.
The query now uses LATERAL joins to make use of the spatial GiST index.
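Each LATERAL subquery with LIMIT 1 in the CTE acts as an existence test. Here is a minimal sketch of the same filter written with EXISTS, assuming the table and column names from the query above. Whether the planner picks an equally good index-driven plan on 9.3 is something to verify with EXPLAIN ANALYZE, not to assume:

```sql
-- Schools with at least one bar and at least one restaurant within 500 m.
-- Sketch only: the same filter as the CTE, expressed as two semi-joins.
SELECT s.osm_id AS school_id, s.name, s.way_geo
FROM   planet_osm_point s
WHERE  s.amenity = 'school'
AND    EXISTS (
   SELECT 1 FROM planet_osm_point b
   WHERE  b.amenity = 'bar'
   AND    ST_DWithin(b.way_geo, s.way_geo, 500, false)
   )
AND    EXISTS (
   SELECT 1 FROM planet_osm_point r
   WHERE  r.amenity = 'restaurant'
   AND    ST_DWithin(r.way_geo, s.way_geo, 500, false)
   );
```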
TABLE school is just shorthand for SELECT * FROM school.
The expression (type <> 'school') orders the school in each set first, because false sorts before true.
The subquery sub in the final SELECT is only needed to order by this expression. An ORDER BY attached to a UNION query can reference only output columns, not expressions.
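The boolean sort order can be checked in isolation. This is a small sketch with made-up stand-in rows for one school_id group, not data from the real table:

```sql
-- Stand-in rows for a single school_id group; the values are illustrative only.
SELECT type, (type <> 'school') AS not_school
FROM  (VALUES ('bar'), ('school'), ('rest.')) AS v(type)
ORDER  BY (type <> 'school'), type;
-- 'school' sorts first (not_school = false), followed by 'bar' and 'rest.'
```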
I focus on the query you presented for the purpose of this answer - ignoring the extended requirement to filter on any of the other 70 text columns. That's really a design flaw. The search criteria should be concentrated in few columns. Or you'll have to index all 70 columns, and multicolumn indexes like I am going to propose are hardly an option. Still possible though ...
In addition to the existing:
"idx_planet_osm_point_waygeo" gist (way_geo)
If always filtering on the same column, you could create a multicolumn index covering the few columns you are interested in, so index-only scans become possible:
CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id)
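Index-only scans also depend on the visibility map being reasonably current, so it is worth checking that the planner actually uses one. A minimal sketch, assuming the index above exists and that a query only needs amenity, name and osm_id:

```sql
-- Refresh statistics and the visibility map so heap fetches can be skipped.
VACUUM ANALYZE planet_osm_point;

-- Look for "Index Only Scan using planet_osm_point_bar_idx"
-- and a low "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT osm_id, name
FROM   planet_osm_point
WHERE  amenity = 'bar';
```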
The upcoming Postgres 9.5 introduces major improvements that happen to address your case exactly:
Allow queries to perform accurate distance filtering of bounding-box-indexed objects (polygons, circles) using GiST indexes (Alexander Korotkov, Heikki Linnakangas)
Previously, a common table expression was required to return a large number of rows ordered by bounding-box distance, and then filtered further with a more accurate non-bounding-box distance calculation.
Allow GiST indexes to perform index-only scans (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)
That's of particular interest for you. Now you can have a single multicolumn (covering) GiST index:
CREATE INDEX planet_osm_point_gist_idx ON planet_osm_point
USING gist(amenity, way_geo, name, osm_id)
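One prerequisite, as far as I know: GiST has no built-in operator classes for plain scalar types such as text or bigint. A multicolumn GiST index that combines amenity, name and osm_id with way_geo therefore needs the btree_gist extension from contrib. A minimal sketch, assuming amenity and name are text and osm_id is bigint:

```sql
-- Provides GiST operator classes for scalar types (text, bigint, ...).
-- Run once per database before creating the multicolumn GiST index.
CREATE EXTENSION IF NOT EXISTS btree_gist;
```

Whether the PostGIS operator class for way_geo can take part in an index-only scan is a separate question. Check the EXPLAIN output instead of taking it for granted.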
And:
- Improve bitmap index scan performance (Teodor Sigaev, Tom Lane)
And:
- Add GROUP BY analysis functions GROUPING SETS, CUBE and ROLLUP (Andrew Gierth, Atri Sharma)
Why? Because ROLLUP would simplify the query I suggested.
The first alpha version was released on July 2, 2015. The expected timeline for the release:
This is the alpha release of version 9.5, indicating that some changes to features are still possible before release. The PostgreSQL Project will release 9.5 beta 1 in August, and then periodically release additional betas as required for testing until the final release in late 2015.
Of course, be sure not to overlook the basics:
Does it make any difference if you use explicit joins?
SELECT a.id as a_id, a.name as a_name, a.geog as a_geog,
b.id as b_id, b.name as b_name, b.geog as b_geog,
c.id as c_id, c.name as c_name, c.geog as c_geog
FROM table1 a
JOIN table1 b ON b.type = 'B' AND ST_DWithin(a.geog, b.geog, 100)
JOIN table1 c ON c.type = 'C' AND ST_DWithin(a.geog, c.geog, 100)
WHERE a.type = 'A';
The 3 sub-selects that you use are very inefficient. Write them as LEFT JOIN clauses and the query should be much more efficient:
SELECT
school.osm_id AS school_osm_id,
school.name AS school_name,
school.way AS school_way,
restaurant.osm_id AS restaurant_osm_id,
restaurant.name AS restaurant_name,
restaurant.way AS restaurant_way,
bar.osm_id AS bar_osm_id,
bar.name AS bar_name,
bar.way AS bar_way
FROM planet_osm_point school
LEFT JOIN planet_osm_point restaurant ON restaurant.amenity = 'restaurant' AND
ST_DWithin(school.way_geo, restaurant.way_geo, 500, false)
LEFT JOIN planet_osm_point bar ON bar.amenity = 'bar' AND
ST_DWithin(school.way_geo, bar.way_geo, 500, false)
WHERE school.amenity = 'school'
AND (restaurant.osm_id IS NOT NULL OR bar.osm_id IS NOT NULL);
But this will give too many results if you have multiple restaurants and bars per school. You can simplify the query like this:
SELECT
school.osm_id AS school_osm_id,
school.name AS school_name,
school.way AS school_way,
a.osm_id AS amenity_osm_id,
a.amenity AS amenity_type,
a.name AS amenity_name,
a.way AS amenity_way
FROM planet_osm_point school
JOIN planet_osm_point a ON ST_DWithin(school.way_geo, a.way_geo, 500, false)
WHERE school.amenity = 'school'
AND a.amenity IN ('bar', 'restaurant');
This will give every bar and restaurant for each school. Schools without either restaurant or bar within 500m are not listed.
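To see why the version with two LEFT JOINs returns too many rows, here is a small sketch on made-up data (one school with 3 restaurants and 4 bars nearby; the CTE names are hypothetical stand-ins, not the real table). Joining both sets to the same school pairs every restaurant with every bar, while the single join returns one row per amenity:

```sql
-- Made-up data: school 1 has 3 restaurants and 4 bars within range.
WITH school(id)      AS (VALUES (1)),
     restaurant(sid) AS (SELECT 1 FROM generate_series(1, 3)),
     bar(sid)        AS (SELECT 1 FROM generate_series(1, 4))
SELECT
  (SELECT count(*)
   FROM   school s
   LEFT   JOIN restaurant r ON r.sid = s.id
   LEFT   JOIN bar b        ON b.sid = s.id) AS two_left_joins,  -- 3 * 4 = 12 rows
  (SELECT count(*)
   FROM   school s
   JOIN  (SELECT sid FROM restaurant
          UNION ALL
          SELECT sid FROM bar) a ON a.sid = s.id) AS single_join; -- 3 + 4 = 7 rows
```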
Try this with inner join syntax and compare the results; there should be no duplicates. My guess is that it should take a third of the time of the original query, or less:
select a.id as a_id, a.name as a_name, a.geog as a_geo,
b.id as b_id, b.name as b_name, b.geog as b_geo,
c.id as c_id, c.name as c_name, c.geog as c_geo
from table1 as a
INNER JOIN table1 as b on b.type='B'
INNER JOIN table1 as c on c.type='C'
WHERE a.type='A' and
(ST_DWithin(a.geog, b.geog, 100) and ST_DWithin(a.geog, c.geog, 100));