Reuse computed select value

前端 未结 1 994
渐次进展
渐次进展 2021-01-13 16:18

I\'m trying to use ST_SnapToGrid and then GROUP BY the grid cells (x, y). Here is what I did first:

SELECT
  COUNT(*)                      AS n,         


        
1条回答
  •  -上瘾入骨i
    2021-01-13 17:15

    Test timing

    You don't see the evaluation of individual functions per row in the EXPLAIN output.

    Test with EXPLAIN ANALYZE to get actual query times to compare overall effectiveness. Run a couple of times to rule out caching artifacts. For simple queries like this, you get more reliable numbers for the total runtime with:

    EXPLAIN (ANALYZE, TIMING OFF) SELECT ...
    

    Requires Postgres 9.2+. Per documentation:

    TIMING

    Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter may only be used when ANALYZE is also enabled. It defaults to TRUE.

    Prevent repeated evaluation

    Normally, expressions in a subquery are evaluated once. But Postgres can collapse trivial subqueries if it thinks that will be faster.

    To introduce an optimization barrier, you could use a CTE instead of the subquery. This guarantees that Postgres computes ST_SnapToGrid(geom, 50) once only:

    WITH cte AS (
       SELECT ST_SnapToGrid(geom, 50) AS geom1
       FROM   points
       )
    SELECT COUNT(*)   AS n
         , ST_X(geom1) AS x
         , ST_Y(geom1) AS y
    FROM   cte
    GROUP  BY geom1;         -- see below
    

    However, this it's probably slower than a subquery due to more overhead for a CTE. The function call is probably very cheap. Generally, Postgres knows better how to optimize a query plan. Only introduce such an optimization barrier if you know better.

    Simplify

    I changed the name of the computed point in the subquery / CTE to geom1 to clarify it's different from the original geom. That helps to clarify the more important thing here:

    GROUP BY geom1
    

    instead of:

    GROUP BY x, y

    That's obviously cheaper - and may have an influence on whether the function call is repeated. So, this is probably fastest:

    SELECT COUNT(*) AS n
         , ST_X(ST_SnapToGrid(geom, 50)) AS x
         , ST_y(ST_SnapToGrid(geom, 50)) AS y
    FROM   points
    GROUP  BY ST_SnapToGrid(geom, 50);         -- same here!
    

    Or maybe this:

    SELECT COUNT(*)    AS n
         , ST_X(geom1) AS x
         , ST_y(geom1) AS y
    FROM (
       SELECT ST_SnapToGrid(geom, 50) AS geom1
       FROM   points
       ) AS tmp
    GROUP  BY geom1;
    

    Test all three with EXPLAIN ANALYZE or EXPLAIN (ANALYZE, TIMING OFF) and see for yourself. Testing >> guessing.

    0 讨论(0)
提交回复
热议问题