Why is the query executed 76 times slower when I put it into a function?

Question

When I put the following query into a function, it runs 76 times slower. The only difference I see in the plans is a bitmap index scan versus an index scan.

Plan1: http://tatiyants.com/pev/#/plans/plan_1562919134481

Plan2: http://tatiyants.com/pev/#/plans/plan_1562918860704

Plan1

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
        SELECT
            sum( t.group_suma ) OVER( PARTITION BY (t.o).id ) AS total_suma,
            *
        FROM (
            SELECT
             sum( ocd.item_cost     ) AS group_cost,
             sum( ocd.item_suma     ) AS group_suma,
             max( (ocd.ic).consumed ) AS consumed,
             (ocd.ic).consumed_period,
             ocd.o
            FROM order_cost_details( tstzrange( '2019-04-01', '2019-05-01' ) ) ocd
            GROUP BY ocd.o, (ocd.ic).consumed_period
        ) t
WHERE (t.o).id IN ( 6154 ) AND t.consumed_period @> '2019-04-01'::timestamptz
;

Plan2

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT * FROM order_total_suma( tstzrange( '2019-04-01', '2019-05-01' ) ) ots 
WHERE (ots.o).id IN ( 6154 ) AND ots.consumed_period @> '2019-04-01'::timestamptz
;

The function:

CREATE FUNCTION "order_total_suma" (in _target_range tstzrange default app_period())
 RETURNS    table(
        total_suma  double precision,
        group_cost  double precision,
        group_suma  double precision,
        consumed    double precision,
        consumed_period tstzrange,
        o order_bt
    )

 LANGUAGE sql
 STABLE
 AS $$
    SELECT
        sum( t.group_suma ) OVER( PARTITION BY (t.o).id ) AS total_suma,
        *
    FROM (
        SELECT
         sum( ocd.item_cost     ) AS group_cost,
         sum( ocd.item_suma     ) AS group_suma,
         max( (ocd.ic).consumed ) AS consumed,
         (ocd.ic).consumed_period,
         ocd.o
        FROM order_cost_details( _target_range ) ocd
        GROUP BY ocd.o, (ocd.ic).consumed_period
    ) t
$$
;

Why is the filtering for the query inside the function done only at the last subquery scan?

Is there anything I can do so that both versions perform equally?

UPD
The server version is PostgreSQL 12beta2.
Because of the 30000-character limit, I posted the plans externally instead of inline.

Answer 1:

Thanks to RhodiumToad from IRC:

I suspect something's stopping the planner from being able to deduce that (t.o).id is safe to push through a GROUP BY ocd.o

that might be fixable by making it a separate column of its own

So I additionally GROUP BY a separate (ocd.o).id column, exposed as order_id, and filter on that column. My final query is:

    SELECT * FROM (
            SELECT
                sum( t.group_suma ) OVER( PARTITION BY t.order_id ) AS total_suma,
--              sum( t.group_suma ) OVER( PARTITION BY (t.o).id ) AS total_suma,  -- For any WHERE this takes 2700ms
                *
            FROM (
                SELECT
                 sum( ocd.item_cost     ) AS group_cost,
                 sum( ocd.item_suma     ) AS group_suma,
                 max( (ocd.ic).consumed ) AS consumed,
                 (ocd.ic).consumed_period,
                 ocd.o,
                 (ocd.o).id as order_id
                FROM order_cost_details( tstzrange( '2019-04-01', '2019-05-01' ) ) ocd
                GROUP BY ocd.o, (ocd.o).id, (ocd.ic).consumed_period
            ) t
    ) t
    WHERE t.order_id = 6154 AND t.consumed_period @> '2019-04-01'::timestamptz       -- This takes 2ms
--  WHERE (t.o).id = 6154 AND t.consumed_period @> '2019-04-01'::timestamptz   -- This takes 2700ms

This change also makes the call via the function faster. I just need to filter by the order_id field:

SELECT * FROM order_total_suma( tstzrange( '2019-04-01', '2019-05-01' ) ) ots 
-- This WHERE takes 2.5ms
WHERE ots.order_id IN ( 6154 ) AND ots.consumed_period @> '2019-04-01'::timestamptz
-- This WHERE takes 2500ms
-- WHERE (ots.o).id IN ( 6154 ) AND ots.consumed_period @> '2019-04-01'::timestamptz
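
The revised function definition is not shown in this answer. A minimal sketch of what it could look like, assuming it simply mirrors the final query above and that (ocd.o).id is an integer (the type of order_id is an assumption; adjust it to the actual type of order_bt.id). Because the set of result columns changes, the old function has to be dropped first: CREATE OR REPLACE cannot change a function's return type.

```sql
-- Sketch only: mirrors the final query above; not a verified definition.
-- Assumption: (ocd.o).id is an integer; change order_id's type to match order_bt.id.
DROP FUNCTION IF EXISTS order_total_suma( tstzrange );

CREATE FUNCTION order_total_suma( _target_range tstzrange DEFAULT app_period() )
 RETURNS TABLE(
        total_suma      double precision,
        group_cost      double precision,
        group_suma      double precision,
        consumed        double precision,
        consumed_period tstzrange,
        o               order_bt,
        order_id        integer          -- new: plain column to filter on
    )
 LANGUAGE sql
 STABLE
 AS $$
    SELECT
        -- partition by the plain column instead of (t.o).id
        sum( t.group_suma ) OVER( PARTITION BY t.order_id ) AS total_suma,
        *
    FROM (
        SELECT
         sum( ocd.item_cost     ) AS group_cost,
         sum( ocd.item_suma     ) AS group_suma,
         max( (ocd.ic).consumed ) AS consumed,
         (ocd.ic).consumed_period,
         ocd.o,
         (ocd.o).id AS order_id
        FROM order_cost_details( _target_range ) ocd
        GROUP BY ocd.o, (ocd.o).id, (ocd.ic).consumed_period
    ) t
$$
;
```

The column order in RETURNS TABLE has to match the SELECT list: total_suma first, then the subquery columns in the order they are listed, with order_id last.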

Answer 2:

The plans are quite different.

The problem is the misestimate in the result count of the join between public.order_bt and the split_period subquery. That causes the function public.service_level_price to be evaluated 2882 times rather than once, which is where the time is spent.

Not sure what to do about this (we don't have the view definition, and it's probably nasty). Raising the COST of the function probably doesn't help as the optimizer thinks it will call it only once.

Actually, the best bet may be

ALTER FUNCTION public.calc_item_suma ROWS 1;

which might get the optimizer to choose a different plan.
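
Before changing anything, it can help to check what the planner currently assumes for the function. A small sketch using the pg_proc catalog; the function name is taken from the statement above, and whether it is actually set-returning is an assumption to verify, since ROWS is only accepted for set-returning functions.

```sql
-- Current planner hints for the function.
-- procost: estimated execution cost; prorows: estimated result rows
-- (only meaningful when proretset is true).
SELECT p.oid::regprocedure AS signature,
       p.procost,
       p.prorows,
       p.proretset
FROM pg_proc p
WHERE p.proname = 'calc_item_suma';

-- If proretset is true, lower the row estimate.
-- Add the argument types in parentheses if the name is overloaded.
ALTER FUNCTION public.calc_item_suma ROWS 1;
```

Rerunning EXPLAIN (ANALYZE, BUFFERS) on the slow query afterwards shows whether the plan actually changed.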

Source: https://stackoverflow.com/questions/57003113/why-the-query-is-executed-76-times-slower-when-i-put-it-into-function

Tags

postgresql

query-planner

postgresql-12