Bad optimization/planning on Postgres window-based queries (partition by(, group by?)) - 1000x speedup

后端 未结 1 846
深忆病人
深忆病人 2020-12-30 16:24

We are running Postgres 9.3.5. (07/2014) We have quite some complex datawarehouse/reporting setup in place (ETL, materialized views, indexing, aggregations, analytical funct

相关标签:
1条回答
  • 2020-12-30 16:44

    Luckily I found out that I could give the planner an (implicit) hint to do this by making sure that the column order of the other view partitions/windows is more homogenous although not semantically necessary.

    The following change now returned what I had expected to get in the first place (the usage of the ply index):

    ...
    window
      -- w_pyl  as (partition by  x.pot, x.year, x.loc)  -- showstopper (from above)
         w_pyl  as (partition by  x.pot, x.loc, x.year)  -- speedy
         w_pl   as (partition by  x.pot, x.loc  order by x.year)
    

    The 1000 times faster performing result:

     Limit  (cost=1.25..308.02 rows=100 width=512)
       ->  WindowAgg  (cost=1.25..284794.82 rows=93138 width=408)
             ->  WindowAgg  (cost=1.25..282000.68 rows=93138 width=408)
                   ->  Nested Loop Left Join  (cost=1.25..278508.01 rows=93138 width=408)
                         Join Filter: ...
                         ->  Nested Loop Left Join  (cost=0.83..214569.60 rows=93138 width=392)
                               ->  Index Scan using pot_location_year_idx__p_l_y on pot_location_year x  (cost=0.42..11665.49 rows=93138 width=306)
                               ->  Index Scan using ...  (cost=0.41..2.17 rows=1 width=140)
                                     Index Cond: ...
                         ->  Index Scan using ...  (cost=0.41..0.67 rows=1 width=126)
                               Index Cond: ...
    

    Update 2014-10-09:

    Tom Lane-2 wrote this (one of the major postgres developers) related to another (likely related) window function problem I am facing here as well on 2013-02 related to pg 9.2.2:

    ... There's not nearly that amount of intelligence in the system about window functions, as yet. So you'll have to write out the query longhand and put the WHERE clause at the lower level, if you want this optimization to happen.

    So some more (debatable) general thoughts on the subject of window functions, data warehouse functionality etc. that could be considered here:

    The above is a good statement to strengthen my assumption, when it was decided to do some Oracle->Postgres migration in general projects and in a DWH environment, that the risk of spending much more time and money doing so would be quite high. (Although the investigated functionality may seem sufficient.)

    I like Postgres in important areas much more than Oracle, looking e.g. at the syntax and clarity of the code and other things (I guess even the source code and thus maintainability (in all its aspects) is much better there), but Oracle is clearly the much more advanced player in the resource optimization, support and tooling areas, when you are dealing with more complex db functionality outside the typical CRUD management.

    I guess the open source Postgres (as well as the EnterpriseDB topups) will catch up in the long run in those areas, but it will take them at least 10 years, and maybe only if it is pushed heavily by big, altruistic1 global players like Google etc.)

    1 altruistic in the sense, that if the pushed areas stay "free", the benefit for those companies must be surely somewhere else (maybe with some advertisement rows added randomly - I guess we could live with it here and there ;))


    Update 2014-10-13:

    As linked in my previous update above (2014-10-09), the optimization problems and their workaround solutions go on in a quite similiar way (after the above fix), when you want to query the above view with constraints/filters (here on pot_id):

    explain select * foo where pot_id = '12345' fetch first 100 rows only
    

    ...

     Limit  (cost=1.25..121151.44 rows=100 width=211)
       ->  Subquery Scan on foo  (cost=1.25..279858.20 rows=231 width=211)
             Filter: ((foo.pot_id)::text = '12345'::text)
             ->  WindowAgg  (cost=1.25..277320.53 rows=203013 width=107)
                   ->  WindowAgg  (cost=1.25..271230.14 rows=203013 width=107)
                         ->  Nested Loop Left Join  (cost=1.25..263617.16 rows=203013 width=107)
                               ->  Merge Left Join  (cost=0.83..35629.02 rows=203013 width=91)
                                     Merge Cond: ...
                                     ->  Index Scan using pot_location_year_idx__p_l_y on pot_location_year x  (cost=0.42..15493.80 rows=93138 width=65)
                                     ->  Materialize  (cost=0.41..15459.42 rows=33198 width=46)
                                           ->  Index Scan using ...  (cost=0.41..15376.43 rows=33198 width=46)
                               ->  Index Scan using ...  (cost=0.42..1.11 rows=1 width=46)
                                     Index Cond: ...
    

    And as suggested in the above link, if you want to "push down" the contraint/filter before the window aggregation, you have to do it explicitly in the view itself already, which will be efficient for this type of query then with another 1000 times speedup for the 100th row:

     create view foo as
     ...
     where pot_id='12345'
     ...
    

    ...

     Limit  (cost=1.25..943.47 rows=100 width=211)
       ->  WindowAgg  (cost=1.25..9780.52 rows=1039 width=107)
             ->  WindowAgg  (cost=1.25..9751.95 rows=1039 width=107)
                   ->  Nested Loop Left Join  (cost=1.25..9715.58 rows=1039 width=107)
                         ->  Nested Loop Left Join  (cost=0.83..1129.47 rows=1039 width=91)
                               ->  Index Scan using pot_location_year_idx__p_l_y on pot_location_year x (cost=0.42..269.77 rows=106 width=65)
                                     Index Cond: ((pot_id)::text = '12345'::text)
                               ->  Index Scan using ...  (cost=0.41..8.10 rows=1 width=46)
                                     Index Cond: ...
                         ->  Index Scan using ...  (cost=0.42..8.25 rows=1 width=46)
                               Index Cond: ...
    

    After some more view parameterization effort2 this approach will help speedup certain queries constraining those columns, but is still quite inflexible regarding a more general foo-view usage and query optimization.

    2: You can "parameterize such a view" putting it (its SQL) in a (set-returning) table function (the Oracle equivalent to a pipelined table function). Further details regarding this may be found in the forum link above.

    0 讨论(0)
提交回复
热议问题