Postgres NOT IN performance

暗喜 2020-12-28 22:09

Any ideas how to speed up this query?

Input

EXPLAIN SELECT entityid FROM entity e

LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT         


        
4 answers
  • 2020-12-28 22:26

    Because your WHERE clause checks for a specific userid ("l2.userid = "), a level2entity record is required anyway, so you should change your "LEFT JOIN level2entity" into an "INNER JOIN level2entity":

    INNER JOIN level2entity l2 ON l2.level2id = l1.level2_level2id AND l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
    

    This will, hopefully, filter down your entities so that your NOT IN has less work to do.

  • 2020-12-28 22:33

    You might get a better result if you can rewrite the query to use a hash anti-join.

    Something like:

    -- entityid is the column name used in the question; entity_id is the alias of the unnested list
    with exclude_list as (
      select unnest(string_to_array('1377776,1377792,1377793,1377794,1377795, ...',','))::integer as entity_id)
    select entity.entityid
    from   entity left join exclude_list on entity.entityid = exclude_list.entity_id
    where  exclude_list.entity_id is null;
    
  • 2020-12-28 22:42

    OK, my solution was:

    • select all entities
    • left join, on the entityid, all entities which have one of the ids (the subquery without the NOT is faster)
    • select all rows where the joined select is NULL

    as explained in

    http://blog.hagander.net/archives/66-Speeding-up-NOT-IN.html
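
    The three steps above could be sketched roughly as follows. This is only an illustration, not the query actually used: the table and column names are taken from the question, the alias "excluded" is made up, and the short id list stands in for the real, much longer one.

```sql
-- Sketch of the "positive IN + LEFT JOIN ... IS NULL" rewrite.
-- Assumption: entity.entityid is the column being filtered, as in the question.
SELECT e.entityid
FROM entity e
LEFT JOIN (
    -- Step 2: the entities that DO have one of the ids (plain IN, no NOT)
    SELECT entityid
    FROM entity
    WHERE entityid IN (1377776, 1377792, 1377793)
) excluded ON excluded.entityid = e.entityid
-- Step 3: keep only the rows for which the join found no match
WHERE excluded.entityid IS NULL;
```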

  • 2020-12-28 22:46

    A huge IN list is very inefficient. Ideally PostgreSQL would recognize it and turn it into a relation that it does an anti-join on, but at this point the query planner doesn't know how to do that. The planning time needed to detect this case would also be paid by every query that uses NOT IN sensibly, so it would have to be a very cheap check. See this earlier much more detailed answer on the topic.

    As David Aldridge wrote, this is best solved by turning it into an anti-join. I'd write it as a join over a VALUES list, simply because PostgreSQL is extremely fast at parsing VALUES lists into relations, but the effect is the same:

    SELECT entityid 
    FROM entity e
    LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
    LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
    LEFT OUTER JOIN (
        VALUES
        (1377776),(1377792),(1377793),(1377794),(1377795),(1377796)
    ) ex(ex_entityid) ON (entityid = ex_entityid)
    WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f' 
    AND ex_entityid IS NULL; 
    

    For a sufficiently large set of values you might even be better off creating a temporary table, COPYing the values into it, creating a PRIMARY KEY on it, and joining on that.
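
    A minimal sketch of that temporary-table variant, written as a psql script. The table name exclude_ids is hypothetical, the integer column type is an assumption about entityid, and the two data rows stand in for the real list:

```sql
BEGIN;

-- Hypothetical temp table holding the ids to exclude; dropped automatically at COMMIT.
CREATE TEMPORARY TABLE exclude_ids (entityid integer) ON COMMIT DROP;

-- Bulk-load the values. In psql the rows follow the COPY line and end with "\.".
COPY exclude_ids (entityid) FROM STDIN;
1377776
1377792
\.

-- Add the key after loading, then refresh statistics so the planner sees the row count.
ALTER TABLE exclude_ids ADD PRIMARY KEY (entityid);
ANALYZE exclude_ids;

-- Same anti-join as above, against the temp table instead of a VALUES list.
SELECT e.entityid
FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
LEFT JOIN exclude_ids ex ON ex.entityid = e.entityid
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
  AND ex.entityid IS NULL;

COMMIT;
```

    From application code, the same load would typically go through the driver's COPY support rather than inline rows.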

    More possibilities explored here:

    https://stackoverflow.com/a/17038097/398670
