Can I optimize a SELECT DISTINCT x FROM hugeTable query by creating an index on column x?

闹比i 2021-01-01 11:51

I have a huge table, having a much smaller number (by orders of magnitude) of distinct values on some column x.

I need to do a query like SELECT D

8 answers
  • 2021-01-01 12:00

    No. But there are some workarounds (excluding normalization):

    Once the index is in place, it's possible to implement in SQL what the optimizer could be doing automatically:

    https://stackoverflow.com/a/29286754/538763 (multiple workarounds cited)

    Other answers say you can normalize, which would solve your issue, but even once it's normalized SQL Server still likes to perform a scan to find the max() within group(s). Workarounds:

    https://dba.stackexchange.com/questions/48848/efficiently-query-max-over-multiple-ranges?rq=1

  • 2021-01-01 12:01

    SQL Server does not implement any facility to seek directly to the next distinct value in an index skipping duplicates along the way.

    If you have many duplicates, you may be able to use a recursive CTE to simulate this. The technique comes from the article "Super-fast DISTINCT using a recursive CTE". For example:

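    with recursivecte as (
      -- anchor: the smallest value of x (cheap with an index on x)
      select min(t.x) as x
      from hugetable t
      union all
      -- recursive step: the smallest x greater than the previous one;
      -- row_number() is used because TOP is not allowed in the recursive part
      select ranked.x
      from (
        select t.x,
               row_number() over (order by t.x) as rnk
        from hugetable t
        join recursivecte r
          on r.x < t.x
      ) ranked
      where ranked.rnk = 1
    )
    select *
    from recursivecte
    option (maxrecursion 0)  -- 0 lifts the default limit of 100 recursion levels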
    
  • 2021-01-01 12:02

    This is likely not a problem of indexing, but one of data design. Normalization, to be precise. The fact that you need to query the distinct values of a field, and are even willing to add an index for it, is a strong indicator that the field should be normalized into a separate table with a (small) join key. Then the distinct values will be available immediately by scanning the much smaller lookup table.
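
    A minimal sketch of that normalization, under stated assumptions: the names dbo.x_lookup and x_id, the dbo schema, and the varchar(100) type for x are all hypothetical, since the question does not show the table definition.

    ```sql
    -- Hypothetical lookup table: one row per distinct value of x.
    -- The column type is an assumption; use the real type of hugetable.x.
    create table dbo.x_lookup (
        x_id int identity(1,1) not null primary key,
        x    varchar(100) not null unique
    );

    -- One-time population from the existing data (this still scans hugetable once).
    insert into dbo.x_lookup (x)
    select distinct x
    from dbo.hugetable
    where x is not null;

    -- From then on, the distinct values come from the small table.
    select x
    from dbo.x_lookup;
    ```

    In a full normalization, hugetable would store x_id instead of x and join to the lookup table when the value itself is needed.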

    Update
    As a workaround, you can create an indexed view on an aggregate by the 'distinct' field. COUNT_BIG is an aggregate that is allowed in indexed views:

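    -- "schema" is a placeholder for the real schema name (e.g. dbo)
    create view vwDistinct
    with schemabinding
    as select x, count_big(*) as cnt  -- every view column needs a name
    from schema.hugetable
    group by x;
    
    -- the first index on a view must be unique and clustered
    create unique clustered index cdxDistinct on vwDistinct(x);
    
    select x from vwDistinct with (noexpand);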
    
  • 2021-01-01 12:09

    If your column x has low cardinality, creating a local bitmap index would increase performance many fold.

  • 2021-01-01 12:12

    When doing a SELECT DISTINCT on an indexed field, an index scan makes sense, as execution still has to scan each value in the index for the entire table (assuming no WHERE clause, as seems to be the case by your example).

    Indexes usually have more of an impact on WHERE conditions, JOINs, and ORDER BY clauses.
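
    Even so, a narrow index on x is usually smaller than the table itself, so scanning it reads fewer pages than scanning the whole table. A rough way to check this on your own data is sketched below; the index name ix_hugetable_x and the dbo schema are assumptions, not taken from the question.

    ```sql
    -- Hypothetical index name; the index contains only x (plus the row locator).
    create nonclustered index ix_hugetable_x
        on dbo.hugetable (x);

    -- Report logical reads per statement in the Messages output.
    set statistics io on;

    -- Compare the logical reads reported here with and without the index.
    select distinct x
    from dbo.hugetable;

    set statistics io off;
    ```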

  • 2021-01-01 12:12

    Going by your description of the execution plan, I believe it's the best possible execution.

    The Index Scan reads the entire index as stored (not in index order), and the Hash Match does the distinct.

    There might be other ways around your problem. In SQL Server, indexed views come to mind. However, they might give you a big hit on writes to that table.
