PostgreSQL: speed up SELECT query in table with millions of rows

不思量自难忘° 2020-12-30 16:49

I have a table with > 4.5 million rows and my SELECT query is far too slow for my needs.

The table is created with:

CREATE TABLE all_leg         


        
3 answers
  • 2020-12-30 17:03
    1. The first thing to change is the composite primary key: replace it with a plain single-column key. An index works best when it is narrow, for example a single integer column, which acts as a spine that lets the database fetch the rows you need quickly. With an index as wide as the one in your example, the planner may decide that scanning the whole table is faster.

    2. Even if the index is good enough for the planner to use, the ordering may cause it to be dropped from the plan. I say 'may' because, as with many things in SQL, it depends on the actual data in the table, the statistics gathered for it, and so on. I'm not sure about Postgres, but you may want to try another index on the column used in ORDER BY, or a composite index on (dep_dt, price_ct). You could also try adding dep_dt to the ORDER BY list to give the planner a hint.

    3. Do you need every column from this table? Selecting * rather than just id (for example) can also have an impact here.

    4. How many distinct values does the dep_dt column have? The planner may decide that scanning the whole table is more efficient than using the index when the column contains many non-unique values.

    In summary, SQL tuning is an art of experimentation, because everything depends on the current data (the planner uses statistics built by the analyzer to guess the optimal query plan). A query tuned against a table with thousands of rows may stop performing well once the table reaches millions.
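
    One way to run that kind of experiment in PostgreSQL is sketched below. It assumes the table is called all_legs and that the slow query filters on dep_dt::date and orders by price_ct; the exact query and the date literal are assumptions, so substitute your own.

    ```sql
    -- Refresh the statistics the planner relies on.
    ANALYZE all_legs;

    -- Show the plan that is actually chosen, with real timings and buffer usage.
    -- Assumption: this is the shape of the slow query; adjust it to the real one.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM all_legs
    WHERE dep_dt::date = DATE '2017-08-15'
    ORDER BY price_ct;

    -- Inspect what the planner believes about the filtered and sorted columns.
    -- n_distinct > 0: estimated number of distinct values.
    -- n_distinct < 0: minus the ratio of distinct values to the row count.
    SELECT attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename = 'all_legs'
      AND attname IN ('dep_dt', 'price_ct');
    ```

    Comparing the estimated and actual row counts in the EXPLAIN output is usually the quickest way to see whether the planner's guess was off.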

  • 2020-12-30 17:15

    The index won't help.

    Two solutions:

    1. You could change the query to:

      WHERE dep_dt >= '2017-08-15 00:00:00' AND dep_dt < '2017-08-16 00:00:00'
      

      Then the index can be used.

    2. Create an index on an expression:

      CREATE INDEX ON all_legs(((dep_dt AT TIME ZONE 'UTC')::date));
      

      (or a different time zone) and change the query to

      WHERE (dep_dt AT TIME ZONE 'UTC')::date = '2017-08-16'
      

      The AT TIME ZONE is necessary because otherwise the result of the cast would depend on your current TimeZone setting.

    The first solution is simpler, but the second has the advantage that you can add price_ct to the index like this:

    CREATE INDEX ON all_legs(((dep_dt AT TIME ZONE 'UTC')::date), price_ct);
    

    Then you don't need a sort any more, and your query will be as fast as it can theoretically get.
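
    Put together, the second solution might look like the sketch below. The index name is illustrative, SELECT * stands in for whatever columns the real query needs, and dep_dt is assumed to be timestamp with time zone. If it were timestamp without time zone, a plain dep_dt::date would be the expression to index instead.

    ```sql
    -- Assumption: dep_dt is timestamptz; the index name is made up for this example.
    CREATE INDEX all_legs_dep_date_price_idx
        ON all_legs (((dep_dt AT TIME ZONE 'UTC')::date), price_ct);

    -- Collect statistics for the new index expression.
    ANALYZE all_legs;

    -- The WHERE expression has to match the indexed expression exactly,
    -- otherwise the planner cannot use the index.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM all_legs
    WHERE (dep_dt AT TIME ZONE 'UTC')::date = DATE '2017-08-16'
    ORDER BY price_ct;
    ```

    If the index is used as intended, the plan should show an index scan on it and no separate Sort node.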

  • 2020-12-30 17:16

    The index does not help because you use

    WHERE dep_dt::date = constant
    

    This seems fine to a beginner, but to the database, it looks like:

    WHERE convert_timestamp_to_date(dep_dt) = constant
    

    Here convert_timestamp_to_date() is an arbitrary function (I just made up the name, don't look it up in the docs). To use the index on dep_dt, the DB would have to invert convert_timestamp_to_date into something like convert_date_to_timestamp_range (because a date corresponds to a range of timestamps, not just one timestamp), and then rewrite the WHERE as Laurenz did.

    Since there are many such functions, the database developers didn't bother to maintain a huge table of how to invert them. Also it would only help for special cases. For example, if you specified a date range in your WHERE instead of a "=constant" then it would be yet another special case. It's your job to handle this ;)

    Also, an index on (dep_dt, price_ct) won't speed up the sort: the first column is a timestamp, so within a single day the rows are ordered by time of day first, not by price. You'd need an index on (dep_dt::date, price_ct) to eliminate the sort.

    Now, which index to create? This depends...

    If you also use timestamp range queries like "WHERE dep_dt BETWEEN ... AND ..." then the index on dep_dt needs to be on the original timestamp type. In this case, creating another index on the same column, but converted to date, would be unnecessary (all indexes have to be updated on writes, so unnecessary indexes slow down inserts/updates). However, if you use the index on (dep_dt::date, price_ct) lots and lots of times and eliminating the sort is really important for you, then it may make sense. It's a tradeoff.
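
    As a rough sketch of the cheaper side of that tradeoff, a single plain index on dep_dt can serve both range queries and the one-day lookup written as a half-open range. The index name and the date literals below are illustrative assumptions.

    ```sql
    -- Illustrative index name; one plain index serves every range predicate on dep_dt.
    CREATE INDEX all_legs_dep_dt_idx ON all_legs (dep_dt);

    -- A single day expressed as a half-open range, so the index on dep_dt applies.
    -- The rows found for that day still have to be sorted by price_ct afterwards.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM all_legs
    WHERE dep_dt >= '2017-08-15 00:00:00'
      AND dep_dt <  '2017-08-16 00:00:00'
    ORDER BY price_ct;

    -- After some real traffic, check how often each index on the table is scanned.
    -- An index with idx_scan stuck at 0 only costs write performance.
    SELECT indexrelname, idx_scan
    FROM pg_stat_user_indexes
    WHERE relname = 'all_legs';
    ```

    If the Sort node in that plan turns out to be cheap for a typical day's worth of rows, the extra expression index is probably not worth its write overhead.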
