How can I avoid a full table scan on this MySQL query?

忘掉有多难 2021-02-15 13:00
explain
select
    *
from
    zipcode_distances z 
inner join
    venues v    
    on z.zipcode_to=v.zipcode
inner join
    events e
    on v.id=e.venue_id
where
    z.z         


        
4 answers
  • 2021-02-15 13:46

    Have you indexed the join columns in both tables?

    v.id and e.venue_id
    

    If not, create indexes on both tables. If you already have, it may be that one or more of the tables hold so few rows that the optimizer judges a full scan to be cheaper than an indexed read.
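
    As a rough sketch, the existing indexes could be checked like this before adding anything. The index name in the CREATE statement is hypothetical, and venues.id is assumed to be the primary key (and therefore already indexed).

    ```sql
    -- List the indexes that already exist on each table.
    -- The Cardinality column shows the row estimates the optimizer works from.
    SHOW INDEX FROM venues;
    SHOW INDEX FROM events;

    -- Only needed if no index on the join column exists yet.
    -- idx_events_venue_id is a hypothetical name.
    CREATE INDEX idx_events_venue_id ON events (venue_id);
    ```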

  • 2021-02-15 13:46

    You could use a subquery:

    select * from zipcode_distances z, venues v, events e
    where
        z.id in (select zd.id from zipcode_distances zd where zd.zipcode_from='92108' and zd.distance <= 5)
        and z.zipcode_to=v.zipcode
        and v.id=e.venue_id
    
  • 2021-02-15 13:49

    Based on the EXPLAIN output in your question, you already have all the indexes the query should be using, namely:

    CREATE INDEX idx_zip_from_distance
      ON zipcode_distances (zipcode_from, distance, zipcode_to);
    CREATE INDEX idx_zipcode ON venues (zipcode, id);
    CREATE INDEX idx_venue_id ON events (venue_id);
    

    (I'm not sure from your index names whether idx_zip_from_distance really includes the zipcode_to column. If not, you should add it to make it a covering index. Also, I've included the venues.id column in idx_zipcode for completeness, but, assuming it's the primary key for the table and that you're using InnoDB, it will be included automatically anyway.)

    However, it looks like MySQL is choosing a different, and possibly suboptimal, query plan: it scans through all events, looks up their venues and zip codes, and only then filters the results on distance. That could be the optimal plan if the cardinality of the events table were low enough, but since you're asking this question I assume it's not.

    One reason for the suboptimal query plan could be the fact that you have too many indexes which are confusing the planner. For instance, do you really need all three of those indexes on the zipcode table, given that the data it stores is presumably symmetric? Personally, I'd suggest only the index I described above, plus a unique index (which can also be the primary key, if you don't have an artificial one) on (zipcode_to, zipcode_from) (preferably in that order, so that any occasional queries on zipcode_to=? can make use of it).
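
    A minimal sketch of that unique index, assuming each (zipcode_to, zipcode_from) pair appears only once in the table. The index name uq_zip_to_from is made up for illustration.

    ```sql
    -- zipcode_to comes first so that queries filtering on zipcode_to alone
    -- can use the leftmost prefix of the index.
    ALTER TABLE zipcode_distances
      ADD UNIQUE INDEX uq_zip_to_from (zipcode_to, zipcode_from);
    ```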

    However, based on some testing I did, I suspect the main reason MySQL is choosing the wrong query plan simply comes down to the relative cardinalities of your tables. Presumably, your actual zipcode_distances table is huge, and MySQL isn't smart enough to realize quite how much the conditions in the WHERE clause narrow it down.

    If so, the simplest fix may be to force MySQL to use the indexes you want:

    select
        *
    from
        zipcode_distances z 
        FORCE INDEX (idx_zip_from_distance)
    inner join
        venues v    
        FORCE INDEX (idx_zipcode)
        on z.zipcode_to=v.zipcode
    inner join
        events e
        FORCE INDEX (idx_venue_id)
        on v.id=e.venue_id
    where
        z.zipcode_from='92108' and
        z.distance <= 5
    

    With that query, you should indeed get the desired query plan. (You do need FORCE INDEX here, since with just USE INDEX the query planner could still decide to use a table scan instead of the suggested index, defeating the purpose. I had this happen when I first tested this.)
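
    To see the difference for yourself, the two hints can be compared with EXPLAIN, roughly as follows. This only reuses the index name from above; the plan you actually get depends on your data and MySQL version.

    ```sql
    -- USE INDEX is only a suggestion: the optimizer may still pick a table scan.
    explain
    select *
    from zipcode_distances z USE INDEX (idx_zip_from_distance)
    inner join venues v on z.zipcode_to=v.zipcode
    inner join events e on v.id=e.venue_id
    where z.zipcode_from='92108' and z.distance <= 5;

    -- FORCE INDEX makes the optimizer treat a table scan as very expensive,
    -- so it is used only if the named index cannot be used at all.
    explain
    select *
    from zipcode_distances z FORCE INDEX (idx_zip_from_distance)
    inner join venues v on z.zipcode_to=v.zipcode
    inner join events e on v.id=e.venue_id
    where z.zipcode_from='92108' and z.distance <= 5;
    ```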

    P.S. Here's a demo on SQLize, both with and without FORCE INDEX, demonstrating the issue.

  • 2021-02-15 14:01

    You are selecting all columns from all tables (select *) so there is little point in the optimizer using an index when the query engine will then have to do a lookup from the index to the table on every single row.
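
    As an illustration, a version that selects only indexed columns might look like the sketch below. It assumes the indexes idx_zip_from_distance (zipcode_from, distance, zipcode_to), idx_zipcode (zipcode, id) and idx_venue_id (venue_id) exist, that events.id is the primary key, and that the tables use InnoDB (so the primary key is carried in every secondary index). Whether these four columns are enough depends on what your application needs.

    ```sql
    -- Every selected column is available from an index, so no row lookups
    -- into the base tables are needed. If the indexes are covering, the
    -- Extra column of EXPLAIN shows "Using index" for each table.
    explain
    select
        z.zipcode_to,
        z.distance,
        v.id as venue_id,
        e.id as event_id
    from
        zipcode_distances z
    inner join
        venues v
        on z.zipcode_to=v.zipcode
    inner join
        events e
        on v.id=e.venue_id
    where
        z.zipcode_from='92108' and
        z.distance <= 5;
    ```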
