Best SQL indexes for join table

前端 未结 2 681
迷失自我
迷失自我 2021-02-02 06:52

With performance improvements in mind, I was wondering if and which indexes are helpful on a join table (specifically used in a Rails 3 has_and_belongs_to_many context).

相关标签:
2条回答
  • 2021-02-02 07:28

    Depends on how you are going to query the data.

    Assuming you want to search for all of these...

    • WHERE bar_id = ?
    • WHERE foo_id = ?
    • WHERE bar_id = ? AND foo_id = ?

    ...then you should probably go with an index on {bar_id, foo_id} and an index on {foo_id}.

    While you could also create a third index on {bar_id}, the price of maintaining additional index would probably outweigh the benefit of better clustering in the smaller index.


    Also, how do you plan to cover your queries with indexes? Some of the alternatives, such as...

    • {foo_id, bar_id} and {bar_id}
    • {foo_id, bar_id} and {bar_id, foo_id}

    ...might cover certain kinds of queries better.

    Covering is a balancing act - sometimes adding a field to an index just for covering purposes is justified, sometimes it's not. You won't know until you measure on realistic amounts of data.

    (Disclaimer: I'm not familiar with Ruby. This answer is purely from the database perspective.)

    0 讨论(0)
  • 2021-02-02 07:29

    The Answer

    The oft repeated answer, which tends to always be the case more often than not is, "it depends." More specifically, it depends on what your data is and how it will be used.

    tl;dr Explanation

    The short tl;dr answer for my specific case (and to cover all future bases) is choice #2 which is what I suspected. However, choice #3 would work just fine as, depending on my usage of the data, the extra time and space used creating the compound index could reduce future query lookups.

    The Full Explanation

    The reason for this is that databases try to be smart and try to do things as fast as possible regardless of programmer input. The most basic item to consider when adding an index is will this object be looked up by this key. If yes, an index can potentially help speed that up. However, whether this index is even used all comes down to selectivity and the cardinality of the field.

    Since foreign keys are typically the IDs of another AR class, cardinality usually will be high. But again, this depends on your data. In my example if there are many Foos but few Bars, many of the entries in my join table will have simliar bar_ids. With bar_ids having a low cardinality, an index on bar_id may never be used and may be getting in the way by having the database devote time and resources* to adding to this index every time a new bars_foos entry is created. The same goes with many Bars and few Foos and few of both.

    The general lesson is that when considering an index on a table, decide if the entries will be both looked up by this field and if this field has a high cardinality. That is, does this field have many distinct values? In the case of most join tables "it depends" and we must think more carefully about what the data represents and the relationships themselves. In my case, I will have both many Foos and Bars and will be looking up Foos by their associated bars and vice versa.

    Another good answer I got at the office was, "why are you worrying about your indexes? Build your app!"

    Footnotes

    * In a similar question on indexes on STI it was pointed out that the cost of an index is very low so when in doubt, just add it.

    0 讨论(0)
提交回复
热议问题