How to decide when to use an index on a table column

Submitted by 岁酱吖の on 2019-12-02 20:39:56

In general, my indexing strategy would be something like this (I'm using SQL Server exclusively for now - adapt to your own database system as needed):

  • pick a good clustering key - not a GUID, not a VARCHAR(250) or something - a good clustering key is narrow, unique, stable, and ever-increasing - something like an INT IDENTITY is perfect. Make this your clustered primary key -> that gives you your first index on the table

  • for any column that is used as a foreign key into another table - add an index. It can be either a single-column index or a compound index - whatever works best for your case. It's important that the foreign key column be the first column in that index (if you're using a compound index) - otherwise, the benefits for JOINs or for checking referential integrity won't be available to your system

And that's it for now.
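A minimal sketch of that starting point might look like this (the table and column names are illustrative, not from the original answer):

```sql
-- Illustrative schema: a narrow, ever-increasing clustered PK,
-- plus a nonclustered index on the foreign key column.
CREATE TABLE dbo.Orders
(
    OrderID    INT IDENTITY(1,1) NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate  DATETIME2 NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID),
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
        REFERENCES dbo.Customers (CustomerID)
);

-- The FK column leads the index, so JOINs and referential-integrity
-- checks on CustomerID can use it.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);
```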

Then: run your system - observe, and measure - establish a baseline. Is the app fast enough? If yes -> you're done - go home and enjoy your spare time.

If not: then start collecting data and indications as to why the app isn't fast enough. Look at things like the DMVs in SQL Server that tell you about the worst-performing queries, or the missing-index DMVs. Analyze those. See what you could improve. Add one index at a time and again: observe, measure, compare to your baseline.

If you have improvement -> leave that index in place and this measurement is your new baseline. Rinse and repeat until you (and your users) are happy with the app's performance (and then go home and enjoy your time off).
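One way to take such before-and-after measurements in SQL Server is to capture I/O and timing statistics for the query under test (a sketch; the query itself is hypothetical):

```sql
-- Sketch: record logical reads and elapsed time for a query,
-- once before and once after adding an index, and compare the
-- numbers against your baseline.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT o.OrderID, o.OrderDate      -- hypothetical query under test
FROM dbo.Orders AS o
WHERE o.CustomerID = 42;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The statistics appear in the Messages tab; a drop in logical reads after adding an index is the kind of improvement worth keeping.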

Over-indexing in SQL Server can be worse than not having any indexes. Don't start out with too many indexes! Only establish a good clustered PK and the foreign key nonclustered indexes - that's all - then observe, measure, optimize & repeat that cycle.

That's a very complex discussion; there are several things you have to keep in mind. Mainly, you shouldn't decide on an index based on the number of rows in your table, but based on the queries you run against it. An index will help only SELECT queries, and at the same time it will slightly decrease the performance of INSERTs, DELETEs and UPDATEs, because besides altering the row in the table, the engine also has to alter the index.

You seem to be new to the matter, so what I suggest is that you take a look at your execution plans and try to eliminate all "scan" operations, because they pretty much read the entire table or even the entire index. You should always aim for seeks, but you should balance that against the number of indexes you have on your table.
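Besides inspecting plans by hand in Management Studio, you can hunt for scans across the plan cache with a query like the following (a helper sketch I'm adding here, not part of the original answer; the TableScan filter is a simple text match on the plan XML):

```sql
-- Sketch: find the most read-heavy cached plans that contain
-- a table scan operator, as candidates for index tuning.
SELECT TOP (20)
    qs.total_logical_reads,
    st.text  AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE '%TableScan%'
ORDER BY qs.total_logical_reads DESC;
```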

If you are using SQL Server, you can run a trace with SQL Server Profiler to help you.

EDIT:

Can a non-indexed select/read in such a case be more effective than an index read?

Yes, but if such a case occurs, the engine will usually be smart enough not to use the index.

An index is good for picking a fraction of the rows from a table. Querying by a primary key value is the best utilization of an index. The worst scenario is accessing all rows of a table via an index, because the engine has to read the index pages and the referenced data pages. Another example: an in-memory sort of a result set might be faster than sorting the result set via an index on the sorted column. And never forget: while an index may improve query performance, it decreases write performance.
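The selectivity point can be sketched with two queries against the same hypothetical index (the index and date values here are illustrative assumptions):

```sql
-- Assumes a nonclustered index exists:
--   CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);

-- Selective predicate: a seek on the index touches few pages
-- and is cheap.
SELECT OrderID
FROM dbo.Orders
WHERE OrderDate = '2019-12-02';

-- Predicate matching nearly all rows: reading index pages plus
-- the referenced data pages costs more than one pass over the
-- table, so the optimizer will typically scan instead and
-- ignore the index.
SELECT OrderID, CustomerID
FROM dbo.Orders
WHERE OrderDate >= '1900-01-01';
```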

Some have mentioned taking a baseline, using some kind of tracing utility to measure performance, etc. If you are fine with the established performance, move on. If not, analyze the execution plan and the physical data model (available indexes), re-calculate statistics, and see if that helps the optimizer choose a better execution plan. Make sure that the DBMS can (is allowed to) utilize the available RAM. Try to minimize disk I/O, and so on.

In case of Firebird 2.5, the newly added Firebird Trace API is a godsend. You are now finally able to get a near real-time trace on what's executed against the database with performance counters (execution plan, execution time, I/O stats ...). And a third-party product called FB TraceManager by Upscene Productions makes the Trace API a joy to use.

Regarding the second part of your question: if a table has just 15 rows, it is very likely that no matter how many indexes you have, the table will always be scanned, because it is so small.

I use this query for getting a clue of which of my tables need an index:

-- Missing Indexes for current database by Index Advantage  (Query 57) (Missing Indexes)
SELECT DISTINCT
    CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)) AS [index_advantage],
    migs.last_user_seek,
    mid.[statement] AS [Database.Schema.Table],
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.unique_compiles,
    migs.user_seeks,
    migs.avg_total_user_cost,
    migs.avg_user_impact,
    OBJECT_NAME(mid.[object_id]) AS [Table Name],
    p.rows AS [Table Rows]
FROM sys.dm_db_missing_index_group_stats AS migs WITH (NOLOCK)
INNER JOIN sys.dm_db_missing_index_groups AS mig WITH (NOLOCK)
    ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)
    ON mig.index_handle = mid.index_handle
INNER JOIN sys.partitions AS p WITH (NOLOCK)
    ON p.[object_id] = mid.[object_id]
WHERE mid.database_id = DB_ID()
ORDER BY index_advantage DESC OPTION (RECOMPILE);

Notice that this will only point you in a general direction; you still need to take into account what has been answered above.
