I've searched around Stack Overflow, but everybody asks how to optimize queries they've already written.
I want to know the basics: what to do and what to avoid when creat
Edit, Feb 2012:
Avoid these "Ten Common SQL Programming Mistakes"
A few general points about optimizing queries:
Know your data. Know your data. Know your data. I would venture to guess that half of all database performance problems stem from an incomplete understanding of the data and the requirements of the query. Know whether your query will usually return 50 rows or 5 million rows. Know whether you need to get back 3 columns or 50 columns. Know which columns are key columns on the tables, and filter on these.
Understand your database structure. If you're working with a database in third-normal form, recognize that this structure typically works best on queries for lots of small, transactional statements operating on individual rows. If you are working in a star or snowflake design, recognize that it's optimized for large queries and aggregations.
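As a rough illustration of "know your data", the sketch below checks how many rows a table holds and whether a filter hits a key column. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the `orders` table and its columns are hypothetical. SQL Server exposes the same information through its execution plans rather than `EXPLAIN QUERY PLAN`.

```python
import sqlite3

# In-memory SQLite database; "orders" is a hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO orders (customer, note) VALUES (?, ?)",
    [(f"customer{i % 100}", f"note{i}") for i in range(1000)],
)


def plan(sql):
    """Return the plan detail strings SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# How many rows are there to search? Worth knowing before writing the query.
total_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Filtering on the key column lets the engine seek directly to the row.
key_plan = plan("SELECT customer FROM orders WHERE id = 42")

# Filtering on an unindexed column forces a scan of the whole table.
non_key_plan = plan("SELECT customer FROM orders WHERE note = 'note42'")

print(total_rows)
print(key_plan)
print(non_key_plan)
```

The first plan should report a search on the primary key and the second a full scan, which is the difference the advice about filtering on key columns is pointing at.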
Here is a good link about best practices and performance on SQL Server: http://www.sql-server-performance.com/articles/dev/sql_best_practices_p1.aspx
My simple rules for writing a query:
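Start the FROM clause with the smallest table. This helps find data more efficiently because the search starts from a smaller amount of data. (A cost-based optimizer will usually reorder inner joins on its own, so check the execution plan rather than relying on the written order.)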
Write the INNER JOINs first, then the LEFT OUTER JOINs. This reduces the number of rows the SQL engine has to search.
For example:
SELECT
pe.Name,
de.Name,
bu.Name
FROM dbo.Persons pe
INNER JOIN dbo.Departments de ON pe.ID = de.id_Person -- at first INNER JOIN
LEFT JOIN dbo.Bureau bu ON bu.ID = de.id_Bureau -- then LEFT OUTER JOIN
Use aliases and schema-qualified names so that SQL Server does not have to resolve the schema itself. Qualifying the schema also helps SQL Server cache the query plan for ad-hoc queries, so the plan can be reused by other users and not only by you.
Avoid using SELECT * ...
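To make the join rules above concrete, here is a minimal sketch that runs the same query shape against sample data. It uses Python's sqlite3 module instead of SQL Server, so the `dbo.` schema prefix is dropped, and the rows inserted are invented for illustration. It lists the columns explicitly rather than using `SELECT *`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Bureau (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Departments (
    ID INTEGER PRIMARY KEY, Name TEXT, id_Person INTEGER, id_Bureau INTEGER
);
-- Hypothetical sample rows: Bob has no department, Support has no bureau.
INSERT INTO Persons VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Bureau VALUES (10, 'North');
INSERT INTO Departments VALUES (100, 'Sales', 1, 10), (101, 'Support', 1, NULL);
""")

rows = conn.execute("""
SELECT pe.Name, de.Name, bu.Name              -- explicit columns, no SELECT *
FROM Persons pe
INNER JOIN Departments de ON pe.ID = de.id_Person  -- drops persons without a department
LEFT JOIN Bureau bu ON bu.ID = de.id_Bureau        -- keeps departments without a bureau
ORDER BY de.ID
""").fetchall()

print(rows)
```

The INNER JOIN removes Bob before the LEFT JOIN runs, and the LEFT JOIN keeps the Support row with a NULL bureau name.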
Adding some tips to the list:
Use EXISTS/NOT EXISTS in place of IN/NOT IN for indexed columns
--instead of
SELECT * FROM table1
WHERE id1 NOT IN (SELECT id2 FROM table2)
--it is better to write
SELECT * FROM table1 WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE id1=id2)
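One thing worth knowing when swapping NOT IN for NOT EXISTS is that the two are not equivalent when the subquery can return NULL. The sketch below shows the difference using Python's sqlite3 module as a stand-in engine; the sample values are invented, and the table and column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id1 INTEGER);
CREATE TABLE table2 (id2 INTEGER);
INSERT INTO table1 VALUES (1), (2), (3);
INSERT INTO table2 VALUES (1), (NULL);   -- note the NULL in the subquery source
""")

# NOT IN compares id1 against every id2; a comparison with NULL is unknown,
# so no row can ever be proven "not in" the list.
not_in_rows = conn.execute(
    "SELECT id1 FROM table1 WHERE id1 NOT IN (SELECT id2 FROM table2) ORDER BY id1"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
not_exists_rows = conn.execute(
    "SELECT id1 FROM table1 "
    "WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE id1 = id2) ORDER BY id1"
).fetchall()

print(not_in_rows)
print(not_exists_rows)
```

So besides any performance difference, NOT EXISTS tends to be the safer form whenever the compared column is nullable.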
Avoid using UNION when it's possible to use UNION ALL, that is, when you don't need to exclude duplicate rows or you are sure the query won't return any
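A small sketch of the difference, run through Python's sqlite3 module with two hypothetical tables `a` and `b`: UNION has to deduplicate the combined result, while UNION ALL simply concatenates the two inputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (2), (3);   -- the value 2 appears in both tables
""")

# UNION removes duplicates, which costs an extra sort or hash step.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()

# UNION ALL keeps every row and skips the deduplication work.
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()

print(union_rows)
print(union_all_rows)
```

If the duplicate row is acceptable, or cannot occur, UNION ALL returns the rows without paying for the deduplication.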
Avoid using HAVING when it's possible to use WHERE
--instead of
SELECT col1, sum(col2)
FROM table1
GROUP BY col1
HAVING col1 > 0
--it is better to write
SELECT col1, sum(col2)
FROM table1
WHERE col1 > 0
GROUP BY col1
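The two queries above return the same rows when the condition only involves the grouping column. A quick check with Python's sqlite3 module and made-up sample data: WHERE discards rows before they are grouped, whereas HAVING filters after the groups have been built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 INTEGER, col2 INTEGER);
INSERT INTO table1 VALUES (-1, 5), (1, 10), (1, 20), (2, 7);
""")

# Filter after grouping: the group for col1 = -1 is built, then thrown away.
having_rows = conn.execute("""
SELECT col1, SUM(col2) FROM table1
GROUP BY col1
HAVING col1 > 0
ORDER BY col1
""").fetchall()

# Filter before grouping: rows with col1 <= 0 never reach the aggregation.
where_rows = conn.execute("""
SELECT col1, SUM(col2) FROM table1
WHERE col1 > 0
GROUP BY col1
ORDER BY col1
""").fetchall()

print(having_rows)
print(where_rows)
```

HAVING is still the right tool when the condition is on an aggregate such as `SUM(col2) > 10`, since that value does not exist until after grouping.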
Use EXISTS instead of DISTINCT when you have one-to-many table joins
--instead of
SELECT DISTINCT a.col1, a.col2
FROM table1 a, table2 b
WHERE a.id = b.id
--it is better to write
SELECT a.col1, a.col2
FROM table1 a
WHERE EXISTS (SELECT 1 FROM table2 b WHERE a.id = b.id)
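The sketch below runs both forms on invented sample data using Python's sqlite3 module. It assumes the (col1, col2) pairs in `table1` are unique; if `table1` itself held duplicate pairs, DISTINCT would collapse them while EXISTS would keep them, so the two queries would no longer match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
CREATE TABLE table2 (id INTEGER);
INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
INSERT INTO table2 VALUES (1), (1), (2);   -- id 1 matches twice (one-to-many)
""")

# The join multiplies row 1 by its two matches, then DISTINCT removes the copy.
distinct_rows = conn.execute("""
SELECT DISTINCT a.col1, a.col2
FROM table1 a, table2 b
WHERE a.id = b.id
ORDER BY a.col1
""").fetchall()

# EXISTS stops at the first match, so no duplicates are produced at all.
exists_rows = conn.execute("""
SELECT a.col1, a.col2
FROM table1 a
WHERE EXISTS (SELECT 1 FROM table2 b WHERE a.id = b.id)
ORDER BY a.col1
""").fetchall()

print(distinct_rows)
print(exists_rows)
```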
I hope these few tips help; looking forward to more tips ;)
From what I've read, using BETWEEN instead of two separate checks on an indexed column joined with AND improves performance, because your database may not fully utilize the index when it sees the column used on both sides of an AND or OR.
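The query optimizer may not be able to intuit that this is a range check and that the index ordering can come in handy. Instead it may do a scan on each condition and then combine the results. On the other hand, the intent is very clear with a BETWEEN clause that compares the indexed column to two values. That said, many optimizers treat BETWEEN as shorthand for >= AND <= and produce the same plan for both forms, so compare the execution plans on your own database before relying on this.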
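One way to check this claim on a given engine is to compare the plans for both forms. The sketch below does that with Python's sqlite3 module, which is only a stand-in: the table `t`, its columns, and the index `idx_x` are hypothetical, and SQL Server may behave differently, so the same comparison should be repeated there with its own execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
conn.execute("CREATE INDEX idx_x ON t (x)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(100)])


def plan(sql):
    """Return the plan detail strings SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


between_sql = "SELECT y FROM t WHERE x BETWEEN 10 AND 20 ORDER BY x"
and_sql = "SELECT y FROM t WHERE x >= 10 AND x <= 20 ORDER BY x"

between_plan = plan(between_sql)
and_plan = plan(and_sql)

between_rows = conn.execute(between_sql).fetchall()
and_rows = conn.execute(and_sql).fetchall()

print(between_plan)
print(and_plan)
print(len(between_rows), len(and_rows))
```

In SQLite both forms are expected to use the index as a range search and return the same rows; if your database shows two different plans, that is the case where BETWEEN is worth preferring.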