I've searched around Stack Overflow, but everybody asks about optimizing queries they've already written.
I want to know the basic stuff: what to do and what to avoid when creating queries.
In your WHERE clause, avoid using a column as an input to a function, as this can cause a full table scan instead of being able to use an index. The query optimizer on some platforms does a better job than others, but it's generally better to be safe. For instance, if you're looking for records from the past 30 days, apply the date manipulation to the value you're comparing against, not to your column:
BAD
WHERE DATEADD(DAY, 30, [RecordDate]) > GETDATE()
This may cause a full table scan (depending on the query optimizer for your platform), even if [RecordDate] is indexed, because DATEADD(DAY, 30, [RecordDate]) has to be evaluated for every row to compare it against GETDATE(). If you change it to:
BETTER
WHERE [RecordDate] > DATEADD(DAY, -30, GETDATE())
This will now always be able to use an index on [RecordDate], regardless of how good the query plan optimizer is on your platform, because DATEADD(DAY, -30, GETDATE()) gets evaluated once and can then be used as a lookup in the index. The same principle applies to CASE statements, UDFs, etc.
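For example, a scalar UDF wrapped around the column has the same problem. (The column and function names here are made up for illustration, and the BETTER version assumes the column is already stored in normalized form.)
BAD
WHERE dbo.NormalizeEmail([Email]) = 'user@example.com'
BETTER
WHERE [Email] = dbo.NormalizeEmail('user@example.com')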
I can't actually validate your claim, but not using * sounds quite logical. What I can do is add a point or two: along with SELECT columnname FROM tablename, adding a WHERE clause helps a lot, since it cuts down on the unnecessary rows of data that would otherwise be pulled back. Also, avoiding cross joins in favor of inner joins, outer joins, or full joins is the way to go, as per my personal experience :)
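To illustrate with made-up table names: a cross join pairs every row with every row, while an inner join plus a WHERE clause returns only what you need.
-- Cross join: every order paired with every customer (row counts multiply fast)
SELECT o.OrderID, c.CustomerName
FROM Orders o CROSS JOIN Customers c
-- Inner join with a WHERE clause: only the matching, recent rows
SELECT o.OrderID, c.CustomerName
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate > DATEADD(DAY, -30, GETDATE())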
My list is SQL Server specific (I'm sure there are lots more):
Use sargable WHERE clauses - that means no functions on columns (especially scalar UDFs) in WHERE clauses, among other things.
WHERE NOT EXISTS tends to be a faster choice than a LEFT JOIN with a WHERE id IS NULL structure when you are looking for the rows that don't match a second table.
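For example (Orders and Customers are hypothetical names), both of these find orders with no matching customer, but the first tends to perform better:
-- NOT EXISTS version
SELECT o.OrderID
FROM Orders o
WHERE NOT EXISTS (SELECT 1 FROM Customers c WHERE c.CustomerID = o.CustomerID)
-- The LEFT JOIN / IS NULL equivalent it usually beats
SELECT o.OrderID
FROM Orders o
LEFT JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE c.CustomerID IS NULL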
Correlated subqueries tend to run row by row and are horribly slow.
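A sketch of what to avoid and the set-based rewrite (hypothetical names again, assuming CustomerID is the primary key of Customers):
-- SLOW: the subquery is re-evaluated for every row of Orders
SELECT o.OrderID,
       (SELECT c.CustomerName
        FROM Customers c
        WHERE c.CustomerID = o.CustomerID) AS CustomerName
FROM Orders o
-- FASTER: the equivalent join is processed as a set
SELECT o.OrderID, c.CustomerName
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID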
Views that call other views can't be indexed and become very slow, especially if you get several levels deep on large tables.
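A sketch of the anti-pattern with invented names; the second view references the first, so it can never be an indexed view, and each extra layer makes the expanded plan harder to optimize:
CREATE VIEW dbo.vActiveOrders AS
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE Status = 'Active'
GO
-- References a view, not a base table, so it can't be indexed
CREATE VIEW dbo.vActiveOrderCustomers AS
    SELECT DISTINCT c.CustomerID, c.CustomerName
    FROM dbo.Customers c
    JOIN dbo.vActiveOrders o ON o.CustomerID = c.CustomerID
GO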
SELECT * is to be avoided, especially when you have a join, since at least one column (the join column) is sent twice, which wastes server, database, and network resources.
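For instance, with a join on CustomerID (hypothetical schema), SELECT * returns that column twice, once from each table:
-- Returns both Orders.CustomerID and Customers.CustomerID
SELECT *
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID
-- Naming only the columns you need avoids the duplicate
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID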
Cursors can usually be replaced with much faster performing set-based logic (see the sketch just below). And when you store data in the correct way, you can avoid a lot of on-the-fly transformations.
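For the cursor point, here's a minimal sketch (the Products table and the 10% discount are made up):
-- Row-by-row cursor version:
DECLARE @id INT
DECLARE price_cur CURSOR FOR
    SELECT ProductID FROM Products WHERE CategoryID = 5
OPEN price_cur
FETCH NEXT FROM price_cur INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Products SET Price = Price * 0.9 WHERE ProductID = @id
    FETCH NEXT FROM price_cur INTO @id
END
CLOSE price_cur
DEALLOCATE price_cur
-- Set-based version: one statement, one pass
UPDATE Products SET Price = Price * 0.9 WHERE CategoryID = 5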
When updating, make sure you add a WHERE clause so that you don't update rows where the new value and the old value are the same. This could be the difference between updating 10,000,000 rows and updating 15. Sample (T-SQL UPDATE structure; if you use another database, you may have to look up the correct syntax, but it should give you the idea):
UPDATE t
SET field1 = t2.field2
FROM table1 t
JOIN table2 t2 ON t.tid = t2.tid
WHERE t.field1 <> t2.field2   -- skips rows where old and new values already match

Or

UPDATE t
SET field1 = @variable
FROM table1 t
WHERE t.field1 <> @variable   -- same idea with a single new value
Check your indexing. SQL Server does not automatically index foreign keys. If they are used in a join, they generally need to be indexed.
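For example (the index and table names are invented):
-- Orders.CustomerID is a foreign key used in joins, so give it an index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)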
If you are constantly using functions on a field, you are probably not storing it correctly (or you should have a persisted calculated field and do the transformation only once, not every time you select the column).
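A sketch of the persisted calculated field idea, with made-up names; the YEAR() transformation is computed when the row is written instead of on every SELECT:
-- The persisted column can also be indexed if you filter on it often
ALTER TABLE dbo.Orders
    ADD OrderYear AS YEAR(OrderDate) PERSISTED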
Your best bet is to get a good performance tuning book for your database of choice (what works best is very database-specific) and read the chapters concerning writing queries.