If I SELECT
IDs then UPDATE
using those IDs, then the UPDATE
query is faster than if I would UPDATE
using the conditions
The comment by Michael J.V is the best description. This answer assumes a
is a column that is not indexed and 'id' is.
The WHERE clause in the first UPDATE command is working off the primary key of the table, id
The WHERE clause in the second UPDATE command is working off a non-indexed column. This makes the finding of the columns to be updated significantly slower.
Never underestimate the power of indexes. A table will perform better if the indexes are used correctly than a table a tenth the size with no indexing.
Do you have a composite index or separate indexes?
If it is a composite index of id
and a
columns,
In 2nd update statement the a
column's index would not be used. The reason is that only the left most prefix indexes are used (unless if a
is the PRIMARY KEY
)
So if you want the a
column's index to be used, you need in include id
in your WHERE
clause as well, with id
first then a
.
Also it depends on what storage engine you are using since MySQL does indexes at the engine level, not server.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this id
is in the left most index followed by a
Also from your comments, the lookups are much faster because if you are using InnoDB
, updating columns would mean that the InnoDB
storage engine would have to move indexes to a different page node, or have to split a page if the page is already full, since InnoDB
stores indexes in sequential order. This process is VERY slow and expensive, and gets even slower if your indexes are fragmented, or if your table is very big
The accepted answer seems right but is incomplete, there are major differences.
As much as I understand, and I'm not a SQL expert:
The first query you SELECT N rows and UPDATE them using the primary key.
That's very fast as you have a direct access to all rows based on the fastest possible index.
The second query you UPDATE N rows using LIMIT That will lock all rows and release again after the update is finished.
The big difference is that you have a RACE CONDITION in case 1) and an atomic UPDATE in case 2)
If you have two or more simultanous calls of the case 1) query you'll have the situation that you select the SAME id's from the table. Both calls will update the same IDs simultanously, overwriting each other. This is called "race condition".
The second case is avoiding that issue, mysql will lock all rows during the update. If a second session is doing the same command it will have a wait time until the rows are unlocked. So no race condition is possible at the expense of lost time.
I don't think there can be a one straight-forward answer to your "why?" without doing some sort of analysis and research.
The SELECT queries are normally cached, which means that if you run the same SELECT query multiple times, the execution time of the first query is normally greater than the following queries. Please note that this behavior can only be experienced where the SELECT is heavy and not in scenarios where even the first SELECT is much faster. So, in your example it might be that the SELECT took 0.00s because of the caching. The UPDATE queries are using different WHERE clauses and hence it is likely that their execution times are different.
Though the column a
is indexed, but it is not necessary that MySQL must be using the index when doing the SELECT or the UPDATE. Please study the EXPLAIN outputs. Also, see the output of SHOW INDEX
and check if the "Comment" column reads "disabled" for any indexes? You may read more here - http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't both using the same WHERE condition - the first one runs on id
column and the latter on a
. Though both columns are indexed but it does not necessarily mean that all the table indexes perform alike. It is possible that some index is more efficient than the other depending on the size of the index or the datatype of the indexed column or if it is a single- or multiple-column index. There sure might be other reasons but I ain't an expert on it.
Also, I think that the second UPDATE is doing more work in the sense that it might be putting more row-level locks compared to the first UPDATE. It is true that both UPDATES are finally updating the same number of rows. But where in the first update, it is 10 rows that are locked, I think in the second UPDATE, all rows with a
as NULL (which is more than 10) are locked before doing the UPDATE. Perhaps MySQL first applies the locking and then runs the LIMIT clause to update only limited records.
Hope the above explanation makes sense!
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
Regarding "MySQL doesn't support updating the same table you're selecting from"
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just do this:
UPDATE table SET field = value
WHERE id IN (select id from (SELECT id FROM table WHERE a IS NULL LIMIT 10));