My MySQL database serves as the storage backend for three webapps. However, I have recently started encountering the error "Waiting for table metadata lock" constantly. It happens nearly all the time.
The accepted solution is, unfortunately, wrong. It is right as far as it says,
Kill the connection with lock
This is indeed (almost surely; see below) what to do. But then it suggests,
Kill 1398
...and 1398 is not the connection with the lock. How could it be? 1398 is the connection waiting for the lock. This means it does not yet have the lock, and therefore killing it avails nothing. The process holding the lock will still hold the lock, and the next thread trying to do something will therefore also stall and enter "Waiting for metadata lock" in due course.
You have no guarantee that the processes "waiting for metadata lock" (WFML) won't also block, but you can be certain that killing only WFML processes will achieve exactly nothing.
The real cause is that another process is holding the lock, and more importantly, SHOW FULL PROCESSLIST
will not tell you directly which it is.
It WILL tell you if the process is doing something, yes. Usually it works. Here, the process holding the lock is doing nothing, and hides among other threads also doing nothing.
In this case the culprit is almost certainly process 1396, which started before process 1398 and is now in Sleep state, and has been for 46 seconds. Since 1396 clearly did all it needed to do (as far as MySQL is concerned, proved by the fact that it has been sleeping for 46 seconds), no earlier thread could be holding a lock (or 1396 would also have stalled).
IMPORTANT: if you connected to MySQL as a limited user, SHOW FULL PROCESSLIST
will not show all the processes. So the lock might be held by a process that you don't see.
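If that is a possibility, have someone with sufficient rights grant you the PROCESS privilege so you can see every connection. A sketch (the account name is just an example):

GRANT PROCESS ON *.* TO 'webapp'@'localhost';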
A more informative alternative to SHOW PROCESSLIST:
SELECT ID, TIME, USER, HOST, DB, COMMAND, STATE, INFO
FROM INFORMATION_SCHEMA.PROCESSLIST WHERE DB IS NOT NULL
AND (`INFO` NOT LIKE '%INFORMATION_SCHEMA%' OR INFO IS NULL)
ORDER BY `DB`, `TIME` DESC
The above can be tuned to show only the processes in Sleep state; in any case it sorts them by time descending, so it is easier to spot the process that is hanging (it should be the sleeping one immediately before the ones "waiting for metadata lock").
Kill all processes in "Sleep" state, on the same database, that are older than the oldest thread in "waiting for metadata lock" state. This is what Arnaud Amaury would have done:
Ninety-nine times out of one hundred, the thread to be killed is the youngest among those in Sleep state that are older than the oldest one waiting for metadata lock (a query sketch for finding the candidates follows the example below):
TIME STATUS
319 Sleep
205 Sleep
19 Sleep <--- one of these two "19"
19 Sleep <--- and probably this one(*)
15 Waiting for metadata lock <--- oldest WFML
15 Waiting for metadata lock
14 Waiting for metadata lock
(*) The TIME ordering actually has sub-second resolution, or so I was told; it just doesn't show it. So while both processes have a TIME value of 19, the one lower in the list ought to be the younger.
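As a rough sketch, the same rule can be written as a query (it assumes the STATE string "Waiting for table metadata lock" reported by current MySQL versions):

-- List Sleep-state connections at least as old as the oldest waiter, youngest first;
-- the first row is usually the thread to kill.
SELECT ID, TIME, USER, HOST, DB
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE COMMAND = 'Sleep'
  AND TIME >= (SELECT MAX(TIME)
               FROM INFORMATION_SCHEMA.PROCESSLIST
               WHERE STATE = 'Waiting for table metadata lock')
ORDER BY TIME ASC;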
Run SHOW ENGINE INNODB STATUS
and look at the "TRANSACTIONS" section. You will find, among others, something like
TRANSACTION 1701, ACTIVE 58 sec
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 1396, OS thread handle 0x7fd06d675700, query id 1138 hostname 1.2.3.4 whatever
Now you check with SHOW FULL PROCESSLIST
what thread id 1396 is doing with its transaction #1701. Chances are it is in "Sleep" status. So: an active transaction (#1701) holding a lock, which has even made some changes (it has an undo log entry)... but is currently idle. That, and no other, is the thread you need to kill. Killing it will lose those changes.
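If you want to confirm the transaction-to-connection pairing before pulling the trigger, a sketch (INFORMATION_SCHEMA.INNODB_TRX is available on any reasonably recent MySQL):

SELECT trx_id, trx_mysql_thread_id, trx_started, trx_state
FROM INFORMATION_SCHEMA.INNODB_TRX
ORDER BY trx_started;

-- once you are sure which connection is the idle holder:
KILL 1396;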
Remember that doing nothing in MySQL does not mean doing nothing in general. If you get some records from MySQL and build a CSV for FTP upload, during the FTP upload the MySQL connection is idle.
Actually, if the process using MySQL and the MySQL server are on the same machine, that machine runs Linux, and you have root privileges, there's a way to find out which process owns the connection that requested the lock. This in turn allows you to determine (from CPU usage or, at worst, strace -ff -p pid
) whether that process is really doing something or not, to help decide whether it's safe to kill.
I see this happening with webapps that use "persistent" or "pooled" MySQL connections, which nowadays usually save very little time: the webapp instance terminated, but the connection did not, so its lock is still alive... and blocking everyone else.
Another way to find the culprit, if you have a recent MySQL (but not too recent: these tables are deprecated in 5.7 and gone in 8.0), is (you again need privileges on the information schema):
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS
WHERE LOCK_TRX_ID IN
(SELECT BLOCKING_TRX_ID FROM INFORMATION_SCHEMA.INNODB_LOCK_WAITS);
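On MySQL 5.7 and later there is a more direct route, sketched below; it assumes the Performance Schema is on and that you may enable the mdl instrument (on 8.0 it is enabled by default):

-- enable metadata lock instrumentation if it is not already on
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME = 'wait/lock/metadata/sql/mdl';

-- who holds a granted metadata lock on a table, and on which connection
SELECT ml.OBJECT_SCHEMA, ml.OBJECT_NAME, ml.LOCK_TYPE, ml.LOCK_STATUS,
       t.PROCESSLIST_ID, t.PROCESSLIST_USER, t.PROCESSLIST_HOST
FROM performance_schema.metadata_locks ml
JOIN performance_schema.threads t ON t.THREAD_ID = ml.OWNER_THREAD_ID
WHERE ml.OBJECT_TYPE = 'TABLE'
  AND ml.LOCK_STATUS = 'GRANTED';

The PROCESSLIST_ID holding a GRANTED lock on the table everyone is waiting for is the connection to inspect (and, if idle, to kill).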
The problem is usually caused by this architecture:
webapp (jar, php) --> container or application pool (mysqldb, php_module, fastcgi...) --> web server --> MySQL
When the webapp dies, or the webapp lightweight thread instance dies, the container might not. And it is the container that keeps the connection open, so obviously the connection does not close. Quite predictably, MySQL does not consider the operation complete.
If the webapp didn't clean up after itself (no ROLLBACK or COMMIT for a transaction, no UNLOCK TABLES, etc.), then whatever that webapp started doing is still extant, and might still be blocking everyone else.
There are then two solutions. The worse one is to lower the idle timeout. But guess what happens if you wait too long between two queries (exactly: "MySQL server has gone away"). You could then use mysql_ping
if available (it is soon to be deprecated; there are workarounds for PDO), or check for that error and reopen the connection (this is the Python way). So, for a small performance fee, it's doable.
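As an illustration only (the 300 seconds is an arbitrary value, not a recommendation), lowering the idle timeouts looks like this:

SET GLOBAL wait_timeout = 300;         -- seconds a non-interactive connection may sit idle
SET GLOBAL interactive_timeout = 300;  -- same, for interactive clients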
The better, smarter solution is less straightforward to implement. Endeavour to have the script clean up after itself, making sure it catches all exceptions and deals with them properly, or, if possible, skip persistent connections altogether. Let each instance create its own connection, or use a smart pool driver (in PHP PDO, set PDO::ATTR_PERSISTENT
explicitly to false
). Alternatively (e.g. in PHP) you can have destructors and exception handlers force-clean the connection by committing or rolling back transactions and issuing explicit table unlocks.
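Whatever the language, the cleanup such a handler needs to issue boils down to something like this (a sketch; whether you COMMIT or ROLLBACK depends on what the application was doing):

ROLLBACK;       -- or COMMIT, if the pending work should be kept
UNLOCK TABLES;  -- release any explicit LOCK TABLES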
If you have the HandlerSocket (HS) plugin and try to CREATE
or ALTER
a table that has already been accessed via HS, you will run into a similar problem, and you have to restart the HS plugin as follows to release the table metadata lock:
UNINSTALL PLUGIN HANDLERSOCKET;
INSTALL PLUGIN HANDLERSOCKET SONAME 'handlersocket.so';
Kill the connection with lock
Kill 1398
Then check whether you have autocommit set to 0 by running
select @@autocommit;
If so, you probably forgot to commit a transaction. Then another connection wants to do something with this table, which causes the lock.
In your case: if you ran some query against LogEntries (which exists) and did not commit it, and then you try to execute CREATE TABLE IF NOT EXISTS from another connection, the metadata lock happens.
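A minimal sketch of that scenario (the column definition is illustrative only):

-- Connection A
SET autocommit = 0;
SELECT * FROM LogEntries LIMIT 1;   -- opens a transaction and takes a shared metadata lock

-- Connection B
CREATE TABLE IF NOT EXISTS LogEntries (id INT);   -- blocks with "Waiting for table metadata lock"

-- Back on connection A, releasing the lock:
COMMIT;   -- or ROLLBACK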
Edit: for me the bug is somewhere in your application. Check there, or set autocommit to 1 if you are not using transactions in the application.