Phantom Metadata Lock -- MySQL 5.5, 5.6, and 5.7.x

Has anyone run into the MySQL bug with the phantom “Waiting for table metadata lock” bug before? We get them every now and then where there is no valid lock (and no traffic altogether) and yet, it hangs as if the table is in lock status. No bueno.

The solution thus far has been to restart the master daemon to “clear out” the phantom locks… This is not always possible with OLTP systems, however.

It has appeared in 5.5, 5.6 and 5.7…

I found this gem over at the Percona blog:

Hi @erinlkolp, may I see the error?

need clarification on what it is in the above statement. Are we talking about liquibase console output?

Hi @ronak!

Here is the output from liquibase when this occurs. Note that I’m duplicating this in a lab env.

17:49:05.429 INFO  [liquibase.Liquibase]: Executing: pt-online-schema-change --recursion-method=none --set-vars wait_timeout=31536000,innodb_lock_wait_timeout=31536000,lock_wait_timeout=31536000 --max-load Threads_running=120 --critical-load Threads_running=749 --alter-foreign-keys-method=auto --alter="ADD COLUMN new_column BLOB NULL AFTER old_column" --host=db --port=3306 --user=root --password=REDACTED --execute D=database,t=table
17:49:05.607 INFO  [liquibase.Liquibase]: No slaves found.  See --recursion-method if host c5c7d31fc20c has slaves.
17:49:05.607 INFO  [liquibase.Liquibase]: Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
17:49:05.613 INFO  [liquibase.Liquibase]: Operation, tries, wait:
17:49:05.613 INFO  [liquibase.Liquibase]:   analyze_table, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   copy_rows, 10, 0.25
17:49:05.613 INFO  [liquibase.Liquibase]:   create_triggers, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   drop_triggers, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   swap_tables, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   update_foreign_keys, 10, 1
17:49:05.638 INFO  [liquibase.Liquibase]: No foreign keys reference `database`.`table`; ignoring --alter-foreign-keys-method.
17:49:05.638 INFO  [liquibase.Liquibase]: Altering `database`.`table`...
17:49:05.641 INFO  [liquibase.Liquibase]: Creating new table...
17:49:05.650 INFO  [liquibase.Liquibase]: Created new table database._table_new OK.
17:49:05.650 INFO  [liquibase.Liquibase]: Altering new table...
17:49:05.665 INFO  [liquibase.Liquibase]: Altered `database`.`_table_new` OK.
17:49:05.667 INFO  [liquibase.Liquibase]: 2020-11-10T17:49:05 Creating triggers... 

And a show full processlist looks like this:

mysql> show full processlist;
+----+------+------------------+----------+---------+------+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id | User | Host             | db       | Command | Time | State                           | Info                                                                                                                                                                                                                                                              |
+----+------+------------------+----------+---------+------+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  3 | root | 172.30.0.1:43024 | database | Query   |    0 | starting                        | show full processlist                                                                                                                                                                                                                                             |
|  4 | root | 172.30.0.1:43028 | database | Sleep   |  351 |                                 | NULL                                                                                                                                                                                                                                                              |
|  6 | root | 172.30.0.5:33794 | database | Sleep   |  324 |                                 | NULL                                                                                                                                                                                                                                                              |
| 45 | root | 172.30.0.5:33876 | database | Query   |  324 | Waiting for table metadata lock | CREATE TRIGGER `pt_osc_database_table_del` AFTER DELETE ON `database`.`table` FOR EACH ROW DELETE IGNORE FROM `database`.`_table_new` WHERE `database`.`_table_new`.`old_column` <=> OLD.`old_column` AND `database`.`_table_new`.`sequence` <=> OLD.`sequence` |
| 46 | root | 172.30.0.5:33878 | database | Sleep   |  324 |                                 | NULL                                                                                                                                                                                                                                                              |
+----+------+------------------+----------+---------+------+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

But with the stored procedure I listed above, I can now see the metadata locks held:

mysql> CALL test.procShowMetadataLockSummary();
+-------------------------------------------------------------------------------------------------------------------------------------+
| Process(es) that have the metadata lock                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------+
| PID 4 has metadata lock on database.table with current state [] for 42 seconds and is currently running [lock table table read] |
+-------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

So I’m waiting for the next time that this occurs to try out procShowMetadataLockSummary()

:slight_smile:

Hey @erinlkolp!

My big question is what causes:

I don’t understand where that query comes from. Is it from a changeset or just some meta work liquibase is doing? Or an existing stored logic that gets kicked off when the table alter happens?

It’s logic in pt-online-schema-change that allows data to flow to both the old table and the new one. Here’s a full changeset application that will put it into perspective. Ignore the timestamps as I was playing with table locks during the run… :slight_smile:

17:49:05.429 INFO  [liquibase.Liquibase]: Executing: pt-online-schema-change --recursion-method=none --set-vars wait_timeout=31536000,innodb_lock_wait_timeout=31536000,lock_wait_timeout=31536000 --max-load Threads_running=120 --critical-load Threads_running=749 --alter-foreign-keys-method=auto --alter="ADD COLUMN new_column BLOB NULL AFTER specific_column" --host=db --port=3306 --user=root --password=*** --execute D=database,t=table
17:49:05.607 INFO  [liquibase.Liquibase]: No slaves found.  See --recursion-method if host c5c7d31fc20c has slaves.
17:49:05.607 INFO  [liquibase.Liquibase]: Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
17:49:05.613 INFO  [liquibase.Liquibase]: Operation, tries, wait:
17:49:05.613 INFO  [liquibase.Liquibase]:   analyze_table, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   copy_rows, 10, 0.25
17:49:05.613 INFO  [liquibase.Liquibase]:   create_triggers, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   drop_triggers, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   swap_tables, 10, 1
17:49:05.613 INFO  [liquibase.Liquibase]:   update_foreign_keys, 10, 1
17:49:05.638 INFO  [liquibase.Liquibase]: No foreign keys reference `database`.`table`; ignoring --alter-foreign-keys-method.
17:49:05.638 INFO  [liquibase.Liquibase]: Altering `database`.`table`...
17:49:05.641 INFO  [liquibase.Liquibase]: Creating new table...
17:49:05.650 INFO  [liquibase.Liquibase]: Created new table database._table_new OK.
17:49:05.650 INFO  [liquibase.Liquibase]: Altering new table...
17:49:05.665 INFO  [liquibase.Liquibase]: Altered `database`.`_table_new` OK.
17:49:05.667 INFO  [liquibase.Liquibase]: 2020-11-10T17:49:05 Creating triggers...
18:26:19.962 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Created triggers OK.
18:26:19.964 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Copying approximately 1 rows...
18:26:19.966 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Copied rows OK.
18:26:19.966 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Analyzing new table...
18:26:19.970 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Swapping tables...
18:26:19.981 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Swapped original and new tables OK.
18:26:19.981 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Dropping old table...
18:26:19.986 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Dropped old table `database`.`_table_old` OK.
18:26:19.988 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Dropping triggers...
18:26:19.988 INFO  [liquibase.Liquibase]: 2020-11-10T18:26:19 Dropped triggers OK.
18:26:19.988 INFO  [liquibase.Liquibase]: Successfully altered `database`.`table`.
18:26:19.994 INFO  [liquibase.Liquibase]: 
18:26:19.994 INFO  [liquibase.changelog.ChangeSet]: Columns new_column(BLOB(1024)) added to table
18:26:19.995 INFO  [liquibase.changelog.ChangeSet]: ChangeSet schema/242.xml::242-1::exampleuser ran successfully in 2234570ms
18:26:19.996 INFO  [liquibase.executor.jvm.JdbcExecutor]: INSERT INTO database.DATABASECHANGELOG (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, `DESCRIPTION`, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('242-1', 'exampleuser', 'schema/242.xml', NOW(), 30, '8:f36414ffb147b249d63509f15ae2794a', 'addColumn tableName=table', '', 'EXECUTED', NULL, NULL, '3.8.6', '5030535644')
18:26:20.003 INFO  [liquibase.lockservice.StandardLockService]: Successfully released change log lock
18:26:20.006 INFO  [liquibase.integration.commandline.Main]: Liquibase: Update has been successful.

Ah, that makes more sense @erinlkolp!

Extentions like this one for percona are mainly maintained by the extension writer (if it starts to get done by liquibase core, we would just pull the functionality in and get rid of extension).

@adangel maintains that, and there is an issue tracker here:

Maybe we can open an issue and point them to this issue for context?

Ronak