共计 9646 个字符,预计需要花费 25 分钟才能阅读完成。
在上篇文章《MySQL 表结构变更,不可不知的 Metadata Lock》中,我们介绍了 MDL 引入的背景,及基本概念,从“道”的层面知道了什么是 MDL。下面就从“术”的层面看看如何定位 MDL 的相关问题。
在 MySQL 5.7 中,针对 MDL,引入了一张新表 performance_schema.metadata_locks,该表可对外展示 MDL 的相关信息,包括其作用对象,类型及持有等待情况。
开启 MDL 的 instrument
但是相关 instrument 并没有开启(MySQL 8.0 是默认开启的),其可通过如下两种方式开启,
临时生效
修改 performance_schema.setup_instrume nts 表,但实例重启后,又会恢复为默认值。
UPDATE performance_schema.setup_instruments SET ENABLED = ‘YES’, TIMED = ‘YES’
WHERE NAME = ‘wait/lock/metadata/sql/mdl’;
performance-schema-instrument=’wait/lock/metadata/sql/mdl=ON’
测试场景
下面结合一个简单的 Demo,来看看在 MySQL 5.7 中如何定位 DDL 操作的阻塞问题。
这里,重点关注 lock_status,”PENDING” 代表线程在等待 MDL,而 ”GRANTED” 则代表线程持有 MDL。
如何找出引起阻塞的会话
结合 owner_thread_id,可以可到,是 29 号线程在等待 27 号线程的 MDL,此时,可 kill 掉 52 号线程。
但需要注意的是,owner_thread_id 给出的只是线程 ID,并不是 show processlist 中的 ID。如果要查找线程对应的 processlist id,需查询 performance_schema.threads 表。
session3> select * from performance_schema.threads where thread_id in (27,29)\G
*************************** 1. row ***************************
THREAD_ID: 27
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 2
PROCESSLIST_USER: root
PROCESSLIST_HOST: localhost
PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: Sleep
PROCESSLIST_TIME: 214
PROCESSLIST_STATE: NULL
PROCESSLIST_INFO: NULL
PARENT_THREAD_ID: 1
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: Socket
THREAD_OS_ID: 9800
*************************** 2. row ***************************
THREAD_ID: 29
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 4
PROCESSLIST_USER: root
PROCESSLIST_HOST: localhost
PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: Query
PROCESSLIST_TIME: 172
PROCESSLIST_STATE: Waiting for table metadata lock
PROCESSLIST_INFO: alter table slowtech.t1 add c1 int
PARENT_THREAD_ID: 1
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: Socket
THREAD_OS_ID: 9907
2 rows in set (0.00 sec)
将这两张表结合,借鉴 sys.innodb_lock _waits 的输出,实际上我们也可以直观地呈现 MDL 的等待关系。
SELECT
a.OBJECT_SCHEMA AS locked_schema,
a.OBJECT_NAME AS locked_table,
“Metadata Lock” AS locked_type,
c.PROCESSLIST_ID AS waiting_processlist_id,
c.PROCESSLIST_TIME AS waiting_age,
c.PROCESSLIST_INFO AS waiting_query,
c.PROCESSLIST_STATE AS waiting_state,
d.PROCESSLIST_ID AS blocking_processlist_id,
d.PROCESSLIST_TIME AS blocking_age,
d.PROCESSLIST_INFO AS blocking_query,
concat(‘KILL ‘, d.PROCESSLIST_ID) AS sql_kill_blocking_connection
FROM
performance_schema.metadata_locks a
JOIN performance_schema.metadata_locks b ON a.OBJECT_SCHEMA = b.OBJECT_SCHEMA
AND a.OBJECT_NAME = b.OBJECT_NAME
AND a.lock_status = ‘PENDING’
AND b.lock_status = ‘GRANTED’
AND a.OWNER_THREAD_ID <> b.OWNER_THREAD_ID
AND a.lock_type = ‘EXCLUSIVE’
JOIN performance_schema.threads c ON a.OWNER_THREAD_ID = c.THREAD_ID
JOIN performance_schema.threads d ON b.OWNER_THREAD_ID = d.THREAD_ID\G
*************************** 1. row ***************************
locked_schema: slowtech
locked_table: t1
locked_type: Metadata Lock
waiting_processlist_id: 4
waiting_age: 259
waiting_query: alter table slowtech.t1 add c1 int
waiting_state: Waiting for table metadata lock
blocking_processlist_id: 2
blocking_age: 301
blocking_query: NULL
sql_kill_blocking_connection: KILL 2
1 row in set (0.00 sec)
输出一目了然,DDL 操作如果要获得 MDL,执行 kill 2 即可。
官方的 sys.schematablelock_waits
实际上,MySQL 5.7 在 sys 库中也集成了类似功能,同样的场景,其输出如下,
具体分析下官方的输出,
只有一个 alter table 操作,却产生了两条记录,而且两条记录的 kill 对象竟然还不一样,对表结构不熟悉及不仔细看记录内容的话,难免会 kill 错对象。
不仅如此,如果有 N 个查询被 DDL 操作堵塞,则会产生 N * 2 条记录。在阻塞操作较多的情况下,这 N * 2 条记录完全是个噪音。
而之前的 SQL,无论有多少操作被阻塞,一个 alter table 操作,就只会输出一条记录。
如何查看阻塞会话已经执行过的操作
但上面这个 SQL 也有遗憾,其 blocking_query 为 NULL,而在会话 1 中,其明明已经执行了三个 SQL。
这个与 performance_schema.threads(类似于 show processlist)有关,其只会输出当前正在运行的 SQL,对于已经执行过的,实际上是没办法看到。
但在线上,kill 是一个需要谨慎的操作,毕竟你很难知道 kill 的是不是业务关键操作?又或者,是个批量 update 操作?那么,有没有办法抓到该事务之前的操作呢?
答案,有。
即 Performance Schema 中记录 Statement Event(操作事件)的表,具体包括 events_statements_current,events_statements_history,events_statements_history_long,prepared_statements_instances。
常用的是前面三个。
三者的表结构完全一致,其中,events_statements_history 又包含了 events_statements_current 的操作,所以我们这里会使用 events_statements_history。
终极 SQL 如下,
SELECT
locked_schema,
locked_table,
locked_type,
waiting_processlist_id,
waiting_age,
waiting_query,
waiting_state,
blocking_processlist_id,
blocking_age,
substring_index(sql_text,”transaction_begin;” ,-1) AS blocking_query,
sql_kill_blocking_connection
FROM
(
SELECT
b.OWNER_THREAD_ID AS granted_thread_id,
a.OBJECT_SCHEMA AS locked_schema,
a.OBJECT_NAME AS locked_table,
“Metadata Lock” AS locked_type,
c.PROCESSLIST_ID AS waiting_processlist_id,
c.PROCESSLIST_TIME AS waiting_age,
c.PROCESSLIST_INFO AS waiting_query,
c.PROCESSLIST_STATE AS waiting_state,
d.PROCESSLIST_ID AS blocking_processlist_id,
d.PROCESSLIST_TIME AS blocking_age,
d.PROCESSLIST_INFO AS blocking_query,
concat(‘KILL ‘, d.PROCESSLIST_ID) AS sql_kill_blocking_connection
FROM
performance_schema.metadata_locks a
JOIN performance_schema.metadata_locks b ON a.OBJECT_SCHEMA = b.OBJECT_SCHEMA
AND a.OBJECT_NAME = b.OBJECT_NAME AND a.lock_status = ‘PENDING’
AND b.lock_status = ‘GRANTED’ AND a.OWNER_THREAD_ID <> b.OWNER_THREAD_ID
AND a.lock_type = ‘EXCLUSIVE’ JOIN performance_schema.threads c ON a.OWNER_THREAD_ID = c.THREAD_ID
JOIN performance_schema.threads d ON b.OWNER_THREAD_ID = d.THREAD_ID
) t1,
(
SELECT
thread_id,
group_concat(CASE WHEN EVENT_NAME = ‘statement/sql/begin’ THEN “transaction_begin” ELSE sql_text END ORDER BY event_id SEPARATOR “;”) AS sql_text
FROM
performance_schema.events_statements_history
GROUP BY thread_id
) t2
WHERE
t1.granted_thread_id = t2.thread_id \G
*************************** 1. row ***************************
locked_schema: slowtech
locked_table: t1
locked_type: Metadata Lock
waiting_processlist_id: 4
waiting_age: 294
waiting_query: alter table slowtech.t1 add c1 int
waiting_state: Waiting for table metadata lock
blocking_processlist_id: 2
blocking_age: 336
blocking_query: delete from slowtech.t1 where id=2;select * from slowtech.t1;update slowtech.t1 set name=’c’ where id=1
sql_kill_blocking_connection: KILL 21 row in set, 1 warning (0.00 sec)
从上面的输出可以看到,blocking_query 中包含了会话 1 中当前事务的所有操作,按执行的先后顺序输出。
需要注意的是,默认情况下,events_statements_history 只会保留每个线程最近的 10 个操作,如果事务中进行的操作较多,实际上也是没办法抓全的。
Anyway, it is better than nothing!
: