mysql SQL 优化案例 JOIN 顺序调整

Posted by w@hidva.com on October 24, 2017

线上 mysql slow log 显示有一条 sql 其执行时间几率性 2s+. sql 如下:

SELECT tableA.col10 FROM tableB INNER JOIN tableA
ON tableB.col6 >= :c1 AND tableB.col6 <= :c2 AND
tableB.col9 < :col9 AND
tableB.id = tableA.id AND tableA.col11=:col11 AND
JSON_CONTAINS(tableA.col13, :col13)
LIMIT 1

涉及表定义如下:

CREATE TABLE IF NOT EXISTS tableB(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    col1 SMALLINT UNSIGNED NOT NULL,
    col2 INT UNSIGNED NOT NULL DEFAULT 0,
    col3 SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    col4 INT UNSIGNED NOT NULL DEFAULT 0,
    col5 SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    col6 INT UNSIGNED NOT NULL,
    col7 INT UNSIGNED NOT NULL DEFAULT 0,
    col8 INT NOT NULL,
    col9 SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    col20 BINARY(16) NOT NULL UNIQUE,

    INDEX col2 (col2, col9),
    INDEX col8 (col8, col1, col9)
)
ENGINE = InnoDB,
AUTO_INCREMENT = 1024,
CHARACTER SET = utf8


CREATE TABLE IF NOT EXISTS tableA(
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    col10 INT UNSIGNED NOT NULL,
    col11 VARBINARY(255) NOT NULL,
    col12 VARBINARY(255) NOT NULL DEFAULT '',
    col13 JSON NOT NULL,
    INDEX `col10` (  `col10` ),
    KEY `col12` (`col12`),
    KEY `idx1` (`col11`)
)
ENGINE = InnoDB,
CHARACTER SET = utf8

表规模大小如下:

+--------------+------------+
| table_name   | table_rows |
+--------------+------------+
| tableA       |    2403861 |
| tableB       |    4669753 |
+--------------+------------+

登录线上 sql 执行问题 sql:

mysql> EXPLAIN SELECT tableA.col10 FROM tableB INNER JOIN tableA
    -> ON tableB.col6 >= PRIVATEVAL AND tableB.col6 <= PRIVATEVAL AND
    -> tableB.col9 < PRIVATEVAL AND
    -> tableB.id = tableA.id AND tableA.col11='PRIVATEVAL' AND
    -> JSON_CONTAINS(tableA.col13, 'PRIVATEVAL')
    -> LIMIT 1;
+----+-------------+----------+------------+--------+------------------+----------+---------+--------------------+------+----------+------------------------------------+
| id | select_type | table    | partitions | type   | possible_keys    | key      | key_len | ref                | rows | filtered | Extra                              |
+----+-------------+----------+------------+--------+------------------+----------+---------+--------------------+------+----------+------------------------------------+
|  1 | SIMPLE      | tableA   | NULL       | ref    | PRIMARY,idx1     | idx1     | 257     | const              |  138 |   100.00 | Using index condition; Using where |
|  1 | SIMPLE      | tableB   | NULL       | eq_ref | PRIMARY          | PRIMARY  | 4       | nvwa_web.tableA.id |    1 |     5.00 | Using where                        |
+----+-------------+----------+------------+--------+------------------+----------+---------+--------------------+------+----------+------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> SELECT tableA.col10 FROM tableB INNER JOIN tableA
    -> ON tableB.col6 >= PRIVATEVAL AND tableB.col6 <= PRIVATEVAL AND
    -> tableB.col9 < PRIVATEVAL AND
    -> tableB.id = tableA.id AND tableA.col11='PRIVATEVAL' AND
    -> JSON_CONTAINS(tableA.col13, 'PRIVATEVAL')
    -> LIMIT 1;
+-----------+
| col10     |
+-----------+
| PRIVATEVAL|
+-----------+
1 row in set (0.24 sec)

经分析猜测, 问题 sql 执行步骤大致如下:

  1. 从表 tableA 中选择满足条件的记录填充到 join buffer 中. 此时应用的条件: tableA.col11='...' AND JSON_CONTAINS(tableA.col13, '...').
  2. 根据 join buffer 中的记录, 从 tableB 表选择符合条件的记录. 由于 limit 1 的存在, 若找到记录则终止执行; 否则继续执行步骤 1.

所以也即 JSON_CONTAINS() 可能会执行多次, 而且 JSON_CONTAINS() 作为 CPU 密集型操作, 当并发增大导致 CPU 资源紧缺时, 就会增加 JSON_CONTAINS() 的执行耗时, 从而增加了问题 SQL 的执行耗时.

优化后的 SQL 如下:

SELECT STRAIGHT_JOIN A.col10
FROM tableA A INNER JOIN tableB B INNER JOIN tableA C
ON A.id=B.id AND A.id=C.id AND A.col11=:col11 AND JSON_CONTAINS(C.col13, :step) AND
B.col6 >= :c1 AND B.col6 <= :c2 AND B.col9 < :col9
LIMIT 1
-- 由于 limit 1 的存在, 并不会扫描 C 太多记录, 而且 JSON_CONTAINS() 的执行次数也大幅降低.

其运行效果如下:

mysql> EXPLAIN SELECT STRAIGHT_JOIN A.col10
    -> FROM tableA A INNER JOIN tableB B INNER JOIN tableA C
    -> ON A.id=B.id AND A.id=C.id AND A.col11='PRIVATEVAL' AND JSON_CONTAINS(C.col13, 'PRIVATEVAL') AND
    -> B.col6 >= PRIVATEVAL AND B.col6 <= PRIVATEVAL AND B.col9 < PRIVATEVAL
    -> LIMIT 1;
+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-----------------------+
| id | select_type | table | partitions | type   | possible_keys    | key      | key_len | ref           | rows | filtered | Extra                 |
+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-----------------------+
|  1 | SIMPLE      | A     | NULL       | ref    | PRIMARY,idx1     | idx1     | 257     | const         |  138 |   100.00 | Using index condition |
|  1 | SIMPLE      | B     | NULL       | eq_ref | PRIMARY          | PRIMARY  | 4       | nvwa_web.A.id |    1 |     5.00 | Using where           |
|  1 | SIMPLE      | C     | NULL       | eq_ref | PRIMARY          | PRIMARY  | 4       | nvwa_web.A.id |    1 |   100.00 | Using where           |
+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-----------------------+
3 rows in set, 1 warning (0.00 sec)

mysql> SELECT STRAIGHT_JOIN A.col10
    -> FROM tableA A INNER JOIN tableB B INNER JOIN tableA C
    -> ON A.id=B.id AND A.id=C.id AND A.col11='PRIVATEVAL' AND JSON_CONTAINS(C.col13, 'PRIVATEVAL') AND
    -> B.col6 >= PRIVATEVAL AND B.col6 <= PRIVATEVAL AND B.col9 < PRIVATEVAL
    -> LIMIT 1;
+-----------+
| col10     |
+-----------+
| PRIVATEVAL|
+-----------+
1 row in set (0.00 sec)

所以 mysql 优化器在执行带有 limit 的 join 语句时, 是不是应该把 join buffer 设置为 min(配置的 join buffer, limit * row bytes).