概述
在之前的版本中,MySQL已经把Nested Loop玩出了诸多花样(除普通NL外,还有BNL和BKA)。随着MySQL 8.0中统计直方图的完善,hash join终于出现在了MySQL 8.0.18版本中,用于替代性能较差的BNL。
MySQL 8.0.18中hash join的使用前提条件包括如下3点:
- 表与表之间是等值内连接并且优化器决定在连接字段上不使用索引,或者是不包含任何连接条件的笛卡尔连接;
- 在满足“条件1”的前提下,表与表之间可以包含不等值连接条件;
- 如果SQL中任意两个表之间的连接不满足“条件1”,则该SQL中的所有JOIN都不能使用hash join;
为了在执行计划中看到”hash join”关键字,需要使用”EXPLAIN FORMAT=TREE”或”EXPLAIN ANALYZE”语句,否则看到的仍然是BNL(Block Nested Loop),这点很容易产生误导。
hash join和BNL/BKA一样,都要使用到join buffer,其大小由join_buffer_size控制(默认256KB),超过后会生成临时文件在磁盘上进行操作(文件操作又进一步涉及open_files_limit参数)。
为了支持hash join,MySQL的优化器选项中增加了如下配置:
- optimizer_switch中新增了hash join开关选项:hash_join,可以在global或session级别修改该参数;
- 新增2个hint:HASH_JOIN和NO_HASH_JOIN,用于在SQL级别控制hash join行为,但如果有合适的索引可用,HASH_JOIN这个hint仍然可能被忽略(从MySQL 8.0.19开始,这两个hint被置为无效);
不适合使用hash join的场景
- 被驱动表的关联字段上存在筛选能力高的索引可用;
- 含有LIMIT子句,这时使用BNL能够提前结束表连接,而hash join在取LIMIT之前仍需要全部进行连接;
测试
环境准备:
- 数据库版本确认为8.0.18,join_buffer_size为默认的256KB大小,optimizer_switch中的hash_join=on
- 创建三张表t1、t2和t3,各包含10万行数据,t1_val和t2_val字段中都是0 ~ 99之间的数字,t3_val字段中是0 ~ 49之间的数字,这三个字段上均没有索引
1 | mysql> select version(); |
2 | +-----------+ |
3 | | version() | |
4 | +-----------+ |
5 | | 8.0.18 | |
6 | +-----------+ |
7 | 1 row in set (0.00 sec) |
8 | |
9 | mysql> show variables like 'join_buffer_size'; |
10 | +------------------+--------+ |
11 | | Variable_name | Value | |
12 | +------------------+--------+ |
13 | | join_buffer_size | 262144 | |
14 | +------------------+--------+ |
15 | 1 row in set (0.00 sec) |
16 | |
17 | mysql> show variables like 'optimizer_switch'\G |
18 | *************************** 1. row *************************** |
19 | Variable_name: optimizer_switch |
20 | Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on |
21 | 1 row in set (0.00 sec) |
22 | |
23 | mysql> CREATE TABLE t1 (t1_id int unsigned NOT NULL auto_increment PRIMARY KEY, t1_val int unsigned, t1_str varchar(300)); |
24 | Query OK, 0 rows affected (0.23 sec) |
25 | |
26 | mysql> CREATE TABLE t2 (t2_id int unsigned NOT NULL auto_increment PRIMARY KEY, t2_val int unsigned,t2_str varchar(300)); |
27 | Query OK, 0 rows affected (0.11 sec) |
28 | |
29 | mysql> CREATE TABLE t3 (t3_id int unsigned NOT NULL auto_increment PRIMARY KEY, t3_val int unsigned,t3_str varchar(300)); |
30 | Query OK, 0 rows affected (1.76 sec) |
31 | |
32 | mysql> SET SESSION cte_max_recursion_depth = 100000; |
33 | Query OK, 0 rows affected (0.00 sec) |
34 | |
35 | mysql> INSERT INTO t1 (t1_val) WITH RECURSIVE digits(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM digits WHERE i<100000) SELECT FLOOR(RAND()*100) FROM digits; |
36 | Query OK, 100000 rows affected (1.78 sec) |
37 | Records: 100000 Duplicates: 0 Warnings: 0 |
38 | |
39 | mysql> INSERT INTO t2 (t2_val) WITH RECURSIVE digits(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM digits WHERE i<100000) SELECT FLOOR(RAND()*100) FROM digits; |
40 | Query OK, 100000 rows affected (1.87 sec) |
41 | Records: 100000 Duplicates: 0 Warnings: 0 |
42 | |
43 | mysql> INSERT INTO t3 (t3_val) WITH RECURSIVE digits(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM digits WHERE i<100000) SELECT FLOOR(RAND()*50) FROM digits; |
44 | Query OK, 100000 rows affected (3.43 sec) |
45 | Records: 100000 Duplicates: 0 Warnings: 0 |
收集并查看统计信息直方图(非必要步骤)
1 | mysql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON t1_val WITH 100 BUCKETS; |
2 | +---------+-----------+----------+---------------------------------------------------+ |
3 | | Table | Op | Msg_type | Msg_text | |
4 | +---------+-----------+----------+---------------------------------------------------+ |
5 | | test.t1 | histogram | status | Histogram statistics created for column 't1_val'. | |
6 | +---------+-----------+----------+---------------------------------------------------+ |
7 | 1 rows in set (0.32 sec) |
8 | |
9 | mysql> ANALYZE TABLE t2 UPDATE HISTOGRAM ON t2_val WITH 100 BUCKETS; |
10 | +---------+-----------+----------+---------------------------------------------------+ |
11 | | Table | Op | Msg_type | Msg_text | |
12 | +---------+-----------+----------+---------------------------------------------------+ |
13 | | test.t2 | histogram | status | Histogram statistics created for column 't2_val'. | |
14 | +---------+-----------+----------+---------------------------------------------------+ |
15 | 1 rows in set (0.21 sec) |
16 | |
17 | mysql> ANALYZE TABLE t3 UPDATE HISTOGRAM ON t3_val WITH 50 BUCKETS; |
18 | +---------+-----------+----------+---------------------------------------------------+ |
19 | | Table | Op | Msg_type | Msg_text | |
20 | +---------+-----------+----------+---------------------------------------------------+ |
21 | | test.t3 | histogram | status | Histogram statistics created for column 't3_val'. | |
22 | +---------+-----------+----------+---------------------------------------------------+ |
23 | 1 row in set (1.74 sec) |
24 | |
25 | mysql> SELECT TABLE_NAME,COLUMN_NAME,HISTOGRAM FROM INFORMATION_SCHEMA.COLUMN_STATISTICS WHERE TABLE_NAME IN('t1','t2','t3')\G |
26 | *************************** 1. row *************************** |
27 | TABLE_NAME: t1 |
28 | COLUMN_NAME: t1_val |
29 | HISTOGRAM: {"buckets": [[0, 0.00906727224855187], [1, 0.019039626119010007], [2, 0.028880331753554506], [3, 0.03942864665613481], [4, 0.04921998420221169], [5, 0.05958728278041074], [6, 0.06911532385466035], [7, 0.07954844655081622], [8, 0.08952080042127436], [9, 0.09959189046866773], [10, 0.10972880463401792], [11, 0.11979989468141128], [12, 0.1296899684044234], [13, 0.13972814639283834], [14, 0.14984860452869933], [15, 0.1599855186940495], [16, 0.17020471300684573], [17, 0.1806542917324908], [18, 0.19029752501316485], [19, 0.2006319115323855], [20, 0.21106503422854136], [21, 0.22092219589257506], [22, 0.23048314902580308], [23, 0.24020866245392314], [24, 0.2503455766192733], [25, 0.2606470510795155], [26, 0.27015863612427593], [27, 0.2805423907319642], [28, 0.29061348077935756], [29, 0.301490916271722], [30, 0.3111012374934176], [31, 0.321254607688257], [32, 0.33055226434965773], [33, 0.34004739336492895], [34, 0.3500526592943655], [35, 0.36081490258030546], [36, 0.3701948393891522], [37, 0.3806937862032649], [38, 0.39117627698788837], [39, 0.4017739599789363], [40, 0.4112690889942075], [41, 0.42153765139547134], [42, 0.43180621379673517], [43, 0.44153172722485523], [44, 0.4515863612427594], [45, 0.4617232754081096], [46, 0.4715475250131649], [47, 0.48156924697209064], [48, 0.4915416008425488], [49, 0.501843075302791], [50, 0.5123584781463929], [51, 0.5219358873091101], [52, 0.5324183780937336], [53, 0.5423578199052134], [54, 0.5519023170089522], [55, 0.5616607424960507], [56, 0.5706621906266457], [57, 0.5808155608214851], [58, 0.5907220905739864], [59, 0.6008754607688258], [60, 0.6109794628751976], [61, 0.6217910742496051], [62, 0.631335571353344], [63, 0.641225645076356], [64, 0.651263823064771], [65, 0.6606108478146394], [66, 0.6706654818325436], [67, 0.68025934702475], [68, 0.6897544760400212], [69, 0.6995293575566088], [70, 0.7096169036334914], [71, 0.7195728014744603], [72, 0.7284261453396526], [73, 0.7379870984728806], [74, 0.7483543970510796], [75, 0.7585900473933651], [76, 0.7683649289099528], [77, 0.778501843075303], [78, 0.7888197735650344], [79, 0.7988579515534494], [80, 0.8090606898367564], [81, 0.8188191153238549], [82, 0.8286762769878886], [83, 0.8388461032122172], [84, 0.8491475776724594], [85, 0.8594984202211693], [86, 0.870326487625066], [87, 0.8800520010531862], [88, 0.8895635860979465], [89, 0.8993549236440234], [90, 0.9087348604528701], [91, 0.9184932859399686], [92, 0.9281365192206427], [93, 0.9390139547130071], [94, 0.9495951816745658], [95, 0.9594687993680887], [96, 0.969934834123223], [97, 0.9799071879936812], [98, 0.9900605581885206], [99, 1.0000000000000002]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2019-12-20 07:10:36.951184", "sampling-rate": 0.6069398964414872, "histogram-type": "singleton", "number-of-buckets-specified": 100} |
30 | *************************** 2. row *************************** |
31 | TABLE_NAME: t2 |
32 | COLUMN_NAME: t2_val |
33 | HISTOGRAM: {"buckets": [[0, 0.010344656910461797], [1, 0.020293346092293475], [2, 0.03050601375987857], [3, 0.040520697563148605], [4, 0.050584877332497405], [5, 0.06086353962152085], [6, 0.07094421804622923], [7, 0.0807774166405438], [8, 0.09095708699740972], [9, 0.10044381382917293], [10, 0.11115144115754567], [11, 0.12060517067858971], [12, 0.13043836927290428], [13, 0.14055204500833185], [14, 0.15101219250631076], [15, 0.16181881176684104], [16, 0.17137153322004262], [17, 0.18148520895547018], [18, 0.19158238603553815], [19, 0.20095362227978425], [20, 0.21090231146161592], [21, 0.22083450198808802], [22, 0.2309151804127964], [23, 0.24089686690534726], [24, 0.2507465641550214], [25, 0.26132220224051744], [26, 0.2711718994901916], [27, 0.28136806850241713], [28, 0.29095378726633786], [29, 0.3012159509000017], [30, 0.31078517100856284], [31, 0.3203378924617644], [32, 0.33020408836679815], [33, 0.3400207883057531], [34, 0.3501509626965403], [35, 0.3595057002854268], [36, 0.36963587467621395], [37, 0.3792380920954943], [38, 0.38908778934516847], [39, 0.3987560013858871], [40, 0.4088366798105955], [41, 0.41792743891372863], [42, 0.4283545891009884], [43, 0.4382537823167413], [44, 0.4483179620860901], [45, 0.4588276055501478], [46, 0.4684463216247877], [47, 0.4783785121512598], [48, 0.48827770536701276], [49, 0.4977974295094951], [50, 0.5083070729735528], [51, 0.5186517298840146], [52, 0.5285014271336888], [53, 0.5386810974905547], [54, 0.54928973288677], [55, 0.5592714193793208], [56, 0.5693026018379505], [57, 0.579894738578806], [58, 0.5904538780089424], [59, 0.6006995429872467], [60, 0.6104337496494038], [61, 0.6205309267294717], [62, 0.629935160284437], [63, 0.6396198709805152], [64, 0.6496510534391449], [65, 0.6592697695137848], [66, 0.6691359654188186], [67, 0.6794311263632016], [68, 0.6892478263021565], [69, 0.6996089818679779], [70, 0.7095576710498096], [71, 0.7197538400620351], [72, 0.7299335104189011], [73, 0.7395687251489006], [74, 0.7493854250878556], [75, 0.7593671115804064], [76, 0.7696622725247894], [77, 0.7790335087690354], [78, 0.7890811898830247], [79, 0.7992278629291714], [80, 0.809457029252116], [81, 0.8196202009536224], [82, 0.8290904291300261], [83, 0.838907129068981], [84, 0.8489053142168914], [85, 0.860058405239973], [86, 0.8702710729075581], [87, 0.8803517513322665], [88, 0.8914388477339098], [89, 0.9011730543960669], [90, 0.910544290640313], [91, 0.92027849730247], [92, 0.9298807147217504], [93, 0.9403738595304485], [94, 0.9505205325765952], [95, 0.9599742620976393], [96, 0.9698239593473134], [97, 0.9796571579416279], [98, 0.9898038309877746], [99, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2019-12-20 07:11:40.258403", "sampling-rate": 0.6069398964414872, "histogram-type": "singleton", "number-of-buckets-specified": 100} |
34 | *************************** 3. row *************************** |
35 | TABLE_NAME: t3 |
36 | COLUMN_NAME: t3_val |
37 | HISTOGRAM: {"buckets": [[0, 0.02045], [1, 0.041569999999999996], [2, 0.06140999999999999], [3, 0.08099999999999999], [4, 0.10089], [5, 0.12076999999999999], [6, 0.14067], [7, 0.16113], [8, 0.18152], [9, 0.20198999999999998], [10, 0.22208999999999998], [11, 0.24156], [12, 0.26165], [13, 0.28160999999999997], [14, 0.30163999999999996], [15, 0.32105999999999996], [16, 0.34131999999999996], [17, 0.36146999999999996], [18, 0.38072999999999996], [19, 0.40054999999999996], [20, 0.42157999999999995], [21, 0.44176999999999994], [22, 0.46115999999999996], [23, 0.48172], [24, 0.50109], [25, 0.52171], [26, 0.54217], [27, 0.5621], [28, 0.58143], [29, 0.60137], [30, 0.6214099999999999], [31, 0.6411899999999999], [32, 0.6616299999999999], [33, 0.6814199999999999], [34, 0.7012299999999999], [35, 0.7212699999999999], [36, 0.7411899999999999], [37, 0.76089], [38, 0.7811199999999999], [39, 0.8009499999999999], [40, 0.8204999999999999], [41, 0.8404599999999999], [42, 0.8609899999999999], [43, 0.8805099999999999], [44, 0.9009199999999999], [45, 0.9204399999999999], [46, 0.94136], [47, 0.96039], [48, 0.9805499999999999], [49, 0.9999999999999999]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2019-12-20 07:41:17.446201", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 50} |
38 | 3 rows in set (0.00 sec) |
测试如何查看hash join:EXPLAIN FORMAT=TREE
1 | ##直接使用EXPLAIN仅能看到"Block Nested Loop"关键字 |
2 | mysql> EXPLAIN |
3 | -> SELECT * FROM t1,t2,t3 |
4 | -> WHERE t2.t2_val = t1.t1_val and t1.t1_val=t3.t3_val; |
5 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
6 | | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
7 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
8 | | 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 100.00 | NULL | |
9 | | 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 10.00 | Using where; Using join buffer (Block Nested Loop) | |
10 | | 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 10.00 | Using where; Using join buffer (Block Nested Loop) | |
11 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
12 | 3 rows in set, 1 warning (0.00 sec) |
13 | |
14 | ##只有使用"EXPLAIN FORMAT=TREE"才能看到"Inner hash join"关键字,下面输出可以看出3张表按照t1->t2->t3的顺序进行关联,两两之间均为hash join |
15 | mysql> EXPLAIN FORMAT=TREE |
16 | -> SELECT * FROM t1,t2,t3 |
17 | -> WHERE t2.t2_val = t1.t1_val and t1.t1_val=t3.t3_val\G |
18 | *************************** 1. row *************************** |
19 | EXPLAIN: -> Inner hash join (t3.t3_val = t1.t1_val) (cost=10141380914795.79 rows=10139847189164) |
20 | -> Table scan on t3 (cost=0.52 rows=100464) |
21 | -> Hash |
22 | -> Inner hash join (t2.t2_val = t1.t1_val) (cost=1009337794.31 rows=1009301545) |
23 | -> Table scan on t2 (cost=0.27 rows=100464) |
24 | -> Hash |
25 | -> Table scan on t1 (cost=10102.65 rows=100464) |
26 | |
27 | 1 row in set (0.00 sec) |
测试hint与索引的效果:索引的优先级高于hint
1 | ##尝试让t1和t2之间进行hash join,t2和t3之间不使用hash join,但是EXPLAIN FORMAT=TREE未看到输出结果,使用EXPLAIN看到的是BNL连接方式 |
2 | mysql> EXPLAIN FORMAT=TREE |
3 | -> SELECT /*+HASH_JOIN(t1,t2) NO_HASH_JOIN(t2,t3)*/ * FROM t1,t2,t3 |
4 | -> WHERE t2.t2_val = t1.t1_val and t1.t1_val=t3.t3_val\G |
5 | *************************** 1. row *************************** |
6 | EXPLAIN: <not executable by iterator executor> |
7 | |
8 | 1 row in set, 1 warning (0.00 sec) |
9 | |
10 | mysql> EXPLAIN |
11 | -> SELECT /*+HASH_JOIN(t1,t2) NO_HASH_JOIN(t2,t3)*/ * FROM t1,t2,t3 |
12 | -> WHERE t2.t2_val = t1.t1_val and t1.t1_val=t3.t3_val; |
13 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
14 | | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
15 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
16 | | 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 100.00 | NULL | |
17 | | 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 10.00 | Using where; Using join buffer (Block Nested Loop) | |
18 | | 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 10.00 | Using where; Using join buffer (Block Nested Loop) | |
19 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
20 | 3 rows in set, 2 warnings (0.00 sec) |
21 | |
22 | ##在t3表的t3_val字段上创建索引,尝试用HASH_JOIN(t1,t3)强制使用hash join,发现hint被忽略,t3作为被驱动表使用了idx_t3_val索引 |
23 | mysql> create index idx_t3_val on t3(t3_val); |
24 | Query OK, 0 rows affected (0.58 sec) |
25 | Records: 0 Duplicates: 0 Warnings: 0 |
26 | |
27 | mysql> EXPLAIN FORMAT=TREE |
28 | -> SELECT /*+HASH_JOIN(t1,t3)*/ * FROM t1,t3 |
29 | -> WHERE t1.t1_val=t3.t3_val\G |
30 | *************************** 1. row *************************** |
31 | EXPLAIN: -> Nested loop inner join (cost=37990517.85 rows=210271152) |
32 | -> Filter: (t1.t1_val is not null) (cost=10102.65 rows=100464) |
33 | -> Table scan on t1 (cost=10102.65 rows=100464) |
34 | -> Index lookup on t3 using idx_t3_val (t3_val=t1.t1_val) (cost=168.75 rows=2093) |
35 | |
36 | 1 row in set (0.00 sec) |
37 | |
38 | ##测试完清理索引 |
39 | mysql> alter table t3 drop index idx_t3_val; |
40 | Query OK, 0 rows affected (0.16 sec) |
41 | Records: 0 Duplicates: 0 Warnings: 0 |
测试笛卡尔连接:支持hash join
1 | mysql> EXPLAIN FORMAT=TREE |
2 | -> SELECT * FROM t1,t2\G |
3 | *************************** 1. row *************************** |
4 | EXPLAIN: -> Inner hash join (cost=1009337794.31 rows=10093015296) |
5 | -> Table scan on t2 (cost=0.36 rows=100464) |
6 | -> Hash |
7 | -> Table scan on t1 (cost=10102.65 rows=100464) |
8 | |
9 | 1 row in set (0.00 sec) |
测试外连接:不支持hash join,退化为BNL
1 | mysql> EXPLAIN FORMAT=TREE |
2 | -> SELECT * FROM t1 JOIN t2 ON t2.t2_val = t1.t1_val |
3 | -> LEFT JOIN t3 ON t1.t1_val=t3.t3_val\G |
4 | *************************** 1. row *************************** |
5 | EXPLAIN: <not executable by iterator executor> |
6 | |
7 | 1 row in set (0.00 sec) |
8 | |
9 | mysql> EXPLAIN |
10 | -> SELECT * FROM t1 JOIN t2 ON t2.t2_val = t1.t1_val |
11 | -> LEFT JOIN t3 ON t1.t1_val=t3.t3_val; |
12 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
13 | | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
14 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
15 | | 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 100.00 | NULL | |
16 | | 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 10.00 | Using where; Using join buffer (Block Nested Loop) | |
17 | | 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 100.00 | Using where; Using join buffer (Block Nested Loop) | |
18 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
19 | 3 rows in set, 1 warning (0.00 sec) |
测试不等值连接:不支持hash join,退化为BNL
1 | ##SQL中只要有一个不等值连接,整个SQL中的所有连接都不会使用hash join |
2 | mysql> EXPLAIN FORMAT=TREE |
3 | -> SELECT * FROM t1 JOIN t2 ON t2.t2_val = t1.t1_val |
4 | -> JOIN t3 ON t1.t1_val > t3.t3_val\G |
5 | *************************** 1. row *************************** |
6 | EXPLAIN: <not executable by iterator executor> |
7 | |
8 | 1 row in set (0.00 sec) |
9 | |
10 | mysql> EXPLAIN |
11 | -> SELECT * FROM t1 JOIN t2 ON t2.t2_val = t1.t1_val |
12 | -> JOIN t3 ON t1.t1_val>t3.t3_val; |
13 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
14 | | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
15 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
16 | | 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 100.00 | NULL | |
17 | | 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 10.00 | Using where; Using join buffer (Block Nested Loop) | |
18 | | 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 100464 | 33.33 | Using where; Using join buffer (Block Nested Loop) | |
19 | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+ |
20 | 3 rows in set, 1 warning (0.01 sec) |
21 | |
22 | ##在满足全部为等值内连接的前提下,可以同时包含不等值连接条件 |
23 | mysql> EXPLAIN FORMAT=TREE |
24 | -> SELECT * FROM t1 JOIN t2 ON t2.t2_val = t1.t1_val |
25 | -> JOIN t3 ON t1.t1_val = t3.t3_val and t1.t1_id > t3_id\G |
26 | *************************** 1. row *************************** |
27 | EXPLAIN: -> Inner hash join (t2.t2_val = t1.t1_val) (cost=3380795229097.44 rows=3379611112797) |
28 | -> Table scan on t2 (cost=0.52 rows=100464) |
29 | -> Hash |
30 | -> Nested loop inner join (cost=1009337794.31 rows=336400209) |
31 | -> Table scan on t1 (cost=10102.65 rows=100464) |
32 | -> Filter: ((t3.t3_val = t1.t1_val) and (t1.t1_id > t3.t3_id)) (cost=0.26 rows=3348) |
33 | -> Index range scan on t3 (re-planned for each iteration) (cost=0.26 rows=100464) |
测试非连接字段使用索引的情况:非连接字段可以使用索引,可以同时使用hash join
1 | mysql> EXPLAIN FORMAT=TREE |
2 | -> SELECT * FROM t1 JOIN t2 ON t2.t2_val = t1.t1_val |
3 | -> JOIN t3 ON t1.t1_val=t3.t3_val and t1.t1_id>5000\G |
4 | *************************** 1. row *************************** |
5 | EXPLAIN: -> Inner hash join (t3.t3_val = t1.t1_val) (cost=5070690462463.22 rows=5069923594582) |
6 | -> Table scan on t3 (cost=0.52 rows=100464) |
7 | -> Hash |
8 | -> Inner hash join (t2.t2_val = t1.t1_val) (cost=504673934.36 rows=504650772) |
9 | -> Table scan on t2 (cost=0.28 rows=100464) |
10 | -> Hash |
11 | -> Filter: (t1.t1_id > 5000) (cost=10060.40 rows=50232) |
12 | -> Index range scan on t1 using PRIMARY (cost=10060.40 rows=50232) |
测试join buffer大小对hash join性能的影响
1 | mysql> EXPLAIN FORMAT=TREE |
2 | -> SELECT count(*) FROM t1,t2 |
3 | -> WHERE t2.t2_val = t1.t1_val\G |
4 | *************************** 1. row *************************** |
5 | EXPLAIN: -> Aggregate: count(0) |
6 | -> Inner hash join (t2.t2_val = t1.t1_val) (cost=1009311796.29 rows=1009301545) |
7 | -> Table scan on t2 (cost=0.01 rows=100464) |
8 | -> Hash |
9 | -> Table scan on t1 (cost=10102.65 rows=100464) |
10 | |
11 | 1 row in set (0.11 sec) |
12 | |
13 | ##以如下形式,在会话级修改join_buffer_size为不同的值,测试SQL语句执行时长 |
14 | set join_buffer_size=1024*1024*4; |
15 | SELECT count(*) FROM t1,t2 WHERE t2.t2_val = t1.t1_val; |
16 | SELECT count(*) FROM t1,t2 WHERE t2.t2_val = t1.t1_val; |
17 | SELECT count(*) FROM t1,t2 WHERE t2.t2_val = t1.t1_val; |
18 | SELECT count(*) FROM t1,t2 WHERE t2.t2_val = t1.t1_val; |
19 | SELECT count(*) FROM t1,t2 WHERE t2.t2_val = t1.t1_val; |
测试结果:join_buffer_size不是越大越好,这应该与malloc函数选择使用brk还是mmap系统调用分配内存的方式有关
join_buffer_size | 执行时长 |
---|---|
32KB | 4.04~4.37s |
64KB | 4.22~4.43s |
128KB | 4.24~4.50s |
256KB | 4.24~4.67s |
512KB | 4.19~4.59s |
1MB | 4.17~4.74s |
2MB | 4.77~5.28s |
4MB | 5.15~6.35s |
8MB | 7.39s~7.73s |
16MB | 11.15s~11.95s |
与BNL的性能比较
1 | ##使用hash join,从上面的测试结果可以看到,执行时间在4~5秒之间 |
2 | SELECT count(*) FROM t1,t2 WHERE t2.t2_val = t1.t1_val; |
3 | |
4 | ##使用BNL,执行10分钟未返回结果,最终放弃了 |
5 | SELECT /*+NO_HASH_JOIN(t1,t2)*/ count(*) FROM t1,t2 WHERE t2.t2_val = t1.t1_val; |
Optimizer Trace视角:未发现特征信息,这里不贴trace了,太长了…
参考
How to Use Hash Joins? (文档 ID 2562434.1)