Hive -- Parameter Tuning and Optimizing the Number of Map and Reduce Tasks
Table of Contents
1.1 hive.fetch.task.conversion
1.4 hive.mapred.reduce.tasks.speculative.execution
2 Optimizing the Number of Map and Reduce Tasks in the MapReduce Stage
- Hive currently supports three execution engines: MapReduce, Spark, and Tez
- This article assumes MapReduce as the execution engine
1 Hive -- Parameter Tuning
1.1 hive.fetch.task.conversion
Default Value: minimal in Hive 0.10.0 through 0.13.1, more in Hive 0.14.0 and later
Added In: Hive 0.10.0 with HIVE-2925; default changed in Hive 0.14.0 with HIVE-7397
Some select queries can be converted to a single FETCH task, minimizing latency. Currently the query should be single-sourced, without subqueries, and should not have any aggregations or distincts (which incur RS – ReduceSinkOperator, requiring a MapReduce task), lateral views, or joins.
Supported values are none, minimal and more.
0. none: Disable hive.fetch.task.conversion (value added in Hive 0.14.0 with HIVE-8389)
1. minimal: SELECT *, FILTER on partition columns (WHERE and HAVING clauses), LIMIT only
2. more: SELECT, FILTER, LIMIT only (including TABLESAMPLE, virtual columns)
"more" can take any kind of expressions in the SELECT clause, including UDFs.
(UDTFs and lateral views are not yet supported – see HIVE-5718.)
- The more mode is recommended, as it speeds up SQL execution
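For reference, a minimal sketch of switching the mode for the current session; the property can also be set globally in hive-site.xml. The sample query is illustrative and reuses the bigdata.emp table from the demos below (the column names emp_name and emp_no are assumptions based on the demo output):

hive> set hive.fetch.task.conversion=more;
hive> select upper(emp_name), emp_no from bigdata.emp limit 5;  -- under more, expressions and UDFs in the SELECT list are still served by a FETCH task, no MapReduce job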
1.1.1 none mode
- none: fetch-task conversion is disabled; every query, no matter how simple, goes through MapReduce
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=none
hive> select * from bigdata.emp;
Query ID = work_20201216094245_d44ea4d3-0a5b-4302-93dd-4ef9a5252517
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1608016084001_0020, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0020/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-16 09:43:02,342 Stage-1 map = 0%, reduce = 0%
2020-12-16 09:43:10,641 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.28 sec
MapReduce Total cumulative CPU time: 2 seconds 280 msec
Ended Job = job_1608016084001_0020
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.28 sec HDFS Read: 4413 HDFS Write: 451 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 280 msec
OK
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
7839 KING 10
7844 TURNER 30
7876 ADAMS 20
7900 JAMES 30
7902 FORD 20
7934 MILLER 10
Time taken: 27.002 seconds, Fetched: 14 row(s)
1.1.2 minimal mode
- minimal: a plain full-table scan does not trigger MapReduce, but a FILTER on an ordinary column does trigger MapReduce
- For a partitioned table, a FILTER on the partition columns alone does not trigger MapReduce (illustrated in the sketch below)
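A hypothetical illustration of the distinction, assuming a table bigdata.emp_partition partitioned by dept_no (the table name is an assumption; a similar table is created at the end of this section):

select * from bigdata.emp_partition where dept_no = '20';      -- filter on the partition column only: served by a FETCH task, no MapReduce
select * from bigdata.emp_partition where emp_name = 'SMITH';  -- filter on an ordinary column: triggers a MapReduce job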
- Regular (non-partitioned) table
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=minimal
hive> select * from bigdata.emp where dept_no = '20';
Query ID = work_20201216094750_3df492b8-bbd8-4e41-b378-bba5fe1b3dc7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1608016084001_0022, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0022/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-16 09:48:07,799 Stage-1 map = 0%, reduce = 0%
2020-12-16 09:48:17,119 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.33 sec
MapReduce Total cumulative CPU time: 4 seconds 330 msec
Ended Job = job_1608016084001_0022
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.33 sec HDFS Read: 4952 HDFS Write: 216 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 330 msec
OK
7369 SMITH 20
7566 JONES 20
7788 SCOTT 20
7876 ADAMS 20
7902 FORD 20
Time taken: 27.728 seconds, Fetched: 5 row(s)
hive> select * from bigdata.emp;
OK
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
7839 KING 10
7844 TURNER 30
7876 ADAMS 20
7900 JAMES 30
7902 FORD 20
7934 MILLER 10
Time taken: 0.144 seconds, Fetched: 14 row(s)
- Partitioned table
- Create a partitioned table and load data
CREATE TABLE IF NOT EXISTS bigdata.emp_
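The CREATE TABLE statement above is cut off in the source. Purely as a sketch, a plausible partitioned-table DDL plus a dynamic-partition load from bigdata.emp; the table name emp_partition and the column layout are assumptions based on the demo output earlier:

CREATE TABLE IF NOT EXISTS bigdata.emp_partition (
  emp_no INT,
  emp_name STRING
)
PARTITIONED BY (dept_no STRING);

-- dynamic partitioning must be enabled before a dynamic-partition INSERT
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- the partition column goes last in the SELECT list
INSERT OVERWRITE TABLE bigdata.emp_partition PARTITION (dept_no)
SELECT emp_no, emp_name, dept_no FROM bigdata.emp;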