Hive -- Parameter Tuning and Optimizing the Number of Map and Reduce Tasks
Table of Contents
1.1 hive.fetch.task.conversion
1.4 hive.mapred.reduce.tasks.speculative.execution
2 Optimizing the Number of Map and Reduce Tasks in the MapReduce Stage
- Hive currently supports three execution engines: MapReduce, Spark, and Tez
- This article assumes MapReduce as the execution engine
1 Hive -- Parameter Tuning
1.1 hive.fetch.task.conversion
Default Value: minimal in Hive 0.10.0 through 0.13.1, more in Hive 0.14.0 and later
Added In: Hive 0.10.0 with HIVE-2925; default changed in Hive 0.14.0 with HIVE-7397
Some select queries can be converted to a single FETCH task, minimizing latency. Currently the query should be single-sourced, without subqueries, and should not have any aggregations or distincts (which incur RS – ReduceSinkOperator, requiring a MapReduce task), lateral views, or joins.
Supported values are none, minimal and more.
0. none: Disable hive.fetch.task.conversion (value added in Hive 0.14.0 with HIVE-8389)
1. minimal: SELECT *, FILTER on partition columns (WHERE and HAVING clauses), LIMIT only
2. more: SELECT, FILTER, LIMIT only (including TABLESAMPLE, virtual columns)
"more" can take any kind of expressions in the SELECT clause, including UDFs.
(UDTFs and lateral views are not yet supported – see HIVE-5718.)
- The more mode is recommended, as it speeds up SQL execution
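For reference, a minimal sketch of switching the mode for the current session; the property can also be set globally in hive-site.xml. The sample query is illustrative and reuses the bigdata.emp table from the demos below (the column names emp_name and emp_no are assumptions based on the demo output):

hive> set hive.fetch.task.conversion=more;
hive> select upper(emp_name), emp_no from bigdata.emp limit 5;  -- under more, expressions and UDFs in the SELECT list are still served by a FETCH task, no MapReduce job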
1.1.1 none mode
- none: fetch-task conversion is disabled; every query, no matter how simple, goes through MapReduce
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=none
hive> select * from bigdata.emp;
Query ID = work_20201216094245_d44ea4d3-0a5b-4302-93dd-4ef9a5252517
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1608016084001_0020, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0020/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-16 09:43:02,342 Stage-1 map = 0%, reduce = 0%
2020-12-16 09:43:10,641 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.28 sec
MapReduce Total cumulative CPU time: 2 seconds 280 msec
Ended Job = job_1608016084001_0020
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.28 sec HDFS Read: 4413 HDFS Write: 451 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 280 msec
OK
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
7839 KING 10
7844 TURNER 30
7876 ADAMS 20
7900 JAMES 30
7902 FORD 20
7934 MILLER 10
Time taken: 27.002 seconds, Fetched: 14 row(s)
1.1.2 minimal mode
- minimal: a plain full-table scan does not trigger MapReduce, but a FILTER on an ordinary column does trigger MapReduce
- For a partitioned table, a FILTER on the partition columns alone does not trigger MapReduce (illustrated in the sketch below)
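A hypothetical illustration of the distinction, assuming a table bigdata.emp_partition partitioned by dept_no (the table name is an assumption; a similar table is created at the end of this section):

select * from bigdata.emp_partition where dept_no = '20';      -- filter on the partition column only: served by a FETCH task, no MapReduce
select * from bigdata.emp_partition where emp_name = 'SMITH';  -- filter on an ordinary column: triggers a MapReduce job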
- Regular (non-partitioned) table
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=minimal
hive> select * from bigdata.emp where dept_no = '20';
Query ID = work_20201216094750_3df492b8-bbd8-4e41-b378-bba5fe1b3dc7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1608016084001_0022, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0022/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-16 09:48:07,799 Stage-1 map = 0%, reduce = 0%
2020-12-16 09:48:17,119 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.33 sec
MapReduce Total cumulative CPU time: 4 seconds 330 msec
Ended Job = job_1608016084001_0022
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.33 sec HDFS Read: 4952 HDFS Write: 216 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 330 msec
OK
7369 SMITH 20
7566 JONES 20
7788 SCOTT 20
7876 ADAMS 20
7902 FORD 20
Time taken: 27.728 seconds, Fetched: 5 row(s)
hive> select * from bigdata.emp;
OK
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
7839 KING 10
7844 TURNER 30
7876 ADAMS 20
7900 JAMES 30
7902 FORD 20
7934 MILLER 10
Time taken: 0.144 seconds, Fetched: 14 row(s)
- Partitioned table
- Create a partitioned table and load data
CREATE TABLE IF NOT EXISTS bigdata.emp_
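The CREATE TABLE statement above is cut off in the source. Purely as a sketch, a plausible partitioned-table DDL plus a dynamic-partition load from bigdata.emp; the table name emp_partition and the column layout are assumptions based on the demo output earlier:

CREATE TABLE IF NOT EXISTS bigdata.emp_partition (
  emp_no INT,
  emp_name STRING
)
PARTITIONED BY (dept_no STRING);

-- dynamic partitioning must be enabled before a dynamic-partition INSERT
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- the partition column goes last in the SELECT list
INSERT OVERWRITE TABLE bigdata.emp_partition PARTITION (dept_no)
SELECT emp_no, emp_name, dept_no FROM bigdata.emp;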