Hive -- Parameter Tuning and Optimizing the Number of Map and Reduce Tasks
Contents

1 Hive -- Parameter Tuning

1.1 hive.fetch.task.conversion

1.2 hive.exec.mode.local.auto

1.3 hive.mapred.mode

1.4 hive.mapred.reduce.tasks.speculative.execution

1.5 hive.optimize.cp

1.6 hive.optimize.ppd

 

2 Optimizing the Number of Map and Reduce Tasks in the MapReduce Stage

2.1 Optimizing the Number of Map Tasks

2.2 Optimizing the Number of Reduce Tasks


  • Hive currently supports the following execution engines: MapReduce, Spark, and Tez
  • This article uses MapReduce as the execution engine
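The engine can be checked and switched per session from the Hive CLI. A minimal sketch (hive.execution.engine is the standard property; mr selects MapReduce):

```sql
-- Show the current execution engine (mr = MapReduce)
set hive.execution.engine;

-- Pin this session to MapReduce explicitly
set hive.execution.engine=mr;
```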

 

1 Hive -- Parameter Tuning

Hive official documentation -- configuration parameters

1.1 hive.fetch.task.conversion

Default Value: minimal in Hive 0.10.0 through 0.13.1, more in Hive 0.14.0 and later
Added In: Hive 0.10.0 with HIVE-2925; default changed in Hive 0.14.0 with HIVE-7397
Some select queries can be converted to a single FETCH task, minimizing latency. Currently the query should be single sourced not having any subquery and should not have any aggregations or distincts (which incur RS – ReduceSinkOperator, requiring a MapReduce task), lateral views and joins.

Supported values are none, minimal and more.

0. none:  Disable hive.fetch.task.conversion (value added in Hive 0.14.0 with HIVE-8389)
1. minimal:  SELECT *, FILTER on partition columns (WHERE and HAVING clauses), LIMIT only
2. more:  SELECT, FILTER, LIMIT only (including TABLESAMPLE, virtual columns)

"more" can take any kind of expressions in the SELECT clause, including UDFs.
(UDTFs and lateral views are not yet supported – see HIVE-5718.)
  • Recommendation: use the more mode to speed up SQL execution
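Following that recommendation, the mode can be enabled per session. A minimal sketch, reusing the bigdata.emp table and its dept_no column from the transcripts in this article:

```sql
-- Enable the most aggressive fetch-task conversion for this session
set hive.fetch.task.conversion=more;

-- With "more", a simple SELECT / FILTER / LIMIT query runs as a single
-- FETCH task and returns rows directly, without launching a MapReduce job
select * from bigdata.emp where dept_no = '20' limit 5;
```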

1.1.1 none mode

  • none: disables fetch-task conversion; every SQL statement, no matter how simple, goes through MapReduce
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=none

hive> select * from bigdata.emp;
Query ID = work_20201216094245_d44ea4d3-0a5b-4302-93dd-4ef9a5252517
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1608016084001_0020, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0020/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -kill job_1608016084001_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-16 09:43:02,342 Stage-1 map = 0%,  reduce = 0%
2020-12-16 09:43:10,641 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.28 sec
MapReduce Total cumulative CPU time: 2 seconds 280 msec
Ended Job = job_1608016084001_0020
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.28 sec   HDFS Read: 4413 HDFS Write: 451 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 280 msec
OK
7369	SMITH	20
7499	ALLEN	30
7521	WARD	30
7566	JONES	20
7654	MARTIN	30
7698	BLAKE	30
7782	CLARK	10
7788	SCOTT	20
7839	KING	10
7844	TURNER	30
7876	ADAMS	20
7900	JAMES	30
7902	FORD	20
7934	MILLER	10
Time taken: 27.002 seconds, Fetched: 14 row(s)

1.1.2 minimal mode

  • minimal: a plain full-table scan does not trigger MapReduce, but a FILTER on an ordinary column does trigger MapReduce
  • A FILTER on a partitioned table's partition columns does not trigger MapReduce
  • Ordinary table
hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=minimal

hive> select * from bigdata.emp where dept_no = '20';
Query ID = work_20201216094750_3df492b8-bbd8-4e41-b378-bba5fe1b3dc7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1608016084001_0022, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0022/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -kill job_1608016084001_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-16 09:48:07,799 Stage-1 map = 0%,  reduce = 0%
2020-12-16 09:48:17,119 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.33 sec
MapReduce Total cumulative CPU time: 4 seconds 330 msec
Ended Job = job_1608016084001_0022
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 4.33 sec   HDFS Read: 4952 HDFS Write: 216 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 330 msec
OK
7369	SMITH	20
7566	JONES	20
7788	SCOTT	20
7876	ADAMS	20
7902	FORD	20
Time taken: 27.728 seconds, Fetched: 5 row(s)
hive> select * from bigdata.emp;
OK
7369	SMITH	20
7499	ALLEN	30
7521	WARD	30
7566	JONES	20
7654	MARTIN	30
7698	BLAKE	30
7782	CLARK	10
7788	SCOTT	20
7839	KING	10
7844	TURNER	30
7876	ADAMS	20
7900	JAMES	30
7902	FORD	20
7934	MILLER	10
Time taken: 0.144 seconds, Fetched: 14 row(s)
  • Partitioned table
  • Create a partitioned table and load data
CREATE TABLE IF NOT EXISTS bigdata.emp_