你真的懂Hive窗口函数吗,如何开窗聚合?

本文详细解析了Hive的窗口函数,包括FIRST_VALUE、LAST_VALUE、LEAD、LAG等,以及OVER子句的应用,涵盖标准聚合函数和分析函数,如COUNT、SUM、RANK等,并介绍了窗口范围的设置和自定义窗口的使用。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

目录

1 窗口函数 Windowing functions

FIRST_VALUE(col, bool DEFAULT)

LAST_VALUE(col, bool DEFAULT)

LEAD(col, n, DEFAULT)

LAG(col, n, DEFAULT)

2 OVER详解 The OVER clause

FUNCTION(expr) OVER([PARTITION BY statement] [ORDER BY statement] [window clause])

2.1 标准聚合函数

COUNT(expr) OVER()

SUM(expr) OVER()

MIN(expr) OVER()

MAX(expr) OVER()

AVG(expr) OVER()

2.2 分析函数 Analytics functions

RANK() OVER()

ROW_NUMBER() OVER()

DENSE_RANK() OVER()

CUME_DIST() OVER()

PERCENT_RANK() OVER()

NTILE(INTEGER x) OVER()

2.3 OVER子句也支持聚合函数

2.4 window clause 的另一种写法


1 窗口函数 Windowing functions

FIRST_VALUE(col, bool DEFAULT)

返回分组窗口内第一行col的值,DEFAULT默认为false,如果指定为true,则跳过NULL后再取值

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  NULL AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       FIRST_VALUE(col) over(partition by group_id order by col) as col_new
FROM tmp;
group_id col col_new
1 a a
1 b a
1 c a
2 NULL NULL
2 e NULL
WITH tmp AS
(
  SELECT 1 AS group_id, NULL AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  NULL AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       FIRST_VALUE(col, true) over(partition by group_id order by col) as col_new
FROM tmp;
group_id col col_new
1 NULL NULL
1 b b
1 c b
2 NULL NULL
2 e e

LAST_VALUE(col, bool DEFAULT)

返回分组窗口内最后一行col的值,DEFAULT默认为false,如果指定为true,则跳过NULL后再取值

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  NULL AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LAST_VALUE(col) over(partition by group_id order by col desc) as col_new
FROM tmp;
group_id col col_new
1 c c
1 a a
1 NULL NULL
2 e e
2 d d
WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  NULL AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LAST_VALUE(col, true) over(order by group_id,col desc rows between 1 preceding and 1 following) as col_new
FROM tmp;
group_id col col_new
1 c a
1 a a
1 NULL e
2 e d
2 d d

LEAD(col, n, DEFAULT)

返回分组窗口内往下第n行col的值,n默认为1,往下第n没有时返回DEFAULT(DEFAULT默认为NULL)

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LEAD(col) over(partition by group_id order by col) as col_new
FROM tmp;

等同于:

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LAST_VALUE(col) over(partition by group_id order by col rows between 1 FOLLOWING and 1 FOLLOWING) as col_new
FROM tmp;

返回结果都是:

group_id col col_new
1 a b
1 b c
1 c NULL
2 d e
2 e NULL
WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LEAD(col, 2, 'z') over(partition by group_id order by col) as col_new
FROM tmp;

返回结果:

group_id col col_new
1 a c
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值