Hive中explode、lateral view以及行列转换的使用

最新推荐文章于 2024-12-09 09:14:07 发布

奈何@

最新推荐文章于 2024-12-09 09:14:07 发布

阅读量476

点赞数

CC 4.0 BY-SA版权

分类专栏： # Hive函数文章标签： hive函数

本文链接：https://blog.csdn.net/sinat_26594945/article/details/116145245

Hive函数专栏收录该内容

5 篇文章

订阅专栏

=====================================lateral view==================================
分类：视图
功能：配合UDTF来使用,把某一行数据拆分成多行数据
   与UDTF直接使用的区别：
       很多的UDTF不能将结果与源表进行关联，使用lateral view
       可以将UDTF拆分的单个字段数据与原始表数据关联上
使用方式：
   tabelA lateral view UDTF(xxx) 视图名 as a,b,c
案例：
--准备数据:vim /export/datas/lateral.txt
1   http://facebook.com/path/p1.php?query=1
2   http://www.baidu.com/news/index.jsp?uuid=frank
3   http://www.jd.com/index?source=baidu

--创建表
create table tb_url(
id int,
url string
) row format delimited fields terminated by '\t';

--加载数据
load data local inpath '/export/datas/lateral.txt' into table tb_url;

--使用UDTF解析
SELECT parse_url_tuple(url, 'HOST') from tb_url;

--使用UDTF+lateral view
select a.id,b.* from tb_url a lateral view parse_url_tuple(url, 'HOST') b as host;

--对比
SELECT id,parse_url_tuple(url, 'HOST') from tb_url;--失败，UDTF函数不能与字段连用



=====================================explode==================================
分类：UDTF
功能：函数可以将一个array或者map展开
   explode(array)：
       将array列表里的每个元素生成一行
   explode(map)：
       每一对元素作为一行，key为一列，value为一列
使用方式：
   1-直接使用
   2-与lateral view连用

案例：
   1-实现wordcount【explode(array)】
       select explode(split(word," ")) from wc1;
   2-将兴趣爱好展开【explode(map)】
       select explode(hobby) from complex_map;
   3-与侧视图连用
       select a.name,b.* from complex_map a lateral view explode(hobby) b as hobby,deep;


=====================================行列转换==================================
行转列：
   案例1：多行转多列
       -》创建数据：vim /export/datas/r2c1.txt
a   c   1
a   d   2
a   e   3
b   c   4
b   d   5
b   e   6
       -》希望结果
+-------+----+----+----+--+
| col1 | c | d | e |
+-------+----+----+----+--+
| a | 1 | 2 | 3 |
| b | 4 | 5 | 6 |
+-------+----+----+----+--+
       -》创建表
create table row2col1(
col1 string,
col2 string,
col3 int
)row format delimited fields terminated by '\t';
       -》加载数据
       load data local inpath '/export/datas/r2c1.txt' into table row2col1;
       -》实现
select
col1 as col1,
max(case col2 when 'c' then col3 else 0 end) as c,
max(case col2 when 'd' then col3 else 0 end) as d,
max(case col2 when 'e' then col3 else 0 end) as e
from row2col1
group by col1;

   案例2：多行转单列【重要】
       -》创建数据：vim /export/datas/r2c2.txt
a   b   1
a   b   2
a   b   3
c   d   4
c   d   5
c   d   6
       -》希望结果
+-------+-------+--------+--+
| col1 | col2 | col3 |
+-------+-------+--------+--+
| a | b | 1,2,3 |
| c | d | 4,5,6 |
+-------+-------+--------+--+
       -》创建表
create table row2col2(
col1 string,
col2 string,
col3 int
)row format delimited fields terminated by '\t';
       -》加载数据
       load data local inpath '/export/datas/r2c2.txt' into table row2col2;
       -》实现
select
col1,
col2,
concat_ws(',', collect_set(cast(col3 as string))) as col3
from
row2col2
group by
col1, col2;

concat_ws(参数1，参数2)：用于进行字符的拼接
   参数1—指定分隔符
   参数2—拼接的内容
collect_set(col)：将某字段的值进行去重汇总，产生array类型字段
如果不想去重可用collect_list()


列转行：
   案例1：多列转多行
       -》创建数据：vim /export/datas/c2r1.txt
co11   col2   col3   col4
a       1       2       3
b       4       5       6
       -》希望结果
+-----------+-----------+-----------+--+
| _u1.col1 | _u1.col2 | _u1.col3 |
+-----------+-----------+-----------+--+
| a | c | 1 |
| a | d | 2 |
| a | e | 3 |
| b | c | 4 |
| b | d | 5 |
| b | e | 6 |
+-----------+-----------+-----------+--+

       -》创建表
create table col2row1(
col1 string,
col2 int,
col3 int,
col4 int
)row format delimited fields terminated by '\t';
       -》加载数据
       load data local inpath '/export/datas/c2r1.txt' into table col2row1;
       -》实现
select col1, 'c' as col2, col2 as col3 from col2row1
UNION ALL
select col1, 'd' as col2, col3 as col3 from col2row1
UNION ALL
select col1, 'e' as col2, col4 as col3 from col2row1;

   案例2：单列转多行（重要）
       -》创建数据：vim /export/datas/c2r2.txt
col1 col2 col3
a   b   1,2,3
c   d   4,5,6
       -》希望结果
+-------+-------+-------+--+
| col1 | col2 | col3 |
+-------+-------+-------+--+
| a | b | 1 |
| a | b | 2 |
| a | b | 3 |
| c | d | 4 |
| c | d | 5 |
| c | d | 6 |
+-------+-------+-------+--+
       -》创建表
create table col2row2(
col1 string,
col2 string,
col3 string
)row format delimited fields terminated by '\t';
       -》加载数据
       load data local inpath '/export/datas/c2r2.txt' into table col2row2;
       -》实现
select
col1,
col2,
lv.col3 as col3
from
col2row2
lateral view
explode(split(col3, ',')) lv as col3;