I. Exploding Data
0 Problem Description
How can the string "1-5,16,11-13,9" be expanded into "1,2,3,4,5,16,11,12,13,9" while keeping the original order?
1 Data Preparation
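Before turning to HiveQL, here is a minimal Python sketch that pins down the target behavior. The function name `expand_ranges` is hypothetical. It assumes well-formed input: non-negative integers and `a-b` ranges, where a leading minus sign would be mistaken for a range separator. Like the SQL below, it uses min/max, so a reversed range such as `5-3` is also handled.

```python
def expand_ranges(s: str) -> str:
    """Expand 'a-b' ranges in a comma-separated list, keeping element order."""
    out = []
    for token in s.split(","):                      # like explode(split(a, ','))
        parts = [int(p) for p in token.split("-")]  # like explode(split(a1, '-')), cast to int
        start = min(parts)                          # start_data
        diff = max(parts) - start                   # diff
        out.extend(start + pos for pos in range(diff + 1))  # start_data + pos, pos = 0..diff
    return ",".join(str(n) for n in out)


print(expand_ranges("1-5,16,11-13,9"))  # 1,2,3,4,5,16,11,12,13,9
```

The loop visits elements in their original order, which is exactly what the SQL version has to reconstruct explicitly with a row number.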
with data as (select '1-5,16,11-13,9' as a)
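2 Data Analysis
Step 1: explode(split(a, ',')) to explode, plus row_number() to number the rows. Exploding turns one row into several (one per comma-separated element), and row_number() numbers each resulting row so the original order can be recovered later. Because row_number() over () has no ORDER BY, this relies on the engine numbering rows in the order explode emits them; posexplode(split(a, ',')), which also returns each element's 0-based position, avoids that dependency.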
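with data as (select '1-5,16,11-13,9' as a)
select
    a1,
    -- number the exploded rows 1, 2, 3, ... to remember their order
    row_number() over () as rn
from (
    -- one row per comma-separated element: '1-5', '16', '11-13', '9'
    select
        explode(split(a, ',')) as a1
    from data
) tmp1;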
Output:
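As an illustration only, the following Python sketch mimics what step 1 computes. It assumes rows reach row_number() in the order explode emits them. `step1` is a hypothetical helper, not part of Hive.

```python
from typing import List, Tuple


def step1(a: str) -> List[Tuple[str, int]]:
    # explode(split(a, ',')): one row per comma-separated element
    exploded = a.split(",")
    # row_number(): number rows 1, 2, 3, ... in the order explode produced them
    return [(a1, rn) for rn, a1 in enumerate(exploded, start=1)]


for a1, rn in step1("1-5,16,11-13,9"):
    print(a1, rn)
# 1-5 1
# 16 2
# 11-13 3
# 9 4
```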
Step 2: lateral view explode(split(a1, '-')) and max(a2) - min(a2) as diff
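(1) lateral view + explode: split each element on '-' and explode it into one row per endpoint, joining each output row back to the source row (a1, rn) it came from;
(2) group by a1, rn ... select min(a2) as start_data: get the start value of each group;
(3) max(a2) - min(a2): get the span of each group, i.e. how many values follow the start value.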
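The values produced by split(a1, '-') are strings, and min/max on strings compare them character by character rather than numerically. The sample data happens to be unaffected ('1' < '5', '11' < '13'), but a hypothetical element such as '8-12' is not. Python's string ordering shows the same effect:

```python
# Elements produced by splitting '8-12' on '-' are strings, not numbers.
parts = "8-12".split("-")          # ['8', '12']

# String comparison is character by character: '1' < '8', so '12' < '8'.
print(min(parts), max(parts))      # 12 8  -> start and span would both be wrong

# Casting first gives the intended numeric range.
nums = [int(p) for p in parts]
print(min(nums), max(nums) - min(nums))  # 8 4
```

This is why a2 should be cast to int before it is aggregated.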
with data as (select '1-5,16,11-13,9' as a)
select
    a1,
    rn,
    cast(min(a2) as int) as start_data,
    cast(max(a2) - min(a2) as int) as diff
from (
    select
        a1,
        -- cast before aggregating: min/max on strings compare lexicographically ('12' < '8')
        cast(a2 as int) as a2,
        rn
    from (
        select
            a1,
            row_number() over () as rn
        from (
            select
                explode(split(a, ',')) as a1
            from data
        ) tmp1
    ) tmp2
    -- '1-5' -> '1', '5'; a single number such as '16' yields one row
    lateral view explode(split(a1, '-')) table1 as a2
) tmp3
group by a1, rn;
Output:
Step 3: generate index values from the span, then add each index to the start value to get the expanded values
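As a rough illustration, the following Python sketch mimics step 2. `step2` is a hypothetical helper. Its input is the step 1 result under the same ordering assumption as before. In Hive, the order of GROUP BY output is not guaranteed, which is why rn is carried along with each group.

```python
from typing import List, Tuple


def step2(rows: List[Tuple[str, int]]) -> List[Tuple[str, int, int, int]]:
    result = []
    for a1, rn in rows:  # each (a1, rn) pair forms one group
        # lateral view explode(split(a1, '-')), with each a2 cast to int
        a2_values = [int(a2) for a2 in a1.split("-")]
        start_data = min(a2_values)
        diff = max(a2_values) - start_data
        result.append((a1, rn, start_data, diff))
    return result


step1_rows = [("1-5", 1), ("16", 2), ("11-13", 3), ("9", 4)]
for row in step2(step1_rows):
    print(row)
# ('1-5', 1, 1, 4)
# ('16', 2, 16, 0)
# ('11-13', 3, 11, 2)
# ('9', 4, 9, 0)
```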
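Use lateral view with posexplode to generate the index values pos from each group's span diff:
(1) lateral view posexplode(split(repeat(',', diff), ',')) table2 as pos, item;
repeat(',', diff) produces diff commas, so splitting on ',' yields diff + 1 (empty) elements, and posexplode numbers them pos = 0, 1, ..., diff.
This is equivalent to: lateral view posexplode(split(space(diff), '')) table2 as pos, item;
(2) (start_data + pos) as end_data: add the index to the start value to obtain each expanded value.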
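A minimal Python sketch of step 3 for a single group follows. `posexplode` and `step3` here are hypothetical Python helpers that imitate the Hive functions. The sketch assumes, as the approach above relies on, that split keeps the empty strings between and after the commas (Python's str.split does). Hive's posexplode positions start at 0, which the sketch reproduces with enumerate.

```python
from typing import List, Tuple


def posexplode(items: List[str]) -> List[Tuple[int, str]]:
    # imitate Hive's posexplode: (pos, item) pairs with pos starting at 0
    return list(enumerate(items))


def step3(start_data: int, diff: int) -> List[int]:
    # repeat(',', diff) -> diff commas; split on ',' -> diff + 1 empty items
    items = ("," * diff).split(",")
    # end_data = start_data + pos for pos = 0 .. diff
    return [start_data + pos for pos, _item in posexplode(items)]


print(step3(11, 2))  # [11, 12, 13]
print(step3(16, 0))  # [16]
```

When diff is 0 the repeated string is empty, split still returns one (empty) element, and the group keeps exactly its start value.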
with data as (select '1-5,16,11-13,9' as a)
select
a1,
rn,
start_data,
diff,
(start_data + pos) as end_data
from (
select
a1,
rn,
cast(min(a2) as int) as start_data,
cast(max(a2) - min(a2)