Three ways to set partitions: coalesce, repartition, and partitionBy

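This article covers three Spark methods for changing the partitioning of an RDD: coalesce, which is mainly used to reduce the number of partitions; repartition, which is equivalent to coalesce with shuffle enabled and can either raise or lower the number of partitions; and partitionBy, which partitions a key-value RDD with a custom partitioner. Each method is shown with its use case and a concrete example.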


coalesce [ˌkəʊəˈles]: changes the number of partitions of an RDD

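import scala.collection.mutable.ListBuffer

/*
 * Second argument of coalesce (shuffle):
 *   false: no shuffle
 *   true:  shuffle
 * If the requested number of partitions is larger than the current number,
 * shuffle must be true; with false the partition count stays unchanged.
 * When the count is increased, the data of the original partitions is
 * redistributed across the requested number of partitions.
 */
// `result` is an RDD[String] defined earlier (not shown here). Judging by the
// output, each element already carries its original partition index.
val coalesceRdd = result.coalesce(6, true)
val results = coalesceRdd.mapPartitionsWithIndex((index, iter) => {
  // Tag every element with the index of the partition it now lives in
  val list = ListBuffer[String]()
  while (iter.hasNext) {
    list += "partition:" + index + " content:[" + iter.next + "]"
  }
  list.iterator
})
println("Number of partitions: " + results.partitions.size)
val resultArr = results.collect()
for (x <- resultArr) {
  println(x)
}
Output:
Number of partitions: 6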
partition:0 content:[partition:1 content:Tom07]
partition:0 content:[partition:2 content:Tom10]
partition:1 content:[partition:0 content:Tom01]
partition:1 content:[partition:1 content:Tom08]
partition:1 content:[partition:2 content:Tom11]
partition:2 content:[partition:0 content:Tom02]
partition:2 content:[partition:2 content:Tom12]
partition:3 content:[partition:0 content:Tom03]
partition:4 content:[partition:0 content:Tom04]
partition:4 content:[partition:1 content:Tom05]
partition:5 content:[partition:1 content:Tom06]
partition:5 content:[partition:2 content:Tom09]
Output of val coalesceRdd = result.coalesce(6, false):
Number of partitions: 3
partition:0 content:[partition:0 content:Tom01]
partition:0 content:[partition:0 content:Tom02]
partition:0 content:[partition:0 content:Tom03]
partition:0 content:[partition:0 content:Tom04]
partition:1 content:[partition:1 content:Tom05]
partition:1 content:[partition:1 content:Tom06]
partition:1 content:[partition:1 content:Tom07]
partition:1 content:[partition:1 content:Tom08]
partition:2 content:[partition:2 content:Tom09]
partition:2 content:[partition:2 content:Tom10]
partition:2 content:[partition:2 content:Tom11]
partition:2 content:[partition:2 content:Tom12]
Output of val coalesceRdd = result.coalesce(2, false):
Number of partitions: 2
partition:0 content:[partition:0 content:Tom01]
partition:0 content:[partition:0 content:Tom02]
partition:0 content:[partition:0 content:Tom03]
partition:0 content:[partition:0 content:Tom04]
partition:1 content:[partition:1 content:Tom05]
partition:1 content:[partition:1 content:Tom06]
partition:1 content:[partition:1 content:Tom07]
partition:1 content:[partition:1 content:Tom08]
partition:1 content:[partition:2 content:Tom09]
partition:1 content:[partition:2 content:Tom10]
partition:1 content:[partition:2 content:Tom11]
partition:1 content:[partition:2 content:Tom12]
Output of val coalesceRdd = result.coalesce(2, true):
Number of partitions: 2
partition:0 content:[partition:0 content:Tom01]
partition:0 content:[partition:0 content:Tom03]
partition:0 content:[partition:1 content:Tom05]
partition:0 content:[partition:1 content:Tom07]
partition:0 content:[partition:2 content:Tom09]
partition:0 content:[partition:2 content:Tom11]
partition:1 content:[partition:0 content:Tom02]
partition:1 content:[partition:0 content:Tom04]
partition:1 content:[partition:1 content:Tom06]
partition:1 content:[partition:1 content:Tom08]
partition:1 content:[partition:2 content:Tom10]
partition:1 content:[partition:2 content:Tom12]
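
The two shuffle = false outputs above (3 partitions stay 3 when 6 are requested; 3 partitions become [0] and [1, 2] when 2 are requested) can be reproduced with a small stand-alone model. This is only a sketch in plain Java, not Spark code: the class name CoalesceNoShuffleModel and the method group are made up for illustration, and the model assumes no data-locality information, which Spark's real coalescer also takes into account.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of coalesce(n, false). Not Spark's implementation.
final class CoalesceNoShuffleModel {

    /**
     * For each resulting partition, returns the indices of the source
     * partitions merged into it. Assumption: data locality is ignored.
     */
    public static List<List<Integer>> group(int sourceCount, int targetCount) {
        List<List<Integer>> groups = new ArrayList<>();
        if (targetCount >= sourceCount) {
            // Without a shuffle the partition count cannot grow:
            // every source partition is kept unchanged.
            for (int i = 0; i < sourceCount; i++) {
                List<Integer> single = new ArrayList<>();
                single.add(i);
                groups.add(single);
            }
            return groups;
        }
        // Fewer partitions requested: merge contiguous ranges of source partitions.
        for (int i = 0; i < targetCount; i++) {
            int start = (int) ((long) i * sourceCount / targetCount);
            int end = (int) ((long) (i + 1) * sourceCount / targetCount);
            List<Integer> merged = new ArrayList<>();
            for (int p = start; p < end; p++) {
                merged.add(p);
            }
            groups.add(merged);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group(3, 6)); // prints [[0], [1], [2]]
        System.out.println(group(3, 2)); // prints [[0], [1, 2]]
    }
}
```

Because whole source partitions are merged and no element moves between them, this path avoids a shuffle, which is why it is the usual choice for reducing the partition count.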

repartition: changes the number of partitions of an RDD
repartition(int n) = coalesce(int n, true)
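
The coalesce(6, true) output earlier in this article is consistent with each source partition dealing its elements to consecutive target partitions in round-robin order. The sketch below models only that pattern in plain Java. It is not Spark code: the names RoundRobinModel and targets are hypothetical, and the first target of each source partition (1, 4 and 5) is read off that output rather than computed the way Spark would choose it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a shuffle spreads one source partition's elements.
final class RoundRobinModel {

    /**
     * Returns the target partition of each element of one source partition,
     * assuming the first element goes to firstTarget and the rest follow
     * in round-robin order over numPartitions partitions.
     */
    public static List<Integer> targets(int elementCount, int firstTarget, int numPartitions) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < elementCount; k++) {
            result.add((firstTarget + k) % numPartitions);
        }
        return result;
    }

    public static void main(String[] args) {
        // Source partition 0 (Tom01..Tom04), first target taken from the output
        System.out.println(targets(4, 1, 6)); // prints [1, 2, 3, 4]
        // Source partition 1 (Tom05..Tom08)
        System.out.println(targets(4, 4, 6)); // prints [4, 5, 0, 1]
        // Source partition 2 (Tom09..Tom12)
        System.out.println(targets(4, 5, 6)); // prints [5, 0, 1, 2]
    }
}
```

Dealing elements out one by one keeps the resulting partitions close to equal in size, at the cost of moving data between partitions.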


partitionBy: repartitions a key-value RDD with a custom partitioner

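import org.apache.spark.Partitioner;
import org.apache.spark.api.java.JavaPairRDD;

// `nameRDD` is an existing JavaPairRDD<Integer, String> (not shown here)
JavaPairRDD<Integer, String> partitionByRDD = nameRDD.partitionBy(new Partitioner() {
    private static final long serialVersionUID = 1L;

    // Number of partitions: 2
    @Override
    public int numPartitions() {
        return 2;
    }

    // Partitioning logic: even keys go to partition 0, odd keys to partition 1
    @Override
    public int getPartition(Object obj) {
        int i = (Integer) obj;
        if (i % 2 == 0) {
            return 0;
        } else {
            return 1;
        }
    }
});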
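
To check the even/odd rule without a Spark cluster, the same getPartition logic can be run on a few sample keys in plain Java. This is a minimal sketch: the class ParityPartitionSketch and the helper assign are hypothetical names used only for illustration, and the sample keys are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone copy of the partitioning rule above; not tied to Spark.
final class ParityPartitionSketch {

    // Same rule as the custom Partitioner: even keys -> 0, odd keys -> 1
    public static int getPartition(Object key) {
        int i = (Integer) key;
        return (i % 2 == 0) ? 0 : 1;
    }

    // Groups the given sample keys by the partition the rule assigns them to
    public static Map<Integer, List<Integer>> assign(int... keys) {
        Map<Integer, List<Integer>> byPartition = new TreeMap<>();
        for (int key : keys) {
            byPartition.computeIfAbsent(getPartition(key), p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        System.out.println(assign(1, 2, 3, 4, 5, 6)); // prints {0=[2, 4, 6], 1=[1, 3, 5]}
    }
}
```

Every value returned by getPartition must lie between 0 and numPartitions() - 1. The rule above satisfies this for negative keys too, because any key whose remainder is not 0 (including -1) falls into the else branch.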

 
