I. A continuously updated unbounded table
In Spark Structured Streaming, the unbounded table (a continuously updated, conceptually infinite table) is implemented by maintaining the state of aggregated data across micro-batches and comparing that state against the watermark to evict aggregates that can no longer be updated.
Here is an example from the official Spark documentation:
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes spark is the active SparkSession; enables the $"..." column syntax

val windowedCounts = words.groupBy(
  window($"timestamp", "10 minutes", "5 minutes"),
  $"word"
).count()
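Since this article is about watermark-driven state cleanup, it is worth noting that such a query normally also declares a watermark. A minimal sketch, assuming words has an event-time column named timestamp (the 10-minute delay threshold is an illustrative value):

val windowedCountsWithWatermark = words
  .withWatermark("timestamp", "10 minutes") // aggregates older than the watermark become eligible for eviction
  .groupBy(
    window($"timestamp", "10 minutes", "5 minutes"),
    $"word")
  .count()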
Below, Spark's streaming aggregation strategy is analyzed at three levels: creation of the aggregation strategies, storage of aggregation state in the StateStore, and updating (deleting) StateStore data together with writing out expired data.
II. Creating StatefulAggregationStrategy, the streaming aggregation strategy
1. A brief introduction to QueryExecution in Spark SQL
QueryExecution contains the complete definition of the process Spark SQL goes through from parsing to the final physical execution plan. The main phases and code:
class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
  // Defines the strategies, i.e. the rules that bind and convert the logical plan into a physical plan
  protected def planner = sparkSession.sessionState.planner

  /** Converts the unresolved logical plan produced by ANTLR's lexical and syntactic analysis into a
   *  resolved logical plan. One important batch of rules is ResolveRelations: it looks up the catalog
   *  and substitutes in the table referenced by the FROM clause. */
  lazy val analyzed: LogicalPlan = {
    SparkSession.setActiveSession(sparkSession)
    sparkSession.sessionState.analyzer.executeAndCheck(logical)
  }

  /** Checks whether the table or logical plan uses the cache. Internally, useCachedData() calls
   *  lookupCachedData() to return the InMemoryRelation of the matching CachedData, which can then be
   *  bound to InMemoryTableScanExec when the physical plan is generated. */
  lazy val withCachedData: LogicalPlan = {
    assertAnalyzed()
    assertSupported()
    sparkSession.sharedState.cacheManager.useCachedData(analyzed)
  }

  /** A series of optimization rules predefined by Spark, e.g. join predicate pushdown and constant
   *  folding in SQL expressions */
  lazy val optimizedPlan: LogicalPlan = sparkSession.sessionState.optimizer.execute(withCachedData)

  /** Converts the logical plan into a physical plan; the strategies defined in SparkPlanner bind
   *  every node of the logical plan to an Exec-typed SparkPlan. */
  lazy val sparkPlan: SparkPlan = {
    SparkSession.setActiveSession(sparkSession)
    // TODO: We use next(), i.e. take the first plan returned by the planner, here for now,
    // but we will implement to choose the best plan.
    planner.plan(ReturnAnswer(optimizedPlan)).next()
  }

  // executedPlan should not be used to initialize any SparkPlan. It should be
  // only used for execution. This phase inserts shuffles (Exchange) and other preparations.
  lazy val executedPlan: SparkPlan = prepareForExecution(sparkPlan)

  /** Internal version of the RDD. Avoids copies and has no schema */
  lazy val toRdd: RDD[InternalRow] = executedPlan.execute()
}
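Each phase above is a lazy val, so every stage can be inspected from a Dataset. A quick sketch (the DataFrame is arbitrary; queryExecution is the standard accessor):

val df = spark.range(100).groupBy($"id" % 2).count()
val qe = df.queryExecution
println(qe.analyzed)      // resolved logical plan
println(qe.optimizedPlan) // after the optimizer rules
println(qe.sparkPlan)     // physical plan as returned by the planner
println(qe.executedPlan)  // after preparations, e.g. with Exchange (shuffle) inserted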
2. IncrementalExecution extends QueryExecution with streaming strategies
IncrementalExecution is a subclass of QueryExecution; MicroBatchExecution uses a QueryExecution of this type to generate the physical execution plan for streaming. The additional strategies that IncrementalExecution attaches are:
override def extraPlanningStrategies: Seq[Strategy] =
  StreamingJoinStrategy ::          // stream-to-stream joins
  StatefulAggregationStrategy ::    // streaming aggregation
  FlatMapGroupsWithStateStrategy :: // strategy allowing user-defined state, e.g. updating event state
  StreamingRelationStrategy ::      // binds the relation produced by DataStreamReader; it gets replaced in
                                    // MicroBatchExecution.runBatch() (the
                                    // case StreamingExecutionRelation(source, output) branch) and is never actually executed
  StreamingDeduplicationStrategy :: Nil // implements dropDuplicates
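For context, IncrementalExecution wires these in through an overridden planner, so the streaming strategies are tried before the regular session strategies. Roughly as in the Spark 2.x source (the extraPlanningStrategies override shown above is elided here):

override val planner: SparkPlanner = new SparkPlanner(
    sparkSession,
    sparkSession.sessionState.conf,
    sparkSession.sessionState.experimentalMethods) {
  // streaming strategies first, then the normal batch strategies
  override def strategies: Seq[Strategy] =
    extraPlanningStrategies ++ sparkSession.sessionState.planner.strategies
}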
The source of StatefulAggregationStrategy, with comments:
object StatefulAggregationStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case _ if !plan.isStreaming => Nil

    // Used to extract the watermark; EventTimeWatermarkExec internally defines
    // eventTimeStats, which is an accumulator
    case EventTimeWatermark(columnName, delay, child) =>
      EventTimeWatermarkExec(columnName, delay, planLater(child)) :: Nil

    // Calls AggUtils.planStreamingAggregation() to create the SparkPlan for the Aggregate
    case PhysicalAggregation(
        namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
      aggregate.AggUtils.planStreamingAggregation(
        namedGroupingExpressions,
        aggregateExpressions,
        rewrittenResultExpressions,
        planLater(child))

    case _ => Nil
  }
}
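For orientation, the two logical nodes matched above come directly from the user query. A minimal sketch, reusing the words stream from the earlier example:

val streamingCounts = words
  .withWatermark("timestamp", "10 minutes") // adds the EventTimeWatermark logical node
  .groupBy($"word")
  .count()                                  // adds an Aggregate node, destructured by the PhysicalAggregation extractor
streamingCounts.explain(true)               // the analyzed logical plan shows both nodes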
AggUtils.planStreamingAggregation() creates the physical execution plan for the aggregation in the following steps, which are also the main flow by which newly arrived input data gets aggregated (a schematic view of the resulting operator chain follows this list):
① creates partialAggregate with mode = Partial: within each hash partition, the current batch's input data is aggregated by key, yielding results of roughly the form key -> values
② creates partialMerged1 with mode = PartialMerge: the data within each hash partition is merged, yielding results of roughly the form key -> value
③ creates restored: StateStoreRestoreExec(groupingAttributes, None, partialMerged1); its second parameter stateInfo is None, i.e. the batchId and related information are not defined yet and are substituted in during the preparations phase of IncrementalExecution
④ creates partialMerged2, which merges the previous batch's aggregation state read by restored with the current batch's input data
⑤ creates saved: StateStoreSaveExec, which updates the aggregation state in the StateStore, and writes out and removes expired data
⑥ creates finalAndCompleteAggregate with mode = Final, which merges the aggregation buffers and emits the final result
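Put together, the resulting operator chain corresponds to an explain() output shaped roughly like the following (schematic and abridged; operator names and details vary across Spark versions):

HashAggregate(keys=[word], mode = Final)                     <- step ⑥
+- StateStoreSave [word]                                     <- step ⑤
   +- HashAggregate(keys=[word], mode = PartialMerge)        <- step ④
      +- StateStoreRestore [word]                            <- step ③
         +- HashAggregate(keys=[word], mode = PartialMerge)  <- step ②
            +- Exchange hashpartitioning(word, 200)          <- shuffle inserted later by EnsureRequirements
               +- HashAggregate(keys=[word], mode = Partial) <- step ①
                  +- ... (source scan)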
def planStreamingAggregation(
    groupingExpressions: Seq[NamedExpression],
    functionsWithoutDistinct: Seq[AggregateExpression],
    resultExpressions: Seq[NamedExpression],
    child: SparkPlan): Seq[SparkPlan] = {

  val groupingAttributes = groupingExpressions.map(_.toAttribute)

  val partialAggregate: SparkPlan = {
    val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial))
    val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
    createAggregate(
      groupingExpressions = groupingExpressions,
      aggregateExpressions = aggregateExpressions,
      aggregateAttributes = aggregateAttributes,
      resultExpressions = groupingAttributes ++
        aggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes),
      child = child)
  }

  val partialMerged1: SparkPlan = {
    val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = PartialMerge))
    val aggregateAttribut