Flink Checkpoint Process: Source Code Analysis

This article takes a deep dive into Flink's checkpoint process, from the CheckpointConfig settings to the generation of the ExecutionGraph. It explains in detail how the CheckpointCoordinator works, including how checkpoints are triggered and acknowledged and what role the TaskManager plays. The analysis is based on Flink 1.10 and looks at the core of Flink's fault-tolerance mechanism: the implementation of distributed snapshots.


The core of Flink's fault-tolerance mechanism is taking continuous distributed snapshots of the data stream. When the system fails, each operator can recover its pre-failure state from the checkpoint formed by these snapshots, guaranteeing that the final result of the job is affected by each message in the stream exactly once (exactly-once); this can be relaxed to at-least-once via configuration. The mechanism for producing the distributed snapshots is described in detail in the paper "Lightweight Asynchronous Snapshots for Distributed Dataflows". It is inspired by the Chandy-Lamport algorithm and adapted to Flink's execution model.
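
For reference, the checkpointing mode is chosen when checkpointing is enabled through the DataStream API. The snippet below is a minimal sketch, assuming the standard StreamExecutionEnvironment API; the 60-second interval is only an illustrative value:

// Example: choosing the checkpointing mode (minimal sketch, interval value is illustrative)
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingModeExample {
	public static void main(String[] args) {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// exactly-once (the default) keeps the barrier-alignment semantics
		env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

		// or relax to at-least-once, which does not block on barrier alignment
		// env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
	}
}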

How the snapshot state itself is persisted was covered in the earlier article "Flink 如何保存状态数据" (How Flink Stores State Data). This article focuses on how Flink performs the distributed snapshots.

The code in this article is based on Flink 1.10.

1. CheckpointConfig

In Flink 1.9 and earlier, checkpointing could only be configured through the API. Starting with Flink 1.10, it can also be configured in conf/flink-conf.yaml or via the -yD/-D command-line options.
CheckpointConfig gained a configure(ReadableConfig configuration) method, which is called when the StreamExecutionEnvironment is initialized.

// StreamExecutionEnvironment.java
public void configure(ReadableConfig configuration, ClassLoader classLoader) {
		...
		checkpointCfg.configure(configuration);
	}
// CheckpointConfig.java
	public void configure(ReadableConfig configuration) {
		configuration.getOptional(ExecutionCheckpointingOptions.CHECKPOINTING_MODE)
			.ifPresent(this::setCheckpointingMode);
		configuration.getOptional(ExecutionCheckpointingOptions.CHECKPOINTING_INTERVAL)
			.ifPresent(i -> this.setCheckpointInterval(i.toMillis()));
		configuration.getOptional(ExecutionCheckpointingOptions.CHECKPOINTING_TIMEOUT)
			.ifPresent(t -> this.setCheckpointTimeout(t.toMillis()));
		configuration.getOptional(ExecutionCheckpointingOptions.MAX_CONCURRENT_CHECKPOINTS)
			.ifPresent(this::setMaxConcurrentCheckpoints);
		configuration.getOptional(ExecutionCheckpointingOptions.MIN_PAUSE_BETWEEN_CHECKPOINTS)
			.ifPresent(m -> this.setMinPauseBetweenCheckpoints(m.toMillis()));
		configuration.getOptional(ExecutionCheckpointingOptions.PREFER_CHECKPOINT_FOR_RECOVERY)
			.ifPresent(this::setPreferCheckpointForRecovery);
		configuration.getOptional(ExecutionCheckpointingOptions.TOLERABLE_FAILURE_NUMBER)
			.ifPresent(this::setTolerableCheckpointFailureNumber);
		configuration.getOptional(ExecutionCheckpointingOptions.EXTERNALIZED_CHECKPOINT)
			.ifPresent(this::enableExternalizedCheckpoints);
	}
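
With this in place, the same options can be supplied as configuration values instead of API calls (in flink-conf.yaml they live under the execution.checkpointing.* prefix). The snippet below is a minimal sketch with illustrative values, building a Configuration and handing it to the configure() method shown above:

// Example: driving CheckpointConfig from a Configuration (minimal sketch, values are illustrative)
import java.time.Duration;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ConfigureCheckpointingExample {
	public static void main(String[] args) {
		Configuration conf = new Configuration();
		conf.set(ExecutionCheckpointingOptions.CHECKPOINTING_MODE, CheckpointingMode.EXACTLY_ONCE);
		conf.set(ExecutionCheckpointingOptions.CHECKPOINTING_INTERVAL, Duration.ofSeconds(30));
		conf.set(ExecutionCheckpointingOptions.CHECKPOINTING_TIMEOUT, Duration.ofMinutes(10));

		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		// StreamExecutionEnvironment.configure() forwards these options to CheckpointConfig.configure()
		env.configure(conf, ConfigureCheckpointingExample.class.getClassLoader());
	}
}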

2. Generating the StreamGraph

When StreamGraphGenerator builds the StreamGraph, the CheckpointConfig is passed straight through to the StreamGraph:

// StreamGraphGenerator.java
public StreamGraph generate() {
		streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
		...

		return builtStreamGraph;
	}

3. Generating the JobGraph

When the StreamGraph is converted into a JobGraph, three kinds of checkpoint-related vertices are defined:

  • triggerVertices: vertices that receive the "trigger checkpoint" message. When the CheckpointCoordinator later initiates a checkpoint, only these vertices are asked to trigger it. Only source vertices become triggerVertices.
  • ackVertices: vertices that must send an ack message to the CheckpointCoordinator once their snapshot is complete. All vertices are ackVertices.
  • commitVertices: vertices that receive the CheckpointCoordinator's "notifyCheckpointComplete" message once the checkpoint has completed. All vertices are commitVertices.

In addition, a CheckpointCoordinatorConfiguration is created here; it is used later when the CheckpointCoordinator is initialized.

// StreamingJobGraphGenerator.java
private void configureCheckpointing() {
		CheckpointConfig cfg = streamGraph.getCheckpointConfig();

		long interval = cfg.getCheckpointInterval();
		if (interval < MINIMAL_CHECKPOINT_TIME) {
			// interval of max value means disable periodic checkpoint
			interval = Long.MAX_VALUE;
		}

		//  --- configure the participating vertices ---

		// collect the vertices that receive "trigger checkpoint" messages.
		// currently, these are all the sources
		List<JobVertexID> triggerVertices = new ArrayList<>();

		// collect the vertices that need to acknowledge the checkpoint
		// currently, these are all vertices
		List<JobVertexID> ackVertices = new ArrayList<>(jobVertices.size());

		// collect the vertices that receive "commit checkpoint" messages
		// currently, these are all vertices
		List<JobVertexID> commitVertices = new ArrayList<>(jobVertices.size());

		for (JobVertex vertex : jobVertices.values()) {
			if (vertex.isInputVertex()) {
				triggerVertices.add(vertex.getID());
			}
			commitVertices.add(vertex.getID());
			ackVertices.add(vertex.getID());
		}

		//  --- configure options ---

		CheckpointRetentionPolicy retentionAfterTermination;
		if (cfg.isExternalizedCheckpointsEnabled()) {
			CheckpointConfig.ExternalizedCheckpointCleanup cleanup = cfg.getExternalizedCheckpointCleanup();
			// Sanity check
			if (cleanup == null) {
				throw new IllegalStateException("Externalized checkpoints enabled, but no cleanup mode configured.");
			}
			retentionAfterTermination = cleanup.deleteOnCancellation() ?
					CheckpointRetentionPolicy.RETAIN_ON_FAILURE :
					CheckpointRetentionPolicy.RETAIN_ON_CANCELLATION;
		} else {
			retentionAfterTermination = CheckpointRetentionPolicy.NEVER_RETAIN_AFTER_TERMINATION;
		}

		CheckpointingMode mode = cfg.getCheckpointingMode();

		boolean isExactlyOnce;
		if (mode == CheckpointingMode.EXACTLY_ONCE) {
			isExactlyOnce = cfg.isCheckpointingEnabled();
		} else if (mode == CheckpointingMode.AT_LEAST_ONCE) {
			isExactlyOnce = false;
		} else {
			throw new IllegalStateException("Unexpected checkpointing mode. " +
				"Did not expect there to be another checkpointing mode besides " +
				"exactly-once or at-least-once.");
		}

		//  --- configure the master-side checkpoint hooks ---

		final ArrayList<MasterTriggerRestoreHook.Factory> hooks = new ArrayList<>();

		for (StreamNode node : streamGraph.getStreamNodes()) {
			if (node.getOperatorFactory() instanceof UdfStreamOperatorFactory) {
				Function f = ((UdfStreamOperatorFactory) node.getOperatorFactory()).getUserFunction();

				if (f instanceof WithMasterCheckpointHook) {
					hooks.add(new FunctionMasterCheckpointHookFactory((WithMasterCheckpointHook<?>) f));
				}
			}
		}

		// because the hooks can have user-defined code, they need to be stored as
		// eagerly serialized values
		final SerializedValue<MasterTriggerRestoreHook.Factory[]> serializedHooks;
		if (hooks.isEmpty()) {
			serializedHooks = null;
		} else {
			try {
				MasterTriggerRestoreHook.Factory[] asArray =
						hooks.toArray(new MasterTriggerRestoreHook.Factory[hooks.size()]);
				serializedHooks = new SerializedValue<>(asArray);
			}
			catch (IOException e) {
				throw new FlinkRuntimeException("Trigger/restore hook is not serializable", e);
			}
		}

		// because the state backend can have user-defined code, it needs to be stored as
		// eagerly serialized value
		final SerializedValue<StateBackend> serializedStateBackend;
		if (streamGraph.getStateBackend() == null) {
			serializedStateBackend = null;
		} else {
			try {
				serializedStateBackend =
					new SerializedValue<StateBackend>(streamGraph.getStateBackend());
			}
			catch (IOException e) {
				throw new FlinkRuntimeException("State backend is not serializable", e);
			}
		}

		//  --- done, put it all together ---

		JobCheckpointingSettings settings = new JobCheckpointingSettings(
			triggerVertices,
			ackVertices,
			commitVertices,
			new CheckpointCoordinatorConfiguration(
				interval,
				cfg.getCheckpointTimeout(),
				cfg.getMinPauseBetweenCheckpoints(),
				cfg.getMaxConcurrentCheckpoints(),
				retentionAfterTermination,
				isExactlyOnce,
				cfg.isPreferCheckpointForRecovery(),
				cfg.getTolerableCheckpointFailureNumber()),
			serializedStateBackend,
			serializedHooks);

		jobGraph.setSnapshotSettings(settings);
	}
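
For context, the retention policy derived above is driven by the externalized-checkpoint cleanup mode chosen by the user. The following is a minimal usage sketch; the cleanup mode and interval are just example choices:

// Example: user-facing call that the retention policy is derived from (minimal sketch)
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetentionPolicyExample {
	public static void main(String[] args) {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		env.enableCheckpointing(60_000L);

		// RETAIN_ON_CANCELLATION -> deleteOnCancellation() == false
		// -> CheckpointRetentionPolicy.RETAIN_ON_CANCELLATION in configureCheckpointing()
		env.getCheckpointConfig().enableExternalizedCheckpoints(
			CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
	}
}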

4. Generating the ExecutionGraph

When the ExecutionGraph is built, ExecutionGraphBuilder reads the JobCheckpointingSettings from the JobGraph and calls ExecutionGraph.enableCheckpointing(), which sets up the CheckpointFailureManager and the CheckpointCoordinator:

// ExecutionGraph.java
public void enableCheckpointing(
			CheckpointCoordinatorConfiguration chkConfig,
			List<ExecutionJobVertex> verticesToTrigger,
			List<ExecutionJobVertex> verticesToWaitFor,
			List<ExecutionJobVertex> verticesToCommitTo,
			List<MasterTriggerRestoreHook<?>> masterHooks,
			CheckpointIDCounter checkpointIDCounter,
			CompletedCheckpointStore checkpointStore,
			StateBackend checkpointStateBackend,
			CheckpointStatsTracker statsTracker) {
		checkState(state == JobStatus.CREATED, "Job must be in CREATED state");
		checkState(checkpointCoordinator == null, "checkpointing already enabled");

		ExecutionVertex[] tasksToTrigger = collectExecutionVertices(verticesToTrigger);
		ExecutionVertex[] tasksToWaitFor = collectExecutionVertices(verticesToWaitFor);
		ExecutionVertex[] tasksToCommitTo = collectExecutionVertices(verticesToCommitTo);

		checkpointStatsTracker = checkNotNull(statsTracker, "CheckpointStatsTracker");

		CheckpointFailureManager failureManager = new CheckpointFailureManager(
			chkConfig.getTolerableCheckpointFailureNumber(),
			new CheckpointFailureManager.FailJobCallback() {
				@Override
				public void failJob(Throwable cause) {
					getJobMasterMainThreadExecutor().execute(() -> failGlobal(cause));
				}
				...
			});

		// create the coordinator that triggers and commits checkpoints and holds the state
		checkpointCoordinator = new CheckpointCoordinator(
			jobInformation.getJobId(),
			chkConfig,
			tasksToTrigger,
			tasksToWaitFor,
			tasksToCommitTo,
			...);

		...

		// an interval of Long.MAX_VALUE means periodic checkpointing is disabled
		if (chkConfig.getCheckpointInterval() != Long.MAX_VALUE) {
			// the periodic checkpoint scheduler is activated and deactivated as a result of
			// job status changes (running -> on, all other states -> off)
			registerJobStatusListener(checkpointCoordinator.createActivatorDeactivator());
		}
	}
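
The activator/deactivator registered at the end of enableCheckpointing() is a small JobStatusListener that turns the periodic checkpoint scheduler on and off as the job changes state. Below is a simplified sketch modeled on Flink's CheckpointCoordinatorDeActivator; treat the imports, the callback signature, and the scheduler method names as assumptions about the 1.10 code base rather than verbatim source:

// Simplified sketch of the activator/deactivator (not verbatim Flink source; signatures assumed)
import org.apache.flink.api.common.JobID;
import org.apache.flink.runtime.checkpoint.CheckpointCoordinator;
import org.apache.flink.runtime.executiongraph.JobStatusListener;
import org.apache.flink.runtime.jobgraph.JobStatus;

public class CheckpointSchedulerDeActivator implements JobStatusListener {

	private final CheckpointCoordinator coordinator;

	public CheckpointSchedulerDeActivator(CheckpointCoordinator coordinator) {
		this.coordinator = coordinator;
	}

	@Override
	public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {
		if (newJobStatus == JobStatus.RUNNING) {
			// the job entered RUNNING: start triggering periodic checkpoints
			coordinator.startCheckpointScheduler();
		} else {
			// any other state: stop the periodic trigger for now
			coordinator.stopCheckpointScheduler();
		}
	}
}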