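Prerequisite reading:
- Flink Source Code Analysis | Watermark and the built-in watermark generators (WatermarkGenerator)
- Flink Source Code Analysis | The watermark timestamp assigner (TimestampAssigner)
During Flink's distributed execution, both WatermarkGenerator and TimestampAssigner may need to be made available on different nodes. However, neither interface extends Serializable directly. Instead, each is paired with a serializable supplier class (Supplier) that is shipped to the workers and creates the actual instance there. With this supplier pattern, the API methods do not have to require that WatermarkGenerator and TimestampAssigner themselves be serializable.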
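The supplier pattern described above can be sketched without any Flink dependency. The example below is a minimal illustration, not Flink's implementation: `Generator`, `GeneratorSupplier`, and `FixedGenerator` are hypothetical stand-ins, and shipping between nodes is simulated with a plain Java serialization round trip. Only the supplier is serialized; the generator is created after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SupplierPatternSketch {

    /** Hypothetical stand-in for WatermarkGenerator: deliberately NOT Serializable. */
    interface Generator {
        long currentWatermark();
    }

    /** Hypothetical stand-in for the supplier: only the factory is Serializable. */
    @FunctionalInterface
    interface GeneratorSupplier extends Serializable {
        Generator create();
    }

    /** Holds a Thread reference, so an instance could not be serialized itself. */
    static final class FixedGenerator implements Generator {
        private final Thread owner = Thread.currentThread();
        private final long watermark;

        FixedGenerator(long watermark) {
            this.watermark = watermark;
        }

        @Override
        public long currentWatermark() {
            return watermark;
        }
    }

    /** Simulates shipping the supplier to another node via Java serialization. */
    static GeneratorSupplier roundTrip(GeneratorSupplier supplier) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(supplier);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GeneratorSupplier) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    /** Ships a supplier, creates the generator on the "worker" side, and reads it. */
    static long shipAndCreate(long watermark) {
        GeneratorSupplier supplier = () -> new FixedGenerator(watermark);
        return roundTrip(supplier).create().currentWatermark();
    }

    public static void main(String[] args) {
        System.out.println(shipAndCreate(42L)); // prints 42
    }
}
```

The lambda is serializable because its target type, `GeneratorSupplier`, extends `Serializable`; the `FixedGenerator` it produces never passes through the object stream.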
WatermarkGeneratorSupplier
First, let's look at WatermarkGeneratorSupplier, the supplier class for WatermarkGenerator.
Source: flink-core/src/main/java/org/apache/flink/api/common/eventtime/WatermarkGeneratorSupplier.java
package org.apache.flink.api.common.eventtime;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.metrics.MetricGroup;

import java.io.Serializable;

@PublicEvolving
@FunctionalInterface
public interface WatermarkGeneratorSupplier<T> extends Serializable {

    /** Instantiates a {@link WatermarkGenerator}. */
    WatermarkGenerator<T> createWatermarkGenerator(Context context);

    /**
     * Additional information available to {@link #createWatermarkGenerator(Context)}. This can be
     * access to {@link org.apache.flink.metrics.MetricGroup MetricGroups}, for example.
     */
    interface Context {

        /**
         * Returns the metric group for the context in which the created {@link WatermarkGenerator}
         * is used.
         *
         * <p>Instances of this class can be used to register new metrics with Flink and to create a
         * nested hierarchy based on the group names. See {@link MetricGroup} for more information
         * for the metrics system.
         *
         * @see MetricGroup
         */
        MetricGroup getMetricGroup();
    }
}
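- The type parameter T is the element type of the input stream processed by the created WatermarkGenerator.
- The interface is serializable because it extends Serializable.
- It declares the method createWatermarkGenerator, which builds a WatermarkGenerator from a Context that gives access to the metric group, so the generator can register its own metrics.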
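To illustrate how the Context lets a generator register metrics at creation time, here is a minimal sketch using only the JDK. All types in it (`SimpleMetricGroup`, `Context`, `Generator`, `GeneratorSupplier`, `CountingGenerator`) are hypothetical, simplified stand-ins for the Flink interfaces, and the metric name `eventsSeen` is made up for the example.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

final class ContextMetricsSketch {

    /** Hypothetical, heavily simplified stand-in for Flink's MetricGroup. */
    static final class SimpleMetricGroup {
        private final Map<String, AtomicLong> counters = new HashMap<>();

        /** Returns the counter registered under the name, creating it on first use. */
        AtomicLong counter(String name) {
            return counters.computeIfAbsent(name, key -> new AtomicLong());
        }
    }

    /** Stand-in for WatermarkGeneratorSupplier.Context. */
    interface Context {
        SimpleMetricGroup getMetricGroup();
    }

    /** Stand-in for WatermarkGenerator. */
    interface Generator<T> {
        void onEvent(T event, long eventTimestamp);

        long currentWatermark();
    }

    /** Stand-in for WatermarkGeneratorSupplier. */
    @FunctionalInterface
    interface GeneratorSupplier<T> extends Serializable {
        Generator<T> createGenerator(Context context);
    }

    /** Tracks the maximum timestamp seen and counts events in a metric. */
    static final class CountingGenerator<T> implements Generator<T> {
        private final AtomicLong eventsSeen;
        private long maxTimestamp = Long.MIN_VALUE;

        CountingGenerator(SimpleMetricGroup metrics) {
            // The metric is registered when the generator is created on the worker.
            this.eventsSeen = metrics.counter("eventsSeen");
        }

        @Override
        public void onEvent(T event, long eventTimestamp) {
            eventsSeen.incrementAndGet();
            maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
        }

        @Override
        public long currentWatermark() {
            return maxTimestamp;
        }
    }

    /** Creates a generator through the supplier, feeds it events, returns the counter value. */
    static long countEvents(long... timestamps) {
        SimpleMetricGroup metrics = new SimpleMetricGroup();
        GeneratorSupplier<String> supplier =
                context -> new CountingGenerator<>(context.getMetricGroup());
        Generator<String> generator = supplier.createGenerator(() -> metrics);
        for (long timestamp : timestamps) {
            generator.onEvent("event", timestamp);
        }
        return metrics.counter("eventsSeen").get();
    }

    public static void main(String[] args) {
        System.out.println(countEvents(1L, 5L, 3L)); // prints 3
    }
}
```

The supplier itself holds no metric objects. It only describes how to build the generator, and the metric group is supplied by whoever calls it.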
TimestampAssignerSupplier
Next, let's look at TimestampAssignerSupplier, the supplier class for TimestampAssigner.
Source: flink-core/src/main/java/org/apache/flink/api/common/eventtime/TimestampAssignerSupplier.java
package org.apache.flink.api.common.eventtime;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.java.ClosureCleaner;
import org.apache.flink.metrics.MetricGroup;

import java.io.Serializable;

/**
 * A supplier for {@link TimestampAssigner TimestampAssigners}. The supplier pattern is used to
 * avoid having to make {@link TimestampAssigner} {@link Serializable} for use in API methods.
 *
 * <p>This interface is {@link Serializable} because the supplier may be shipped to workers during
 * distributed execution.
 */
@PublicEvolving
@FunctionalInterface
public interface TimestampAssignerSupplier<T> extends Serializable {

    /** Instantiates a {@link TimestampAssigner}. */
    TimestampAssigner<T> createTimestampAssigner(Context context);

    static <T> TimestampAssignerSupplier<T> of(SerializableTimestampAssigner<T> assigner) {
        return new SupplierFromSerializableTimestampAssigner<>(assigner);
    }

    /**
     * Additional information available to {@link #createTimestampAssigner(Context)}. This can be
     * access to {@link org.apache.flink.metrics.MetricGroup MetricGroups}, for example.
     */
    interface Context {

        /**
         * Returns the metric group for the context in which the created {@link TimestampAssigner}
         * is used.
         *
         * <p>Instances of this class can be used to register new metrics with Flink and to create a
         * nested hierarchy based on the group names. See {@link MetricGroup} for more information
         * for the metrics system.
         *
         * @see MetricGroup
         */
        MetricGroup getMetricGroup();
    }

    /**
     * We need an actual class. Implementing this as a lambda in {@link
     * #of(SerializableTimestampAssigner)} would not allow the {@link ClosureCleaner} to "reach"
     * into the {@link SerializableTimestampAssigner}.
     */
    class SupplierFromSerializableTimestampAssigner<T> implements TimestampAssignerSupplier<T> {

        private static final long serialVersionUID = 1L;

        private final SerializableTimestampAssigner<T> assigner;

        public SupplierFromSerializableTimestampAssigner(
                SerializableTimestampAssigner<T> assigner) {
            this.assigner = assigner;
        }

        @Override
        public TimestampAssigner<T> createTimestampAssigner(Context context) {
            return assigner;
        }
    }
}
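- The type parameter T is the element type of the input stream processed by the created TimestampAssigner.
- createTimestampAssigner and Context work the same way as their counterparts in WatermarkGeneratorSupplier.
- For SerializableTimestampAssigner objects, which are already serializable, the interface provides the static of method. It takes a SerializableTimestampAssigner and returns a SupplierFromSerializableTimestampAssigner, a class implementing TimestampAssignerSupplier. When that class's createTimestampAssigner method is called, it simply returns the instance that was passed to of. As the Javadoc notes, a named class is used instead of a lambda so that the ClosureCleaner can reach into the wrapped SerializableTimestampAssigner.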
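The behavior of the of method can be reproduced in a small self-contained sketch. The types below (`Assigner`, `SerializableAssigner`, `AssignerSupplier`, `FromSerializableAssigner`) are hypothetical stand-ins that mirror the shape of the Flink interfaces. As a simplification, the stand-in `createAssigner` takes no Context argument, unlike the real createTimestampAssigner.

```java
import java.io.Serializable;

final class OfFactorySketch {

    /** Stand-in for TimestampAssigner: not Serializable on its own. */
    interface Assigner<T> {
        long extractTimestamp(T element, long recordTimestamp);
    }

    /** Stand-in for SerializableTimestampAssigner: the same method, plus Serializable. */
    @FunctionalInterface
    interface SerializableAssigner<T> extends Assigner<T>, Serializable {}

    /** Stand-in for TimestampAssignerSupplier (the Context parameter is omitted here). */
    @FunctionalInterface
    interface AssignerSupplier<T> extends Serializable {
        Assigner<T> createAssigner();

        /** Wraps an already-serializable assigner in a named supplier class. */
        static <T> AssignerSupplier<T> of(SerializableAssigner<T> assigner) {
            return new FromSerializableAssigner<>(assigner);
        }
    }

    /** A named class rather than a lambda, mirroring the structure of the Flink source. */
    static final class FromSerializableAssigner<T> implements AssignerSupplier<T> {
        private static final long serialVersionUID = 1L;

        private final SerializableAssigner<T> assigner;

        FromSerializableAssigner(SerializableAssigner<T> assigner) {
            this.assigner = assigner;
        }

        @Override
        public Assigner<T> createAssigner() {
            // No new object is built: the wrapped instance is handed back as-is.
            return assigner;
        }
    }

    /** Checks that the supplier returns exactly the instance passed to of. */
    static boolean returnsSameInstance() {
        SerializableAssigner<String> assigner =
                (element, recordTimestamp) -> element.length();
        return AssignerSupplier.of(assigner).createAssigner() == assigner;
    }

    /** Uses the string length as a made-up "timestamp" to exercise the assigner. */
    static long extractViaSupplier(String element) {
        SerializableAssigner<String> assigner =
                (value, recordTimestamp) -> value.length();
        return AssignerSupplier.of(assigner).createAssigner().extractTimestamp(element, -1L);
    }

    public static void main(String[] args) {
        System.out.println(returnsSameInstance()); // prints true
        System.out.println(extractViaSupplier("abcd")); // prints 4
    }
}
```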