In this talk, we’ll explore the types of production failures and the unique impact that each has on
your pipeline’s SLAs. We’ll then see how FLIP-304 and the new Pluggable Failure Handling
Interface enable users to implement custom failure handlers using Flink’s generic plugin
framework. Throughout, we’ll introduce use cases such as classifying failures (e.g., User or
System), emitting custom metrics (e.g., application or platform), exposing failures to downstream
consumers (e.g., notification systems), and implementing custom failover/restart strategies.
Finally, in the live demo, attendees will learn how to implement simple failure classifiers and
expose their metadata in Flink’s web interface.
Related topics: