Flink / Scala 实战 - 13.TimeWindow 处理迟到数据详解

本文深入探讨了Flink在EventTime语境下处理迟到数据的问题,包括TimeWindow丢数据示例,以及处理迟到数据的三种策略:forBoundedOutOfOrderness、Allowed Lateness和SideOutputLateData。通过实例分析了这些策略如何确保数据的准确性,并提供了实战代码以展示如何应用这些策略来处理迟到数据。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

目录

一.引言

二.Flink TimeWindow 丢数据示例

1.代码分析

2.Watermark 生成逻辑

3.丢失数据代码测试

三.Flink 处理迟到数据策略

1.forBoundedOutOfOrderness

2.Allowed Lateness

3.SideOutputLateData

四.Flink 处理迟到数据实战

1.代码分析

2.迟到数据处理实战

3.日志分析

五.总结


一.引言

在事件时间 EventTime 语义环境下,窗口中可能出现数据迟到的情况,即 Event 时间戳小于水位线 Watermark,此时数据流未乱序流,水位线不能保证小于自己的时间戳的 Event 不会到来。这时会出现一种情况,水位线到达窗口触发时间触发窗口计算,而属于该窗口的迟到数据到来,默认情况下该类数据会被丢弃。数据的丢弃会导致窗口的计算结果不准确,因此 Flink 推出了 3 种方法处理迟到数据:

- WatermarkStrategy.forBoundedOutOfOrderness 最大延时

- Allowed Lateness 窗口延迟注销

- SideOutputLateData 侧输出流处理迟到数据

二.Flink TimeWindow 丢数据示例

1.代码分析

在介绍几种处理迟到数据方法之前,先示例一下 Flink 在正常情况下如何丢失数据导致计算不准确。以下 Demo 使用 Event 类作为 DataStream[T] 的数据类型,Event 为用户浏览 URL 的简单信息:

networks:  net:    external: trueservices:  jobmanager1:    restart: always    image: apache/flink:1.16.3    container_name: jobmanager1    hostname: jobmanager1    ports:     - '8081:8081'    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-jobmanager1.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net  jobmanager2:    restart: always    image: apache/flink:1.16.3    container_name: jobmanager2    hostname: jobmanager2    ports:     - '8082:8081'    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-jobmanager2.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1  taskmanager1:    restart: always    image: apache/flink:1.16.3    container_name: taskmanager1    hostname: taskmanager1    command: taskmanager    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-taskmanager1.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1      - jobmanager2  taskmanager2:    restart: always    image: apache/flink:1.16.3    container_name: taskmanager2    hostname: taskmanager2    command: taskmanager    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-taskmanager2.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1      - jobmanager2  taskmanager3:    restart: always    image: apache/flink:1.16.3    container_name: taskmanager3    hostname: taskmanager3    command: taskmanager    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-taskmanager3.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1      - jobmanager2
最新发布
04-04
评论 19
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

BIT_666

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值