Flink Source Code Analysis: the demo examples bundled in flink-examples-streaming

Matty_Blog

License: CC 4.0 BY-SA

Category: Flink

Article link: https://blog.csdn.net/a1240466196/article/details/105890802

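Summary: this article walks through several example applications in Flink, including wordcount, socket, async, iteration, join, sideoutput, and different kinds of windowing (session, count). The examples demonstrate real-time stream processing operations such as stream joins, side outputs, and window computations.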

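This article looks at the flink-examples-streaming module in the Flink source tree and runs several of its examples, to get more familiar with the DataStream API and with the kinds of problems Flink streaming can solve.
Reading a code base usually starts with its examples. If you later want to build a Flink streaming application, you can modify the code directly inside this module, because all the dependencies are already configured.
I have made small changes to the examples from the source tree, and the modified code is pasted in this article.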

1.wordcount

Counts words in real time: every incoming word triggers one update of its count and one output record.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {

	// *************************************************************************
	// PROGRAM
	// *************************************************************************

	public static void main(String[] args) throws Exception {

		final ParameterTool params = ParameterTool.fromArgs(args);
		final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		// make the parameters visible in the web UI and in rich functions
		env.getConfig().setGlobalJobParameters(params);
		DataStream<String> text;
		if (params.has("input")) {
			// read the text file from given input path
			text = env.readTextFile(params.get("input"));
		} else {
			// get default test text data
			text = env.fromElements(new String[] {
				"miao,She is a programmer",
				"wu,He is a programmer",
				"zhao,She is a programmer"
			});
		}

		DataStream<Tuple2<String, Integer>> counts =
			// split up the lines in pairs (2-tuples) containing: (word,1)
			text.flatMap(new Tokenizer())
			// group by the tuple field "0" and sum up tuple field "1"
			.keyBy(0).sum(1);

		// emit result
		if (params.has("output")) {
			counts.writeAsText(params.get("output"));
		} else {
			System.out.println("Printing result to stdout. Use --output to specify output path.");
			counts.print();
		}

		// execute program
		env.execute("Streaming WordCount");
	}

	// *************************************************************************
	// USER FUNCTIONS
	// *************************************************************************
	public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {

		@Override
		public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
			// normalize and split the line
			String[] tokens = value.toLowerCase().split("\\W+");

			// emit the pairs
			for (String token : tokens) {
				if (token.length() > 0) {
					out.collect(new Tuple2<>(token, 1));
				}
			}
		}
	}
}

Output (the `N>` prefix is the index of the parallel subtask that printed the record):

8> (wu,1)
6> (a,1)
8> (is,1)
4> (programmer,1)
2> (he,1)
5> (miao,1)
4> (programmer,2)
6> (a,2)
3> (she,1)
8> (is,2)
3> (she,2)
8> (is,3)
6> (zhao,1)
4> (programmer,3)
6> (a,3)
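
The output above shows each word's count growing by one every time the word arrives (for example `(programmer,1)`, `(programmer,2)`, `(programmer,3)`). As a rough illustration of that behaviour, here is a plain-Java sketch with no Flink dependency. The class `RunningWordCount` and its `HashMap` "state" are hypothetical stand-ins for what `keyBy(0).sum(1)` maintains per key. They are not how Flink implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration (not Flink API) of a keyed running sum. */
public class RunningWordCount {

	/** Returns one "(word,count)" string per token, in arrival order. */
	public static List<String> runningCounts(List<String> lines) {
		// stands in for the per-key state kept by the keyed sum
		Map<String, Integer> state = new HashMap<>();
		List<String> emitted = new ArrayList<>();
		for (String line : lines) {
			// same normalization and split as the Tokenizer in the listing
			for (String token : line.toLowerCase().split("\\W+")) {
				if (token.length() > 0) {
					// add 1 to the stored count and emit the updated value immediately
					int updated = state.merge(token, 1, Integer::sum);
					emitted.add("(" + token + "," + updated + ")");
				}
			}
		}
		return emitted;
	}

	public static void main(String[] args) {
		List<String> lines = Arrays.asList(
			"miao,She is a programmer",
			"wu,He is a programmer",
			"zhao,She is a programmer");
		for (String record : runningCounts(lines)) {
			System.out.println(record);
		}
	}
}
```

The sketch runs in a single thread, so records come out in input order. In the Flink job the keys are spread over several parallel subtasks, which is why the records in the real output interleave differently while each word's own counts still increase 1, 2, 3.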

2.socket

Reads words typed into a socket and counts them in 10-second windows.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

	public static void main(String[] args) throws Exception {

		// the host and the port to connect to
		final String hostname;
		final int port;
		try {
			final ParameterTool params = ParameterTool.fromArgs(args);
			hostname = params.has("hostname") ? params.get("hostname") : "localhost";
			// the port is hard-coded in this modified version of the example
			port = 9999;
		} catch (Exception e) {
			return;
		}

		final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// get input data by connecting to the socket
		// the source reads from the socket; records are separated by the given delimiter
		DataStream<String> text = env.socketTextStream(hostname, port, "\n");

		// parse the data, group it, window it, and aggregate the counts
		DataStream<WordWithCount> windowCounts = text

			.flatMap(new FlatMapFunction<String, WordWithCount>() {
				@Override
				public void flatMap(String value, Collector<WordWithCount> out) {
					for (String word : value.split("\\s")) {
						out.collect(new WordWithCount(word, 1L));
					}
				}
			})

			.keyBy("word")
			.timeWindow(Time.seconds(10))

			.reduce(new ReduceFunction<WordWithCount>() {
				// count the occurrences of each word
				// reduce combines two values into one and returns a new value for every element it processes.
				// Common aggregations such as sum, min, max and count can be written as a reduce;
				// an average also works if the record carries both a sum and a count.
				@Override
				public WordWithCount reduce(WordWithCount a, WordWithCount b) {
					return new WordWithCount(a.word, a.count + b.count);
				}
			});

		// print the results with a single thread, rather than in parallel
		windowCounts.print().setParallelism(1);

		env.execute("Socket Window WordCount");
	}

	// ------------------------------------------------------------------------

	/**
	 * Data type for words with count.
	 */
	public static class WordWithCount {

		public String word;
		public long count;

		public WordWithCount() {
		}

		public WordWithCount(String word, long count) {
			this.word = word;
			this.count = count;
		}

		@Override
		public String toString() {
			return word + " : " + count;
		}
	}
}
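
The comment on the reduce function in the listing says that common aggregations can be expressed as a reduce. The following plain-Java sketch (no Flink dependency) illustrates that idea. The class `ReduceSketch` and its generic `reduce` helper are hypothetical names introduced here, and the average is handled by carrying a `{sum, count}` pair through the reduce, which is one possible encoding rather than the only one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Plain-Java sketch (not Flink API): folding values pairwise, the way a reduce function is applied. */
public class ReduceSketch {

	/** Applies the reducer left to right; the first element is the initial accumulator. */
	public static <T> T reduce(List<T> values, BinaryOperator<T> reducer) {
		T acc = values.get(0);
		for (int i = 1; i < values.size(); i++) {
			acc = reducer.apply(acc, values.get(i));
		}
		return acc;
	}

	/** Average via reduce: each value becomes {sum, count}, and pairs are added component-wise. */
	public static double average(List<Long> values) {
		List<long[]> pairs = new ArrayList<>();
		for (long v : values) {
			pairs.add(new long[] {v, 1L});
		}
		long[] total = reduce(pairs, (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]});
		return (double) total[0] / total[1];
	}

	public static void main(String[] args) {
		List<Long> values = Arrays.asList(3L, 1L, 4L, 1L, 5L);
		System.out.println("sum = " + reduce(values, Long::sum));
		System.out.println("min = " + reduce(values, Long::min));
		System.out.println("max = " + reduce(values, Long::max));
		System.out.println("avg = " + average(values));
	}
}
```

The word count in the listing follows the same pattern as the sum: every word is emitted with a count of 1, and the reduce adds the counts of two records with the same key.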

Start a listener on the local machine before launching the job:

nc -l 9999

Type the following lines into the listening socket:

miao she is a programmer
wu he is a programmer
zhao she is a programmer

Output:

she : 2
programmer : 3
he : 1
a : 3
zhao : 1
miao : 1
wu : 1
is : 3
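
In the output above all three lines were counted together (`is : 3`), which suggests they arrived within the same 10-second window. If the input is typed across a window boundary, the counts are split per window. The plain-Java sketch below illustrates that bucketing. It assumes tumbling windows aligned to timestamp 0 and non-negative millisecond timestamps. The class `TumblingWindowSketch` and the sample arrival times are hypothetical and not taken from the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (not Flink API) of counting words per tumbling window. */
public class TumblingWindowSketch {

	/** Start of the window that contains timestampMs (non-negative), for windows aligned to 0. */
	public static long windowStart(long timestampMs, long sizeMs) {
		return timestampMs - (timestampMs % sizeMs);
	}

	/** Counts words per window; arrivals[i] is the time at which words[i] arrived. */
	public static Map<Long, Map<String, Long>> countPerWindow(long[] arrivals, String[] words, long sizeMs) {
		Map<Long, Map<String, Long>> windows = new TreeMap<>();
		for (int i = 0; i < words.length; i++) {
			long start = windowStart(arrivals[i], sizeMs);
			// one counter map per window, one counter per word inside it
			windows.computeIfAbsent(start, k -> new TreeMap<>()).merge(words[i], 1L, Long::sum);
		}
		return windows;
	}

	public static void main(String[] args) {
		long[] arrivals = {1_000, 2_000, 9_999, 10_000, 12_500};
		String[] words = {"is", "a", "is", "is", "a"};
		// prints {0={a=1, is=2}, 10000={a=1, is=1}}
		System.out.println(countPerWindow(arrivals, words, 10_000));
	}
}
```

The word "is" arrives three times here, but the third arrival falls into the next window, so it is reported as 2 and then 1 rather than 3.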

3.async

The AsyncIOExample listing in this section shows how an AsyncFunction is applied to a DataStream. I did not run it against test data.
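
Before the Flink listing, here is a plain-Java sketch of the difference between the "ordered" and "unordered" wait modes that the listing selects through its `waitMode` parameter. In Flink these modes correspond to `AsyncDataStream.orderedWait` and `AsyncDataStream.unorderedWait`. The sketch does not use Flink: the class `AsyncOrderSketch`, the simulated `lookup` call, and the delay values are all hypothetical and only mimic a slow external request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Plain-Java sketch (not Flink API): emitting async results in input order vs completion order. */
public class AsyncOrderSketch {

	/** Simulated external lookup: a larger key takes longer to answer. */
	static String lookup(int key, long delayUnitMs) {
		try {
			Thread.sleep(delayUnitMs * key);
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			throw new IllegalStateException(e);
		}
		return "result-" + key;
	}

	/** All requests run concurrently, but results are handed out in input order. */
	public static List<String> orderedWait(List<Integer> keys, long delayUnitMs) {
		ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, keys.size()));
		try {
			List<CompletableFuture<String>> futures = new ArrayList<>();
			for (int key : keys) {
				futures.add(CompletableFuture.supplyAsync(() -> lookup(key, delayUnitMs), pool));
			}
			List<String> out = new ArrayList<>();
			for (CompletableFuture<String> future : futures) {
				out.add(future.join()); // waits for this input's result before moving on
			}
			return out;
		} finally {
			pool.shutdown();
		}
	}

	/** Results are handed out as soon as each request completes. */
	public static List<String> unorderedWait(List<Integer> keys, long delayUnitMs) {
		ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, keys.size()));
		try {
			List<String> out = Collections.synchronizedList(new ArrayList<>());
			List<CompletableFuture<Void>> done = new ArrayList<>();
			for (int key : keys) {
				done.add(CompletableFuture
					.supplyAsync(() -> lookup(key, delayUnitMs), pool)
					.thenAccept(out::add));
			}
			CompletableFuture.allOf(done.toArray(new CompletableFuture[0])).join();
			return new ArrayList<>(out);
		} finally {
			pool.shutdown();
		}
	}

	public static void main(String[] args) {
		List<Integer> keys = Arrays.asList(3, 1, 2);
		System.out.println("ordered:   " + orderedWait(keys, 100));
		System.out.println("unordered: " + unorderedWait(keys, 100));
	}
}
```

The ordered variant always returns results in input order, at the cost of holding back fast results behind slow ones. The unordered variant returns whatever finishes first, so its order depends on timing.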

public class AsyncIOExample {

	private static final Logger LOG = LoggerFactory.getLogger(AsyncIOExample.class);

	private static final String EXACTLY_ONCE_MODE = "exactly_once";
	private static final String EVENT_TIME = "EventTime";
	private static final String INGESTION_TIME = "IngestionTime";
	private static final String ORDERED = "ordered";

	public static void main(String[] args) throws Exception {

		// obtain execution environment
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// parse parameters
		final ParameterTool params = ParameterTool.fromArgs(args);

		// path where state is stored
		final String statePath;
		// checkpoint mode
		final String cpMode;
		// maximum value generated by the source
		final int maxCount;
		// sleep factor for the thread inside the RichAsyncFunction
		final long sleepFactor;
		// probability factor used to simulate failures in the RichAsyncFunction
		final float failRatio;
		// whether the RichAsyncFunction emits its results ordered or unordered
		final String mode;
		// parallelism of the task
		final int taskNum;
		// Flink time characteristic to use
		final String timeType;
		// milliseconds to wait when gracefully shutting down the thread pool in the RichAsyncFunction
		final long shutdownWaitTS;
		// timeout of the asynchronous operation in the RichAsyncFunction
		final long timeout;

		try {
			// check the configuration for the job
			statePath = params.get("fsStatePath", null);
			cpMode = params.get("checkpointMode", "exactly_once");
			maxCount = params.getInt("maxCount", 100000);
			sleepFactor = params.getLong("sleepFactor", 100);
			failRatio = params.getFloat("failRatio", 0.001f);
//			failRatio = params.getFloat("failRatio", 0.5f);
			mode = params.get("waitMode", "ordered");
			taskNum = params.getInt("waitOperatorParallelism", 1);
			timeType = params.get("eventType", "EventTime");
			shutdownWaitTS = params.getLong("shutdownWaitTS", 20000);
			timeout = params.getLong("timeout", 10000L);
		} catch (Exception e) {
			printUsage();

			throw e;
		}

		StringBuilder configStringBuilder = new StringBuilder();

		final String lineSeparator = System.getProperty("line.separator");

		configStringBuilder
			.append("Job configuration").append(lineSeparator)
			.append("FS state path=").append(statePath).append(lineSeparator)
			.append("Checkpoint mode=").append(cpMode).append(lineSeparator)
			.append("Max count of input from source=").append(maxCount).append(lineSeparator)
			.append("Sleep factor=").append(sleepFactor).append(lineSeparator)
			.append("Fail ratio=").append(failRatio).append(lineSeparator)
			.append("Waiting mode=").ap