Commonly Used Basic Flink Operators

This article introduces three basic transformation operators in Apache Flink: map, flatMap and filter. map performs one-to-one transformations, and the example code shows how to extract the user attribute from an Event object. flatMap flattens data, for example emitting an Event's user, url and timestamp as separate records. filter filters the stream, keeping only the events that satisfy a condition (e.g. user equals zhangsan).


The basic transformation operators most commonly used in Flink are map, flatMap and filter. Their usage is as follows:

1. The map operator

The map operator performs a one-to-one mapping: each input record is transformed into exactly one output record.

1.1 Data preparation

First, define an entity class Event.

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.sql.Timestamp;

// POJO representing a user click event; the Lombok annotations generate
// getters/setters plus the all-args and no-args constructors.
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Event {
    public String user;       // user name
    public String url;        // visited URL
    public Long timestamp;    // event time in milliseconds

    @Override
    public String toString() {
        return "Event{" +
                "user='" + user + '\'' +
                ", url='" + url + '\'' +
                ", timestamp='" + new Timestamp(timestamp) + '\'' +
                '}';
    }
}

1.2 Basic usage of the map operator

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformMapTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource<Event> stream = env.fromElements(new Event("zhangsan", "./home", 1000L),
                new Event("lisi", ".fav", 2000L));
        // Variant 1: a user-defined MapFunction class
        SingleOutputStreamOperator<String> result = stream.map(new MyMap());
        // Variant 2: an anonymous inner class
        SingleOutputStreamOperator<String> result2 = stream.map(new MapFunction<Event, String>() {
            @Override
            public String map(Event event) throws Exception {
                return event.user;
            }
        });
        result.print("1");
        result2.print("2");
        env.execute();
    }

    // Custom MapFunction that extracts the user field from each Event
    public static class MyMap implements MapFunction<Event, String> {
        @Override
        public String map(Event event) throws Exception {
            return event.user;
        }
    }
}


// Output:

1> zhangsan
2> zhangsan
1> lisi
2> lisi
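
map can also be written as a lambda. The following is a minimal sketch (it reuses the stream variable from the example above, and result3 is an illustrative name); if Flink cannot infer the lambda's output type, an explicit hint can be supplied with returns():

// Variant 3 (sketch): a lambda doing the same extraction as above.
// Types comes from org.apache.flink.api.common.typeinfo.Types.
SingleOutputStreamOperator<String> result3 = stream
        .map(event -> event.user)
        .returns(Types.STRING);   // optional type hint in case inference fails
result3.print("3");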

2. The flatMap operator

The flatMap operator splits each input record apart and flattens it, so a single input can produce zero, one, or many output records.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class TransformFlatMapTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource<Event> stream = env.fromElements(new Event("zhangsan", "./home", 1000L),
                new Event("lisi", ".fav", 2000L));

        // Variant 1: a user-defined FlatMapFunction class
        stream.flatMap(new MyFlatMap()).print("1");
        // Variant 2: an anonymous inner class
        stream.flatMap(new FlatMapFunction<Event, String>() {
            @Override
            public void flatMap(Event event, Collector<String> collector) throws Exception {
                collector.collect(event.user);
                collector.collect(event.timestamp.toString());
            }
        }).print("2");
        // Variant 3: a lambda; the output type must be declared explicitly (see the note after the output)
        stream.flatMap((Event event, Collector<String> out) -> out.collect(event.user))
                .returns(new TypeHint<String>() {})
                .print("3");
        env.execute();
    }

    // Custom FlatMapFunction that emits the user, url and timestamp of each Event as separate records
    public static class MyFlatMap implements FlatMapFunction<Event, String> {
        @Override
        public void flatMap(Event event, Collector<String> clt) throws Exception {
            clt.collect(event.user);
            clt.collect(event.url);
            clt.collect(event.timestamp.toString());
        }
    }
}

// Output:
1> zhangsan
1> ./home
1> 1000
2> zhangsan
2> 1000
3> zhangsan
1> lisi
1> .fav
1> 2000
2> lisi
2> 2000
3> lisi
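
Note on variant 3: the generic type of Collector<String> is erased by the Java compiler, so Flink cannot infer the lambda's output type and the job would fail without an explicit hint; the anonymous TypeHint above supplies it. An equivalent alternative is the Types helper class. A minimal sketch (reusing the stream variable from the example above):

// Same lambda with Types.STRING instead of an anonymous TypeHint.
// Types comes from org.apache.flink.api.common.typeinfo.Types.
stream.flatMap((Event event, Collector<String> out) -> {
            out.collect(event.user);
            out.collect(event.url);
        })
        .returns(Types.STRING)
        .print("4");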

3. The filter operator

The filter operator filters the stream, forwarding only the records that satisfy the given predicate.

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformFilterTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource<Event> stream = env.fromElements(new Event("zhangsan", "./home", 1000L),
                new Event("lisi", ".fav", 2000L));
        // Variant 1: an anonymous inner class
        SingleOutputStreamOperator<Event> result = stream.filter(new FilterFunction<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return event.user.equals("zhangsan");
            }
        });
        result.print("1");
        // Variant 2: a user-defined FilterFunction class
        SingleOutputStreamOperator<Event> stream2 = stream.filter(new MyFilter());
        stream2.print("2");
        // Variant 3: a lambda (printed without a label)
        stream.filter(data -> data.user.equals("zhangsan")).print();
        env.execute();
    }

    // Custom FilterFunction that keeps only the events whose user is "zhangsan"
    public static class MyFilter implements FilterFunction<Event> {
        @Override
        public boolean filter(Event event) throws Exception {
            return event.user.equals("zhangsan");
        }
    }
}
// Output:
1> Event{user='zhangsan', url='./home', timestamp='1970-01-01 08:00:01.0'}
2> Event{user='zhangsan', url='./home', timestamp='1970-01-01 08:00:01.0'}
Event{user='zhangsan', url='./home', timestamp='1970-01-01 08:00:01.0'}
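
In practice these operators are usually chained in a single pipeline. The following is a minimal sketch combining filter and map from this article (it reuses the stream variable from the examples above; the "urls" label is illustrative), keeping only zhangsan's events and emitting the visited URLs:

// Chaining filter -> map on the same source (sketch).
// Types comes from org.apache.flink.api.common.typeinfo.Types.
stream.filter(event -> event.user.equals("zhangsan"))
        .map(event -> event.url)
        .returns(Types.STRING)   // type hint for the lambda's String output
        .print("urls");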

Flink operators are the core building blocks used for transforming, filtering, and otherwise processing data in both stream and batch jobs. Below are common Flink operators with corresponding Java examples.

### 1. `map()`

The `map()` operator transforms each input element with a user-defined function and returns a new stream of results.

```java
DataStream<String> inputStream = env.fromElements("hello", "world");

DataStream<Integer> resultStream = inputStream.map(new MapFunction<String, Integer>() {
    @Override
    public Integer map(String value) throws Exception {
        return value.length();
    }
});
```

### 2. `filter()`

The `filter()` operator selects only the elements that satisfy a given condition.

```java
DataStream<Integer> inputNumbers = env.fromElements(1, 2, 3, 4);

DataStream<Integer> evenNumbers = inputNumbers.filter(value -> value % 2 == 0);
```

### 3. `keyBy()`

The `keyBy()` operator partitions the stream by a key, so that subsequent operations such as aggregations are applied per key.

```java
DataStream<Tuple2<String, Integer>> dataStream = ...;

dataStream.keyBy(value -> value.f0)
          .sum(1); // sum the second field for records with the same key
```

### 4. `flatMap()`

`flatMap()` is similar to `map()`, but it may emit any number of result elements (zero, one, or many).

```java
DataStream<String> lines = env.fromElements("hello world", "hi there");

DataStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String line, Collector<String> out) throws Exception {
        for (String word : line.split("\\s+")) {
            out.collect(word);
        }
    }
});
```

### 5. `window()`

Windows allow finite computations over an unbounded stream, for example aggregating metrics over a fixed time range.

```java
import org.apache.flink.streaming.api.windowing.time.Time;

// A tumbling window that fires every 5 seconds, collecting all records received in between
inputDatastream.keyBy(someKeySelector)
               .timeWindow(Time.seconds(5))
               .apply(windowFunction);
```

### 6. `reduce()`

`reduce()` is applied to a keyed stream; within each key partition it incrementally combines incoming records according to the given rule, maintaining a rolling result.

```java
DataStream<SensorReading> readings = ...;

readings.keyBy(SensorReading::getId)
        .reduce((reading1, reading2) -> new SensorReading(
                reading1.getId(),
                Math.max(reading1.getTemperature(), reading2.getTemperature()),
                System.currentTimeMillis())
        );
```

These are only some of the commonly used Flink transformation operators; real applications also involve more complex pattern matching (CEP), joins (`join`, `coGroup`), and custom user functions. Understanding these basics helps in building efficient and powerful distributed real-time analytics applications.