Flume's three advanced components: Flume Interceptors, Flume Channel Selectors, and Flume Sink Processors

This post walks through Flume's three advanced components: interceptors (the timestamp, host and regex filtering interceptors), channel selectors (replicating and multiplexing), and sink processors (failover and load balancing). Together they provide event filtering, channel routing strategies, and failover / load sharing across sinks.


The three advanced components of Flume

Flume Interceptors: conceptually similar to interceptors in Spring.

Function: an interceptor filters or decorates (wraps) every event that passes through a source.

Timestamp Interceptor

Adds a key-value pair to the header of every event:

       key: timestamp

       value: the time (in milliseconds) at which the interceptor processed the event


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=timestamp






#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1
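In practice the timestamp header is mostly consumed by a downstream HDFS sink, whose path escapes such as %Y%m%d are resolved from that header. A minimal sketch of such a sink (the path /flume/ts-demo is a made-up example) that could replace the logger sink above:

#defined sinks (HDFS variant; relies on the timestamp header added by i1)
a1.sinks.k1.type=hdfs
#%Y%m%d is resolved from the event's timestamp header into a daily directory
a1.sinks.k1.hdfs.path=/flume/ts-demo/%Y%m%d
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text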




 

Host Interceptor

       Adds a key-value pair to the header of every event:

       key: host (renamed via the hostHeader property, here to hostname)

       value: the hostname or IP of the machine on which the agent encapsulating the event runs


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=host
a1.sources.s1.interceptors.i1.hostHeader=hostname




#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1
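The header written by the host interceptor can likewise be referenced with the %{header} escape of the HDFS sink, for example to keep data from different machines in separate directories. A minimal sketch (the path /flume/host-demo is a made-up example) that could replace the logger sink above:

#defined sinks (HDFS variant; relies on the hostname header added by i1)
a1.sinks.k1.type=hdfs
#%{hostname} is resolved from the header named by hostHeader above
a1.sinks.k1.hdfs.path=/flume/host-demo/%{hostname}
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text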




 

 

Static Interceptor: adds a fixed (statically configured) key-value pair to the header of every event; the key and value are set in the configuration.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=static
a1.sources.s1.interceptors.i1.key=tttt
a1.sources.s1.interceptors.i1.value=sgl








#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1




 

Regex Filtering Interceptor: filters events against a user-defined regular expression.

       Only events whose body matches the regex are kept.

Exercise: use the timestamp interceptor and the regex filtering interceptor together to filter the following data:

1,2,3,4

{4,5,6,7}

4,7,8,8

{5,3,2,2}

Only the lines wrapped in braces should be collected.

 


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1 i2
a1.sources.s1.interceptors.i1.type=timestamp
a1.sources.s1.interceptors.i2.type=regex_filter
a1.sources.s1.interceptors.i2.regex=\\{.*\\} 




#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1
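If the filter should be inverted, regex_filter also has an excludeEvents switch: with it set to true, events that match the regex are dropped instead of kept, which here would collect only the lines without braces. One extra line is enough:

#drop (instead of keep) the events that match the regex
a1.sources.s1.interceptors.i2.excludeEvents=true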




Flume Channel Selectors

selector.type determines which selector implementation is used.

Replicating Channel Selector (the default)

       The source sends every event to every channel it is connected to,

       i.e. the data is replicated once per channel.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 c2 c3 
a1.sinks = k1 k2 k3

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2 c3
a1.sources.s1.selector.optional = c3




#define channel  
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#define sinks
a1.sinks.k1.type = logger
a1.sinks.k2.type = logger
a1.sinks.k3.type = logger


#bond

a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
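Note that only c1's capacity and transaction capacity are set above; c2 and c3 silently fall back to the memory channel defaults. If all three channels should share the same limits, the missing lines would look like this:

a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
a1.channels.c3.capacity=1000
a1.channels.c3.transactionCapacity=100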




Multiplexing Channel Selector

       The source sends each event only to selected channels, based on the value of a header.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 c2 c3 
a1.sinks = k1 k2 k3

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.channels = c1 c2 c3
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.selector.header = state
a1.sources.s1.selector.mapping.CZ = c1
a1.sources.s1.selector.mapping.US = c2
a1.sources.s1.selector.default = c3




#define channel  
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#define sinks

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=/flume/selector1/hhh1
a1.sinks.k1.hdfs.useLocalTimeStamp=true

#set the file type and the write format
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text

a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path=/flume/selector1/hhh2
a1.sinks.k2.hdfs.useLocalTimeStamp=true

#set the file type and the write format
a1.sinks.k2.hdfs.fileType=DataStream
a1.sinks.k2.hdfs.writeFormat=Text


a1.sinks.k3.type = hdfs
a1.sinks.k3.hdfs.path=/flume/selector1/hhh3
a1.sinks.k3.hdfs.useLocalTimeStamp=true

#set the file type and the write format
a1.sinks.k3.hdfs.fileType=DataStream
a1.sinks.k3.hdfs.writeFormat=Text


#bond

a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
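The multiplexing selector routes on the value of the state header, and an exec source does not set that header by itself; it has to come from an upstream interceptor or from the sending agent. For a quick local test, a static interceptor can stamp a fixed value on every event, as in this sketch (the value CZ is just an example, so everything would land in c1):

#stamp a fixed 'state' header so the multiplexing selector has something to route on
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=static
a1.sources.s1.interceptors.i1.key=state
a1.sources.s1.interceptors.i1.value=CZ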




Flume Sink Processors

Failover Sink Processor

processor.type = failover

Several sinks are started, but only one works at a time; another sink takes over only after the active one dies.

Which of the sinks works first is decided by its priority: the sink with the highest priority becomes active first.

For failover the two sinks are usually of different types (e.g. an HDFS sink and a local file sink):

      for example, if data is normally written to HDFS and HDFS goes down, the data is not lost; it is written to a local file instead.
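A minimal sketch of a failover sink group, assuming two sinks k1 and k2 (for example an HDFS sink and a local file sink) are already defined and bound to channels; the priority numbers are arbitrary, higher means preferred:

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
#the sink with the highest priority is active; k2 takes over only if k1 fails
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
#maximum time (ms) a failed sink is kept on the blacklist before being retried
a1.sinkgroups.g1.processor.maxpenalty = 10000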

Load balancing Sink Processor

processor.type = load_balance

processor.selector = round_robin (round robin) | random
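And a minimal sketch of a load-balancing sink group over the same two sinks, again assuming k1 and k2 are already defined; round_robin could be swapped for random:

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
#temporarily blacklist a failing sink instead of retrying it immediately
a1.sinkgroups.g1.processor.backoff = true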

 

 

A sink group gives you either load balancing or failover, not both at once; in practice load balancing is chosen more often.