Flume's three advanced components: Flume Interceptors, Flume Channel Selectors, and Flume Sink Processors

This post walks through Flume's three advanced components: interceptors (the timestamp, host and regex filtering interceptors), channel selectors (replicating and multiplexing), and sink processors (failover and load balancing). Together they provide event filtering, channel routing strategies, and failover / load sharing across sinks.


The three advanced components of Flume

Flume Interceptors: conceptually similar to interceptors in Spring.

Function: an interceptor filters or decorates (wraps) every event that passes through a source.

Timestamp Interceptor

Adds a key-value pair to the header of every event:

       key: timestamp

       value: the time (in milliseconds) at which the interceptor processed the event


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=timestamp






#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1
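In practice the timestamp header is mostly consumed by a downstream HDFS sink, whose path escapes such as %Y%m%d are resolved from that header. A minimal sketch of such a sink (the path /flume/ts-demo is a made-up example) that could replace the logger sink above:

#defined sinks (HDFS variant; relies on the timestamp header added by i1)
a1.sinks.k1.type=hdfs
#%Y%m%d is resolved from the event's timestamp header into a daily directory
a1.sinks.k1.hdfs.path=/flume/ts-demo/%Y%m%d
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text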




 

Host Interceptor

       Adds a key-value pair to the header of every event:

       key: host (renamed via the hostHeader property, here to hostname)

       value: the hostname or IP of the machine on which the agent encapsulating the event runs


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=host
a1.sources.s1.interceptors.i1.hostHeader=hostname




#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1
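The header written by the host interceptor can likewise be referenced with the %{header} escape of the HDFS sink, for example to keep data from different machines in separate directories. A minimal sketch (the path /flume/host-demo is a made-up example) that could replace the logger sink above:

#defined sinks (HDFS variant; relies on the hostname header added by i1)
a1.sinks.k1.type=hdfs
#%{hostname} is resolved from the header named by hostHeader above
a1.sinks.k1.hdfs.path=/flume/host-demo/%{hostname}
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text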




 

 

Static Interceptor: adds a fixed (statically configured) key-value pair to the header of every event; the key and value are set in the configuration.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=static
a1.sources.s1.interceptors.i1.key=tttt
a1.sources.s1.interceptors.i1.value=sgl








#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1




 

Regex Filtering Interceptor: filters events against a user-defined regular expression.

       Only events whose body matches the regex are kept.

Exercise: use the timestamp interceptor and the regex filtering interceptor together to filter the following data:

1,2,3,4

{4,5,6,7}

4,7,8,8

{5,3,2,2}

Only the lines wrapped in braces should be collected.

 


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1 i2
a1.sources.s1.interceptors.i1.type=timestamp
a1.sources.s1.interceptors.i2.type=regex_filter
a1.sources.s1.interceptors.i2.regex=\\{.*\\} 




#define channel  
a1.channels.c1.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#defined sinks
a1.sinks.k1.type=logger

#bond
a1.sources.s1.channels = c1 
a1.sinks.k1.channel = c1
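If the filter should be inverted, regex_filter also has an excludeEvents switch: with it set to true, events that match the regex are dropped instead of kept, which here would collect only the lines without braces. One extra line is enough:

#drop (instead of keep) the events that match the regex
a1.sources.s1.interceptors.i2.excludeEvents=true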




Flume Channel Selectors

selector.type determines which selector implementation is used.

Replicating Channel Selector (the default)

       The source sends every event to every channel it is connected to,

       i.e. the data is replicated once per channel.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 c2 c3 
a1.sinks = k1 k2 k3

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2 c3
a1.sources.s1.selector.optional = c3




#define channel  
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#define sinks
a1.sinks.k1.type = logger
a1.sinks.k2.type = logger
a1.sinks.k3.type = logger


#bond

a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
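Note that only c1's capacity and transaction capacity are set above; c2 and c3 silently fall back to the memory channel defaults. If all three channels should share the same limits, the missing lines would look like this:

a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
a1.channels.c3.capacity=1000
a1.channels.c3.transactionCapacity=100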




Multiplexing Channel Selector

       The source sends each event only to selected channels, based on the value of a header.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 c2 c3 
a1.sinks = k1 k2 k3

# defined sources
#if this is a custom (self-compiled) class, use its fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.channels = c1 c2 c3
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.selector.header = state
a1.sources.s1.selector.mapping.CZ = c1
a1.sources.s1.selector.mapping.US = c2
a1.sources.s1.selector.default = c3




#define channel  
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory

#channel sizing
#capacity: maximum number of events the channel can hold
a1.channels.c1.capacity=1000
#transactionCapacity: maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100


#define sinks

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=/flume/selector1/hhh1
a1.sinks.k1.hdfs.useLocalTimeStamp=true

#set the file type and the write format
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text

a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path=/flume/selector1/hhh2
a1.sinks.k2.hdfs.useLocalTimeStamp=true

#set the file type and the write format
a1.sinks.k2.hdfs.fileType=DataStream
a1.sinks.k2.hdfs.writeFormat=Text


a1.sinks.k3.type = hdfs
a1.sinks.k3.hdfs.path=/flume/selector1/hhh3
a1.sinks.k3.hdfs.useLocalTimeStamp=true

#set the file type and the write format
a1.sinks.k3.hdfs.fileType=DataStream
a1.sinks.k3.hdfs.writeFormat=Text


#bond

a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
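The multiplexing selector routes on the value of the state header, and an exec source does not set that header by itself; it has to come from an upstream interceptor or from the sending agent. For a quick local test, a static interceptor can stamp a fixed value on every event, as in this sketch (the value CZ is just an example, so everything would land in c1):

#stamp a fixed 'state' header so the multiplexing selector has something to route on
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=static
a1.sources.s1.interceptors.i1.key=state
a1.sources.s1.interceptors.i1.value=CZ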




Flume Sink Processors

Failover Sink Processor

processor.type = failover

Several sinks are started, but only one works at a time; another sink takes over only after the active one dies.

Which of the sinks works first is decided by its priority: the sink with the highest priority becomes active first.

For failover the two sinks are usually of different types (e.g. an HDFS sink and a local file sink):

      for example, if data is normally written to HDFS and HDFS goes down, the data is not lost; it is written to a local file instead.
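A minimal sketch of a failover sink group, assuming two sinks k1 and k2 (for example an HDFS sink and a local file sink) are already defined and bound to channels; the priority numbers are arbitrary, higher means preferred:

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
#the sink with the highest priority is active; k2 takes over only if k1 fails
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
#maximum time (ms) a failed sink is kept on the blacklist before being retried
a1.sinkgroups.g1.processor.maxpenalty = 10000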

Load balancing Sink Processor

processor.type = load_balance

processor.selector = round_robin (round robin) | random
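And a minimal sketch of a load-balancing sink group over the same two sinks, again assuming k1 and k2 are already defined; round_robin could be swapped for random:

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
#temporarily blacklist a failing sink instead of retrying it immediately
a1.sinkgroups.g1.processor.backoff = true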

 

 

A sink group gives you either load balancing or failover, not both at once; in practice load balancing is chosen more often.