NingG

Deploying a Flume Agent on Different OS Environments

Analysis

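Some background first: the Flume Agent has to be deployed on several OS environments, typically Win XP, Win Server 2008, Linux and Unix. Flume runs on the JVM, so in principle installing a JRE is all it takes to run a Flume Agent. According to the official Flume documentation, a system must meet the following requirements before a Flume Agent can be installed: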

Overall, there are several sources of information on how to install Flume-ng on Windows:

Windows XP

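To install and deploy a Flume Agent on Win XP that uses the tail command to collect, in real time, the content appended to a file, the steps are roughly as follows: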

Download the Flume Agent

Download: official Flume download page

Download a tail command for Windows XP

A quick search on StackOverflow turned up UnxUtils (GNU utilities for Win32), which provides a tail command on Win32. Download: UnxUtils official site
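For readers unfamiliar with the `tail --follow=name --retry` options used later in the configuration, here is a minimal Python sketch of the same idea: re-open the file by name on every poll, read whatever was appended since the last byte offset, keep waiting if the file does not exist yet, and start over if the file shrinks. It is an illustration under stated assumptions, not how UnxUtils implements tail; `poll_new_lines` and `follow` are hypothetical names, and unlike real tail this sketch starts from the beginning of the file instead of the last 10 lines.

```python
import os
import time
from typing import Iterator, List, Tuple


def poll_new_lines(path: str, offset: int, encoding: str = "utf-8") -> Tuple[List[str], int]:
    """Return complete lines appended to `path` since byte `offset`, plus the next offset.

    An unfinished last line (no trailing newline yet) is left for the next poll.
    """
    try:
        size = os.path.getsize(path)
        if size < offset:
            # The file was truncated or replaced by a shorter one: read it from the start.
            offset = 0
        # Re-opening by name on each poll means we follow the name, not an old handle.
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
    except FileNotFoundError:
        # Like --retry: the file may not exist (yet), so keep waiting from offset 0.
        return [], 0
    end = data.rfind(b"\n")
    if end == -1:
        return [], offset
    chunk = data[: end + 1]
    text = chunk.decode(encoding, errors="replace")
    # Drop the empty string after the final "\n" and strip Windows "\r".
    lines = [line.rstrip("\r") for line in text.split("\n")[:-1]]
    return lines, offset + len(chunk)


def follow(path: str, interval: float = 1.0, encoding: str = "utf-8") -> Iterator[str]:
    """Yield lines appended to `path` indefinitely, polling every `interval` seconds."""
    offset = 0
    while True:
        lines, offset = poll_new_lines(path, offset, encoding)
        yield from lines
        if not lines:
            time.sleep(interval)
```

For example, `for line in follow("E:/1.log"): print(line, flush=True)` would print each appended line as soon as it is seen; the explicit flush matters when another process, such as a Flume exec source, reads this output through a pipe.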

Write a custom bat startup script for Windows XP

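On Linux, Flume is started with the bin/flume-ng script, which requires a bash shell. Windows has no bash shell, so does that mean Flume cannot be started on Windows at all? Thinking it through, there are two points: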

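With the basic idea worked out, I searched the official site to see whether anyone had already deployed a Flume Agent on Windows XP, so I could borrow from their approach. I found the following references: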

The resulting bin/flume-win.bat startup script is as follows:

::@echo off

::USAGE: 	apache-flume-1.5.0.1-bin>call bin\flume-win*.bat
::AUTHOR:	Ning Guo of CIB
::TIME:		2014/11/28  12:55

::set java home
set JAVA_HOME="D:\Program Files\Java\jdk1.7.0_67"

::set configuration file
set CONF_FILE=logToKafka.properties

::set agent name 
set AGENT_NAME=agent


::retrieve the parent directory of bin\, i.e. the Flume installation root, and use it as FLUME_HOME
setlocal
for %%i in ("%~dp0..") do set "folder=%%~fi"
set FLUME_HOME="%folder%"


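::start the agent main class directly, as bin/flume-ng does on Linux: all jars in lib\ on the classpath, metrics reported to Ganglia, log4j settings read from conf\
%JAVA_HOME%\bin\java.exe -Xms128m -Xmx512m -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=168.7.2.165:8649 -Dlog4j.configuration=file:///%FLUME_HOME%\conf\log4j.properties -cp %FLUME_HOME%\lib\* org.apache.flume.node.Application -f %FLUME_HOME%\conf\%CONF_FILE% -n %AGENT_NAME%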

pause
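The bat script essentially reproduces what bin/flume-ng ends up executing: a plain java invocation of org.apache.flume.node.Application. As a hedged cross-check, the Python sketch below assembles the same argument list from the same settings, resolving FLUME_HOME as the parent of bin\ the way the `for %%i in ("%~dp0..")` line does. `build_flume_command` is a hypothetical helper; the JVM flags are copied from the script, and the log4j URL is built with forward slashes (a small deviation from the script) so it is a well-formed file URL.

```python
from pathlib import PureWindowsPath
from typing import List, Optional


def build_flume_command(
    script_path: str,
    java_home: str,
    conf_file: str,
    agent_name: str,
    ganglia_hosts: Optional[str] = None,
) -> List[str]:
    """Return the java argv that bin/flume-win.bat runs, one list element per argument."""
    # FLUME_HOME is the parent of the directory holding the script (bin\..).
    flume_home = PureWindowsPath(script_path).parent.parent
    # No quoting needed for "Program Files": each list element is one argument.
    java_exe = PureWindowsPath(java_home) / "bin" / "java.exe"
    log4j = (flume_home / "conf" / "log4j.properties").as_posix()
    argv = [str(java_exe), "-Xms128m", "-Xmx512m"]
    if ganglia_hosts:
        argv += ["-Dflume.monitoring.type=ganglia", f"-Dflume.monitoring.hosts={ganglia_hosts}"]
    argv += [
        f"-Dlog4j.configuration=file:///{log4j}",
        # Java itself expands the trailing * to every jar in lib\.
        "-cp", str(flume_home / "lib" / "*"),
        "org.apache.flume.node.Application",
        "-f", str(flume_home / "conf" / conf_file),
        "-n", agent_name,
    ]
    return argv
```

On the Windows host, passing this list to `subprocess.run` would start the agent without having to worry about the quotes around "D:\Program Files" that the bat script needs.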

The Flume configuration file used above, logToKafka.properties, contains the following:

############################################################## COMPONENTS
# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'agent'

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink


############################################################## SOURCES
# For each one of the sources, the type is defined


# Exec Source for a Flume agent on Win XP (UnxUtils).
# Backslashes are doubled because .properties files treat "\" as an escape character.
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = E:\\reference\\svn-new-doc\\flume\\UnxUtils\\usr\\local\\wbin\\tail.exe --follow=name --retry E:/1.log
agent.sources.seqGenSrc.restart = true
agent.sources.seqGenSrc.restartThrottle = 1000
agent.sources.seqGenSrc.batchSize = 100
#agent.sources.seqGenSrc.charset = GBK


# Exec Source For Flume agent on Win Server 2008.
#agent.sources.seqGenSrc.type = exec
#agent.sources.seqGenSrc.command = get-content d:/flume/1.log -wait
#agent.sources.seqGenSrc.shell = powershell
#agent.sources.seqGenSrc.restart = true
#agent.sources.seqGenSrc.restartThrottle = 1000
#agent.sources.seqGenSrc.batchSize = 100
#agent.sources.seqGenSrc.charset = GBK


############################################################## SINKS

agent.sinks.loggerSink.type = com.thilinamb.flume.sink.KafkaSink
#agent.sinks.loggerSink.topic = goodjob
agent.sinks.loggerSink.topic = good
#agent.sinks.loggerSink.charset = GBK
agent.sinks.loggerSink.kafka.metadata.broker.list = 168.7.1.67:9091,168.7.1.68:9091,168.7.1.69:9091,168.7.1.70:9091
agent.sinks.loggerSink.kafka.serializer.class = kafka.serializer.StringEncoder
agent.sinks.loggerSink.kafka.request.required.acks = 1


############################################################## CHANNELS
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100000


############################################################## RELATIONS
# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

#Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel
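Before starting the agent it can help to check the wiring in the RELATIONS section: each source must list channels declared in `agent.channels`, and each sink must name one declared channel. The Python sketch below performs that check. It is a hypothetical helper, not part of Flume, and its parser only understands simple `key = value` lines; it ignores the escapes, `:` separators and line continuations that java.util.Properties (which Flume uses to load this file) also supports.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple `key = value` lines, skipping blank lines and # / ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on the first "=" only, so values like --follow=name survive intact.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(props: Dict[str, str], agent: str) -> List[str]:
    """Return human-readable wiring problems for `agent` (an empty list if none)."""
    def names(kind: str) -> List[str]:
        return props.get(f"{agent}.{kind}", "").split()

    channels = set(names("channels"))
    problems: List[str] = []
    if not channels:
        problems.append(f"{agent}.channels is empty")
    for src in names("sources"):
        # A source may feed several channels (space-separated list).
        src_channels = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not src_channels:
            problems.append(f"source {src} has no channels")
        for ch in src_channels:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    for sink in names("sinks"):
        # A sink drains exactly one channel.
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if not ch:
            problems.append(f"sink {sink} has no channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems
```

Running `check_wiring(parse_properties(open("conf/logToKafka.properties").read()), "agent")` against the file above should return an empty list; a typo in a channel name would show up as an "undeclared channel" entry instead of a silently idle sink.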

Notes: this Flume Agent collects log content with the local tail command and sends it to Kafka through KafkaSink. A few points are involved:

Windows Server 2008

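Windows Server 2008 is essentially the same as Windows XP. The only change is in the sources section of logToKafka.properties: replace the tail command (UnxUtils) with get-content (PowerShell). On Windows Server 2008, the UnxUtils tail command fails to capture any log content when it runs inside a Flume source (oddly, and for reasons that were unclear at first).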

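Root cause: the preliminary conclusion is that tail -f does not work on Windows because the tail -f process does not pass data back to the Flume agent process promptly; instead, it returns all of the content at once only when the tail command finishes. You can confirm this by monitoring the tail command: the content only shows up as delivered to the Flume agent at that point.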
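Flume's exec source starts the configured command and turns each line the child writes to stdout into an event, so it can only see lines the child has actually flushed to the pipe. The Python sketch below mimics that consumer side with subprocess; it is not Flume's code. The child is a Python one-liner standing in for tail (an assumption made so the example runs anywhere). When its stdout is a pipe, Python block-buffers it unless told to flush, which produces exactly the symptom described above: lines arrive late, in one burst when the buffer fills or the process exits.

```python
import subprocess
import sys
from typing import Iterator, List


def read_events(cmd: List[str]) -> Iterator[str]:
    """Start `cmd` and yield each stdout line as soon as the child flushes it."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\r\n")
        proc.wait()


# Stand-in for the tail command: prints three lines, flushing after each one.
# Without flush=True (or `python -u`), the piped child would keep the lines in
# its output buffer and the reader would only receive them when the child exits.
CHILD = [sys.executable, "-c", "for i in range(3): print('line', i, flush=True)"]
```

`list(read_events(CHILD))` collects the three lines; the same reader pointed at a tail build that does not flush per line would sit idle until tail terminates, which matches the behaviour observed with UnxUtils tail.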

Windows 7

The flume directory on vdisk contains a dedicated document for this.
