Using an HDFS Sink and rollInterval in Flume-ng to batch up 90 seconds of log information

前端未结
关注
 2  1736
轮回少年 2021-02-06 16:04
I am trying to use Flume-ng to grab 90 seconds of log information and put it into a file in HDFS. I have flume working to look at the log file via an exec and tail however it i
2条回答
被撕碎了的回忆 (楼主)
2021-02-06 16:42
A rewrite of the config file specifying a more complete selection of parameters did the trick. This example will write after 10k records or 10 min which ever comes first. In addition I changed from a memory channel to a file channel to aid in reliability on the data flow.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Describe/configure source1                                                                                                                                                                                                                 
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /home/cloudera/LogCreator/fortune_log.log

# Describe sink1                                                                                                                                                                                                                             
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://localhost/flume/logtest/
agent1.sinks.sink1.hdfs.filePrefix = LogCreateTest
# Number of seconds to wait before rolling current file (0 = never roll based on time interval)                                                                                                                                              
agent1.sinks.sink1.hdfs.rollInterval = 600
# File size to trigger roll, in bytes (0: never roll based on file size)                                                                                                                                                                     
agent1.sinks.sink1.hdfs.rollSize = 0
#Number of events written to file before it rolled (0 = never roll based on number of events)                                                                                                                                                
agent1.sinks.sink1.hdfs.rollCount = 10000
# number of events written to file before it flushed to HDFS                                                                                                                                                                                 
agent1.sinks.sink1.hdfs.batchSize = 10000
agent1.sinks.sink1.hdfs.txnEventMax = 40000
# -- Compression codec. one of following : gzip, bzip2, lzo, snappy                                                                                                                                                                          
# hdfs.codeC = gzip                                                                                                                                                                                                                          
#format: currently SequenceFile, DataStream or CompressedStream                                                                                                                                                                              
#(1)DataStream will not compress output file and please don't set codeC                                                                                                                                                                      
#(2)CompressedStream requires set hdfs.codeC with an available codeC                                                                                                                                                                         
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.maxOpenFiles=50
# -- "Text" or "Writable"                                                                                                                                                                                                                    
#hdfs.writeFormat                                                                                                                                                                                                                            
agent1.sinks.sink1.hdfs.appendTimeout = 10000
agent1.sinks.sink1.hdfs.callTimeout = 10000
# Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)                                                                                                                                                                        
agent1.sinks.sink1.hdfs.threadsPoolSize=100
# Number of threads per HDFS sink for scheduling timed file rolling                                                                                                                                                                          
agent1.sinks.sink1.hdfs.rollTimerPoolSize = 1
# hdfs.kerberosPrin--cipal Kerberos user principal for accessing secure HDFS                                                                                                                                                                 
# hdfs.kerberosKey--tab Kerberos keytab for accessing secure HDFS                                                                                                                                                                            
# hdfs.round false Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)                                                                                                                         
# hdfs.roundValue1 Rounded down to the highest multiple of this (in the unit configured using                                                                                                                                                
# hdfs.roundUnit), less than current time.                                                                                                                                                                                                   
# hdfs.roundUnit second The unit of the round down value - second, minute or hour.                                                                                                                                                           
# serializer TEXT Other possible options include AVRO_EVENT or the fully-qualified class name of an implementation of the EventSerializer.Builder interface.                                                                                 
# serializer.*                                                                                                                                                                                                                               


# Use a channel which buffers events to a file                                                                                                                                                                                               
# -- The component type name, needs to be FILE.                                                                                                                                                                                              
agent1.channels.channel1.type = FILE
# checkpointDir ~/.flume/file-channel/checkpoint The directory where checkpoint file will be stored                                                                                                                                          
# dataDirs ~/.flume/file-channel/data The directory where log files will be stored                                                                                                                                                           
# The maximum size of transaction supported by the channel                                                                                                                                                                                   
agent1.channels.channel1.transactionCapacity = 1000000
# Amount of time (in millis) between checkpoints                                                                                                                                                                                             
agent1.channels.channel1.checkpointInterval 30000
# Max size (in bytes) of a single log file                                                                                                                                                                                                   
agent1.channels.channel1.maxFileSize = 2146435071
# Maximum capacity of the channel                                                                                                                                                                                                            
agent1.channels.channel1.capacity 10000000
#keep-alive 3 Amount of time (in sec) to wait for a put operation                                                                                                                                                                            
#write-timeout 3 Amount of time (in sec) to wait for a write operation                                                                                                                                                                       

# Bind the source and sink to the channel                                                                                                                                                                                                    
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
0 讨论(0)
查看其它2个回答