Flume-ng tail a file

Backend · Unresolved · 4 answers · 1750 views
Asked by 生来不讨喜 on 2021-01-07 09:10

I am trying to understand how to tail a file with flume-ng so that I can push the data into HDFS. In the first instance I have set up a simple conf file:

tail         
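
A minimal sketch of such a conf file, tailing a file with an exec source and printing events to the console (every name and path here is an illustrative assumption, not taken from the original config):

    # Illustrative single-node agent: tail a file, log events to the console
    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1

    # exec source: run tail and turn each output line into a Flume event
    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F /var/log/example.log

    # logger sink: write events to the agent's log/console for debugging
    agent1.sinks.sink1.type = logger

    # memory channel: buffer events between source and sink
    agent1.channels.channel1.type = memory
    agent1.channels.channel1.capacity = 1000

    # bind source and sink to the channel
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1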


        
4 Answers
  •  攒了一身酷 · 2021-01-07 09:28

    Your config file looks fine. I used it in CDH4 and it worked as you expected; all I did was change the log file location for the tail, and I saw the output on the console. In my case new log data was being written continuously to the file I was tailing. The timestamps in your data suggest this might not be the case in your example.
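
    If the file you are tailing is static, you can simulate a live log while testing; for example, a simple shell loop like this keeps appending fresh lines (the path is the one from the config below, the loop itself is just an illustration):

    while true; do date >> /home/cloudera/LogCreator/fortune_log.log; sleep 1; done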

    Here is a more complete conf example, more in line with what I think you are trying to do. It tails the file and writes a new HDFS file every 10 minutes or every 10,000 records. Change agent1.sources.source1.command to your tail command, and change agent1.sinks.sink1.hdfs.path and agent1.sinks.sink1.hdfs.filePrefix to match your HDFS config.

    # A single-node Flume configuration
    # uses exec and tail and will write a file every 10K records or every 10 min
    # Name the components on this agent
    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1
    
    # Describe/configure source1
    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -f /home/cloudera/LogCreator/fortune_log.log
    
    # Describe sink1
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = hdfs://localhost/flume/logtest/
    agent1.sinks.sink1.hdfs.filePrefix = LogCreateTest
    # Number of seconds to wait before rolling current file (0 = never roll based on time interval)
    agent1.sinks.sink1.hdfs.rollInterval = 600
    # File size to trigger roll, in bytes (0 = never roll based on file size)
    agent1.sinks.sink1.hdfs.rollSize = 0
    # Number of events written to file before it is rolled (0 = never roll based on number of events)
    agent1.sinks.sink1.hdfs.rollCount = 10000
    # Number of events written to file before it is flushed to HDFS
    agent1.sinks.sink1.hdfs.batchSize = 10000
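    # Maximum number of events per sink transaction (an older HDFS-sink property)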
    agent1.sinks.sink1.hdfs.txnEventMax = 40000
    # -- Compression codec. one of following : gzip, bzip2, lzo, snappy
    # hdfs.codeC = gzip
    # format: currently SequenceFile, DataStream or CompressedStream
    # (1) DataStream does not compress the output file; do not set codeC with it
    # (2) CompressedStream requires hdfs.codeC to be set to an available codec
    agent1.sinks.sink1.hdfs.fileType = DataStream 
    agent1.sinks.sink1.hdfs.maxOpenFiles=50
    # hdfs.writeFormat -- "Text" or "Writable"
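    # Timeouts below are in milliseconds (callTimeout bounds HDFS operations such as open, write, flush, close)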
    agent1.sinks.sink1.hdfs.appendTimeout = 10000
    agent1.sinks.sink1.hdfs.callTimeout = 10000
    # Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)
    agent1.sinks.sink1.hdfs.threadsPoolSize=100 
    # Number of threads per HDFS sink for scheduling timed file rolling
    agent1.sinks.sink1.hdfs.rollTimerPoolSize = 1 
    # hdfs.kerberosPrincipal -- Kerberos user principal for accessing secure HDFS
    # hdfs.kerberosKeytab -- Kerberos keytab for accessing secure HDFS
    # hdfs.round (default false) -- whether the timestamp should be rounded down (if true, affects all time-based escape sequences except %t)
    # hdfs.roundValue (default 1) -- rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time
    # hdfs.roundUnit (default second) -- the unit of the round-down value: second, minute or hour
    # serializer (default TEXT) -- other options include AVRO_EVENT or the fully-qualified
    # class name of an implementation of the EventSerializer.Builder interface
    # serializer.* -- properties passed to the chosen serializer
    
    
    # Use a channel which buffers events to a file
    # -- The component type name, needs to be FILE.
    agent1.channels.channel1.type = FILE 
    # checkpointDir (default ~/.flume/file-channel/checkpoint) -- the directory where the checkpoint file is stored
    # dataDirs (default ~/.flume/file-channel/data) -- the directory where log files are stored
    # Maximum number of events the channel supports in a single transaction
    agent1.channels.channel1.transactionCapacity = 1000000
    # Amount of time (in millis) between checkpoints
    agent1.channels.channel1.checkpointInterval = 30000
    # Max size (in bytes) of a single log file 
    agent1.channels.channel1.maxFileSize = 2146435071
    # Maximum capacity of the channel 
    agent1.channels.channel1.capacity = 10000000
    # keep-alive (default 3) -- amount of time (in sec) to wait for a put operation
    # write-timeout (default 3) -- amount of time (in sec) to wait for a write operation
    
    # Bind the source and sink to the channel
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
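
    Assuming this is saved as, say, tail-hdfs.conf (the file name is an assumption, any name works), the agent can be started with the stock flume-ng launcher; note that --name must match the agent1 prefix used throughout the file:

    flume-ng agent --conf /etc/flume-ng/conf --conf-file tail-hdfs.conf --name agent1 -Dflume.root.logger=INFO,console

    One caveat with this approach: plain tail -f stops following a file once it is rotated (tail -F re-opens it where supported), and the exec source offers no delivery guarantees if the agent dies, which is why the Flume documentation suggests the Spooling Directory source for reliable file ingestion.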
    
