Flume HDFS sink: Remove timestamp from filename

試著忘記壹切 submitted on 2020-08-06 15:12:50

Question


I have configured a Flume agent for my application, where the source is spooldir and the sink is HDFS.

I am able to collect the files in HDFS.

The agent configuration is:

agent.sources = src-1
agent.channels = c1
agent.sinks = k1

agent.sources.src-1.type = spooldir
agent.sources.src-1.channels = c1
agent.sources.src-1.spoolDir = /home/Documents/id/
agent.sources.src-1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.src-1.fileHeader = true
agent.sources.src-1.basenameHeader = true
agent.sources.src-1.basenameHeaderKey = basename

agent.channels.c1.type = file

agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = hdfs://localhost:8020/user/flume/events/
agent.sinks.k1.hdfs.filePrefix = %{basename}
agent.sinks.k1.hdfs.fileHeader = true
agent.sinks.k1.hdfs.fileType = DataStream

My HDFS files are named in the following format:

/flume/events/file1.txt.1411543838171
/flume/events/file2.txt.1411544272696

Is there a way to remove the timestamp (e.g. 1411543838171), the unique number that is generated automatically, from the file names?


Answer 1:


It doesn't seem possible to remove the timestamp through configuration alone. If you look at how the HDFS sink works, you will find the following:

// Each newly opened file takes the next counter value as its extension
long counter = fileExtensionCounter.incrementAndGet();
String fullFileName = fileName + "." + counter;

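where fileExtensionCounter is initialized as:

fileExtensionCounter = new AtomicLong(clock.currentTimeMillis());

So the counter starts from the current time in milliseconds, which is why the suffix looks like a timestamp.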

You can check the code for the sink here, and for the writer here.
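To make the quoted lines concrete, here is a minimal Java sketch that re-creates the naming scheme; it does not call Flume. The class FileNameDemo and its methods are hypothetical, and the sketch assumes no hdfs.fileSuffix is configured and ignores the in-use suffix (.tmp by default) that a file carries while it is still open. The stripCounter helper only shows what a post-processing step would compute if you chose to rename closed files yourself afterwards (for example with Hadoop's FileSystem.rename); it is not a sink option.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified re-creation of the naming scheme quoted above;
// it is not Flume's writer class. Assumes no hdfs.fileSuffix is configured.
class FileNameDemo {
    // Seeded with a timestamp, as in the quoted initialization line.
    private final AtomicLong fileExtensionCounter;

    FileNameDemo(long startMillis) {
        this.fileExtensionCounter = new AtomicLong(startMillis);
    }

    // Mirrors the two quoted lines: every new file gets the next counter value.
    String nextFileName(String fileName) {
        long counter = fileExtensionCounter.incrementAndGet();
        return fileName + "." + counter;
    }

    // Hypothetical post-processing helper: strips one trailing ".<digits>" part.
    private static final Pattern COUNTER_SUFFIX = Pattern.compile("^(.*)\\.\\d+$");

    static String stripCounter(String fullFileName) {
        Matcher m = COUNTER_SUFFIX.matcher(fullFileName);
        return m.matches() ? m.group(1) : fullFileName;
    }

    public static void main(String[] args) {
        FileNameDemo writer = new FileNameDemo(1411543838170L);
        String name = writer.nextFileName("file1.txt");
        System.out.println(name);               // file1.txt.1411543838171
        System.out.println(stripCounter(name)); // file1.txt
    }
}
```

Keep in mind that the counter is what keeps the names unique: if events with the same basename are rolled into several files, stripping the suffix would make those names collide, so any rename step would need its own collision handling.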

If what you want is to put more events into a single file, have a look at these sink properties:

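  • hdfs.rollInterval (seconds before the current file is rolled; default 30, 0 = never roll by time)
  • hdfs.rollSize (file size in bytes that triggers a roll; default 1024, 0 = never roll by size)
  • hdfs.rollCount (number of events written before the file is rolled; default 10, 0 = never roll by count)
  • hdfs.batchSize (number of events written before they are flushed to HDFS; default 100; this controls flushing, not rolling)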
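The following Java sketch is a simplified, hypothetical model (not Flume's actual writer) of how count- and size-based rolling decides how many events end up in each file. It assumes the limits are checked before each event is written, and it leaves out time-based rolling via hdfs.rollInterval. The class RollSimulation and its simulate method are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of count- and size-based rolling; it is not
// Flume's implementation and ignores hdfs.rollInterval (time-based rolling).
class RollSimulation {

    // Returns how many events land in each file. A limit <= 0 disables that
    // trigger, mirroring the "0 = never roll" convention of the sink properties.
    static List<Integer> simulate(int[] eventSizes, long rollCount, long rollSize) {
        List<Integer> eventsPerFile = new ArrayList<>();
        long eventsInFile = 0;
        long bytesInFile = 0;
        for (int size : eventSizes) {
            // Assumption: limits are checked before writing the next event.
            boolean countHit = rollCount > 0 && eventsInFile >= rollCount;
            boolean sizeHit = rollSize > 0 && bytesInFile >= rollSize;
            if (countHit || sizeHit) {
                eventsPerFile.add((int) eventsInFile); // close current file, open a new one
                eventsInFile = 0;
                bytesInFile = 0;
            }
            eventsInFile++;
            bytesInFile += size;
        }
        if (eventsInFile > 0) {
            eventsPerFile.add((int) eventsInFile); // the last, still-open file
        }
        return eventsPerFile;
    }

    public static void main(String[] args) {
        int[] sizes = new int[25];
        Arrays.fill(sizes, 100); // 25 events of 100 bytes each
        System.out.println(simulate(sizes, 10, 0));   // [10, 10, 5]
        System.out.println(simulate(sizes, 0, 1024)); // [11, 11, 3]
        System.out.println(simulate(sizes, 0, 0));    // [25]
    }
}
```

Raising the limits, or setting them to 0 to disable a trigger, yields fewer and larger files; it does not remove the counter from each file name.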


Source: https://stackoverflow.com/questions/33820163/flume-hdfs-sink-remove-timestamp-from-filename
