I am very new to Hadoop, so please excuse the dumb questions.
I have the following knowledge: the best use case of Hadoop is large files, thus helping in efficiency while …
Flume writes to HDFS by means of an HDFS sink. When Flume starts and begins to receive events, the sink opens a new file and writes events into it. At some point the previously opened file has to be closed, and until then the data in the current block being written is not visible to other readers.
As described in the documentation, the Flume HDFS sink has several file-closing strategies:
- after a given time interval (rollInterval option)
- after reaching a given file size (rollSize option)
- after a given number of events written (rollCount option)
- after a given period of inactivity (idleTimeout option)
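For illustration, here is a minimal sketch of an HDFS sink configuration using these roll options (the agent/sink/channel names and the HDFS path are hypothetical; the values shown are the Flume defaults):

    # Hypothetical agent "a1" with one HDFS sink "k1" fed by channel "c1"
    a1.sinks = k1
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
    # Roll the current file every 30 seconds (0 = never roll on time)
    a1.sinks.k1.hdfs.rollInterval = 30
    # Roll when the file reaches 1024 bytes (0 = never roll on size)
    a1.sinks.k1.hdfs.rollSize = 1024
    # Roll after 10 events have been written (0 = never roll on count)
    a1.sinks.k1.hdfs.rollCount = 10
    # Close a file that has received no events for N seconds (0 = disabled)
    a1.sinks.k1.hdfs.idleTimeout = 0

Setting a roll option to 0 disables that particular strategy. In practice you usually raise rollSize and rollInterval well above these defaults; otherwise you end up with many small files, which defeats the point of HDFS.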
So, to your questions:

a) Flume writes events to the currently open file until that file is closed (and a new file is opened).
b) Append is allowed in HDFS, but Flume does not use it. After a file is closed, Flume never appends any data to it.
c) To hide the currently open file from MapReduce applications, use the inUsePrefix option: any file whose name starts with . is not visible to MR jobs, because Hadoop's FileInputFormat skips paths that begin with . or _.
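A minimal sketch, using the same hypothetical agent/sink names as above (.tmp is already the default in-use suffix; the leading dot comes from inUsePrefix):

    # While being written, a file is named something like ".FlumeData.1462878212345.tmp";
    # the prefix and suffix are dropped when the file is closed (rolled)
    a1.sinks.k1.hdfs.inUsePrefix = .
    a1.sinks.k1.hdfs.inUseSuffix = .tmp

Once the sink rolls the file, it is renamed without the dot prefix and becomes visible to MR jobs.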