Reading Flume spoolDir in parallel

Submitted by 旧街凉风 on 2019-12-22 10:51:08

Question


Since I'm not allowed to set up Flume on the prod servers, I have to download the logs, put them in a Flume spoolDir, and have a sink consume from the channel and write to Cassandra. Everything is working fine.

However, since I have a lot of log files in the spoolDir and the current setup only processes one file at a time, it's taking a while. I want to be able to process many files concurrently. One way I thought of is to keep using the spoolDir source but distribute the files across 5-10 different directories and define multiple sources/channels/sinks, but this is a bit clumsy. Is there a better way to achieve this?

Thanks


Answer 1:


Just for the record, this has been answered on Flume's mailing list:

Hari Shreedharan wrote:

Unfortunately, no. The spoolDir source was kept single-threaded so that deserializer implementations can be kept simple. The approach with multiple spoolDir sources is the correct one, though they can all write to the same channel(s). So you'd only need a larger number of sources; they can all share the same channel(s), and you don't need more sinks unless you want to pull data out faster.

http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
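
For illustration, here is a minimal sketch of an agent configuration following that advice: several spoolDir sources feeding one shared channel and a single sink. The agent name, directory paths, and channel sizing are placeholders assumed for the example, and since Flume doesn't ship a Cassandra sink, a hypothetical custom sink class is stubbed in; none of these details come from the original question.

    # Several spoolDir sources, one shared channel, one sink
    # (agent name "agent1" and all paths below are placeholders)
    agent1.sources = src1 src2 src3
    agent1.channels = ch1
    agent1.sinks = sink1

    # Each source watches its own spool directory and runs in its own thread
    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /data/flume/spool1
    agent1.sources.src1.channels = ch1

    agent1.sources.src2.type = spooldir
    agent1.sources.src2.spoolDir = /data/flume/spool2
    agent1.sources.src2.channels = ch1

    agent1.sources.src3.type = spooldir
    agent1.sources.src3.spoolDir = /data/flume/spool3
    agent1.sources.src3.channels = ch1

    # One shared channel for all sources
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000
    agent1.channels.ch1.transactionCapacity = 1000

    # One sink drains the shared channel; the Cassandra sink class is hypothetical
    agent1.sinks.sink1.type = com.example.flume.CassandraSink
    agent1.sinks.sink1.channel = ch1

Because each spoolDir source runs in its own thread, the directories are drained concurrently, while the single channel and sink keep the topology simple. Add more sinks on the same channel only if events accumulate faster than one sink can write them to Cassandra.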



Source: https://stackoverflow.com/questions/25875574/reading-flume-spooldir-in-parallel
