Flume - Can an entire file be considered an event in Flume?

逝去的感伤 2021-02-06 17:03

I have a use case where I need to ingest files from a directory into HDFS. As a POC, I used simple Directory Spooling in Flume where I specified the source, sink and channel and

1 answer
  •  暖寄归人
    2021-02-06 17:46

    For starters, Flume doesn't work on files as such, but on units called events. An event is a byte-array body plus a map of string headers; the body can contain anything, usually a single line, but in your case it might be an entire file.

    An interceptor gives you the ability to extract information from your event and add it to that event's headers. The headers can then be used to configure a target directory structure.

    In your specific case, you would want to code a parser that analyses the content of your event and sets a header value, for instance a subpath:

    // `line` is the event body decoded as a String
    if (line.contains("Address")) {
        event.getHeaders().put("subpath", "address");
    } else if (line.contains("ID")) {
        event.getHeaders().put("subpath", "id");
    }
    

    You can then reference that header in your hdfs-sink configuration as follows:

    hdfs-a1.sinks.hdfs-sink.hdfs.path = hdfs://cluster/path/%{subpath}
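
    To make the header-setting step easier to try out, here is a minimal sketch that isolates it from Flume. The class name `SubpathTagger` and its methods are hypothetical, and a plain `byte[]` plus `Map<String, String>` stand in for the event body and headers; in a real interceptor the same logic would sit inside `intercept(Event)` of a class implementing `org.apache.flume.interceptor.Interceptor`.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flume: picks the value for the "subpath" header.
final class SubpathTagger {
    private SubpathTagger() {}

    /** Returns the subpath for a decoded event body, or null if nothing matches. */
    static String subpathFor(String content) {
        if (content.contains("Address")) {
            return "address";
        } else if (content.contains("ID")) {
            return "id";
        }
        return null;
    }

    /**
     * Mirrors what an interceptor would do with an event:
     * decode the body (UTF-8 is an assumption), then set the header in place.
     */
    static Map<String, String> tag(byte[] body, Map<String, String> headers) {
        String subpath = subpathFor(new String(body, StandardCharsets.UTF_8));
        if (subpath != null) {
            headers.put("subpath", subpath);
        }
        return headers;
    }
}
```

    Events that match neither keyword are left without a `subpath` header in this sketch, so you would probably want to add a default value before relying on `%{subpath}` in the sink path.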
    

    As to your question whether multiple files can constitute an event: yes, that's possible, but not with the spool source. You would have to implement a client class that talks to a configured Avro source, reads your files into a single event and sends it off. You could then also set the headers there instead of using an interceptor.
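
    As a rough sketch of the client side, the following shows only the part that needs no Flume dependency: reading several files into one body and setting the headers up front. `FileEventPayload`, `fromFiles` and the `fileCount` header are hypothetical names chosen for illustration. The actual send through the Flume client SDK appears only as comments, since it needs the SDK on the classpath and a running Avro source.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the two parts of an event: body bytes and string headers.
final class FileEventPayload {
    final byte[] body;
    final Map<String, String> headers;

    private FileEventPayload(byte[] body, Map<String, String> headers) {
        this.body = body;
        this.headers = headers;
    }

    /** Concatenates the given files, in order, into one body and sets the headers. */
    static FileEventPayload fromFiles(String subpath, Path... files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (Path file : files) {
            buffer.write(Files.readAllBytes(file)); // each file is held fully in memory
        }
        Map<String, String> headers = new HashMap<>();
        headers.put("subpath", subpath);                          // same header the sink path uses
        headers.put("fileCount", Integer.toString(files.length)); // illustrative extra header
        return new FileEventPayload(buffer.toByteArray(), headers);
    }
}

// With the Flume client SDK available, sending could then look roughly like this:
//   RpcClient client = RpcClientFactory.getDefaultInstance(host, port);
//   client.append(EventBuilder.withBody(payload.body, payload.headers));
//   client.close();
```

    Because the whole body is buffered in memory on the client and again in the channel, this approach is probably only practical for reasonably small files.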
