Can Flink write results into multiple files (like Hadoop's MultipleOutputFormat)?

时光说笑 2021-01-06 12:39

I'm using Apache Flink's DataSet API. I want to implement a job that writes multiple results into different files.

How can I do that?

2 Answers
  •  离开以前
    2021-01-06 13:37

    You can add as many data sinks to a DataSet program as you need.

    For example, in a program like this:

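    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Tuple2<String, Integer> is an example layout; match the types to your CSV columns
    DataSet<Tuple2<String, Integer>> data = env.readCsvFile(...).types(String.class, Integer.class);
    // apply MapFunction and emit
    data.map(new YourMapper()).writeAsText("/foo/bar");
    // apply FilterFunction and emit
    data.filter(new YourFilter()).writeAsCsv("/foo/bar2");
    // sinks are lazy: execute() runs the program once and feeds both sinks
    env.execute();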
    

    The program reads a DataSet, data, from a CSV file and passes it to two transformations:

    1. A MapFunction, whose result is written to a text file.
    2. A FilterFunction, whose surviving tuples (those for which it returns true) are written to a CSV file.
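    The program above keeps placeholders (the elided CSV path, YourMapper, YourFilter). As a self-contained variant, here is a minimal sketch, assuming a Flink 1.x project with the DataSet API (flink-java) on the classpath. It swaps the CSV input for inline sample data via env.fromElements; the class name TwoSinks, the file names mapped.txt and filtered.csv, and the filter condition are hypothetical. env.setParallelism(1) is set so that each sink writes one file rather than a directory of part files.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class TwoSinks {

    // Runs one DataSet job that writes two result files under outDir (hypothetical names).
    public static void run(String outDir) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Parallelism 1: each sink writes a single file, and line order follows the input.
        env.setParallelism(1);

        // Inline sample data stands in for the CSV input of the original program.
        DataSet<Tuple2<String, Integer>> data = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("b", 5), Tuple2.of("c", 9));

        // Sink 1: map each tuple to a line of text.
        data.map(t -> t.f0 + ":" + t.f1)
            .returns(Types.STRING)
            .writeAsText(outDir + "/mapped.txt");

        // Sink 2: keep tuples whose second field exceeds 3 and write them as CSV.
        data.filter(t -> t.f1 > 3)
            .writeAsCsv(outDir + "/filtered.csv");

        // Nothing is written until execute() runs the job containing both sinks.
        env.execute("two sinks");
    }

    public static void main(String[] args) throws Exception {
        Path outDir = Files.createTempDirectory("flink-two-sinks");
        run(outDir.toString());
        System.out.println("Results written under " + outDir);
    }
}
```

    Both sinks belong to the same job, so a single execute() call produces both files.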

    You can also have multiple data sources, and you can branch and merge DataSets (using union, join, coGroup, cross, or broadcast sets) as you like.
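    Hadoop's MultipleOutputFormat chooses the output file per record. With the multiple-sinks approach from this answer, a rough equivalent is one filter plus one sink per target file. A minimal sketch, again assuming the Flink 1.x DataSet API on the classpath: two inline sources (hypothetical sample data) are merged with union, and the merged DataSet is split into one CSV file per key. The class name SplitByKey, the key list, and the file names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class SplitByKey {

    // Writes one CSV file per key (hypothetical keys) under outDir.
    public static void run(String outDir) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Parallelism 1: each sink writes a single file instead of a directory.
        env.setParallelism(1);

        // Two independent sources with inline sample data.
        DataSet<Tuple2<String, Integer>> first = env.fromElements(
                Tuple2.of("red", 1), Tuple2.of("blue", 2));
        DataSet<Tuple2<String, Integer>> second = env.fromElements(
                Tuple2.of("red", 3), Tuple2.of("green", 4));

        // Merge both sources into one DataSet.
        DataSet<Tuple2<String, Integer>> merged = first.union(second);

        // Branch: one filter and one sink per key; keys must be known when the program is built.
        List<String> keys = Arrays.asList("red", "blue", "green");
        for (String key : keys) {
            merged.filter(t -> t.f0.equals(key))
                  .writeAsCsv(outDir + "/" + key + ".csv");
        }

        // A single execute() runs the job with all three sinks.
        env.execute("split by key");
    }

    public static void main(String[] args) throws Exception {
        Path outDir = Files.createTempDirectory("flink-split-by-key");
        run(outDir.toString());
        System.out.println("Results written under " + outDir);
    }
}
```

    This only works when the set of target files is fixed when the program is built, and each key adds another filter operator that scans the whole merged DataSet.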
