I'm using Apache Flink's DataSet API. I want to implement a job that writes multiple results into different files.
How can I do that?
You can add as many data sinks to a DataSet program as you need.
For example, in a program like this:
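```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<String, Integer>> data = env.readCsvFile(...).types(String.class, Integer.class); // element type is illustrative
// apply MapFunction and emit
data.map(new YourMapper()).writeAsText("/foo/bar");
// apply FilterFunction and emit
data.filter(new YourFilter()).writeAsCsv("/foo/bar2");
// sinks are lazy: execute() runs the whole program, including both sinks
env.execute();
```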
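For reference, here is a self-contained sketch of the same two-sink pattern. It assumes a Flink 1.x release with `flink-java` and `flink-clients` on the classpath. `YourMapper` and `YourFilter` are hypothetical implementations (the question does not define them), an in-memory input stands in for the CSV source, and the class name `MultiSinkJob`, its `run` helper, and the `(String, Integer)` tuple type are illustrative assumptions.

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem.WriteMode;

public class MultiSinkJob {

    // Hypothetical mapper: turns each (word, count) tuple into a "word=count" line.
    public static class YourMapper implements MapFunction<Tuple2<String, Integer>, String> {
        @Override
        public String map(Tuple2<String, Integer> value) {
            return value.f0 + "=" + value.f1;
        }
    }

    // Hypothetical filter: keeps only tuples whose count is greater than 1.
    public static class YourFilter implements FilterFunction<Tuple2<String, Integer>> {
        @Override
        public boolean filter(Tuple2<String, Integer> value) {
            return value.f1 > 1;
        }
    }

    public static void run(String textPath, String csvPath) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Parallelism 1 so that each sink writes a single file instead of a directory.
        env.setParallelism(1);

        // In-memory input stands in for the CSV source of the original program.
        DataSet<Tuple2<String, Integer>> data = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("c", 3));

        // Sink 1: every mapped record as a line of text.
        data.map(new YourMapper()).writeAsText(textPath, WriteMode.OVERWRITE);
        // Sink 2: the tuples that pass the filter, as CSV.
        data.filter(new YourFilter()).writeAsCsv(csvPath, WriteMode.OVERWRITE);

        // A single execute() call runs the whole plan, both sinks included.
        env.execute("multiple sinks");
    }

    public static void main(String[] args) throws Exception {
        run(args[0], args[1]);
    }
}
```

`WriteMode.OVERWRITE` lets the sketch be rerun against the same paths. With parallelism 1 each sink writes one file at its path; with a higher parallelism, Flink by default creates a directory at each path holding one file per parallel sink instance.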
You read a DataSet `data` from a CSV file. This `data` is given to two subsequent transformations:

- a `MapFunction`, whose result is written to a text file, and
- a `FilterFunction`, whose non-filtered tuples are written to a CSV file.

You can also have multiple data sources, and branch and merge data sets (using `union`, `join`, `coGroup`, `cross`, or broadcast sets) as you like.
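Branching and merging with more than one source can be sketched as follows, again assuming a Flink 1.x release with `flink-java` and `flink-clients` on the classpath. The class `BranchAndMergeJob`, its `run` method, and the data are hypothetical, and the sketch covers only `union` and `join` (not `coGroup`, `cross`, or broadcast sets).

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem.WriteMode;

public class BranchAndMergeJob {

    public static void run(String unionPath, String joinPath) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Two independent sources (illustrative in-memory data).
        DataSet<Tuple2<Integer, String>> left = env.fromElements(
                Tuple2.of(1, "x"), Tuple2.of(2, "y"));
        DataSet<Tuple2<Integer, String>> right = env.fromElements(
                Tuple2.of(2, "z"), Tuple2.of(3, "w"));

        // Merge 1: union needs both inputs to have the same element type.
        left.union(right).writeAsCsv(unionPath, WriteMode.OVERWRITE);

        // Merge 2: join on field 0, projected to (key, left value, right value).
        DataSet<Tuple3<Integer, String, String>> joined = left.join(right)
                .where(0).equalTo(0)
                .projectFirst(0, 1).projectSecond(1);
        joined.writeAsCsv(joinPath, WriteMode.OVERWRITE);

        // One execute() call runs both branches and both sinks.
        env.execute("branch and merge");
    }
}
```

Each source feeds two operators (branching), and `union` and `join` combine them again (merging). The order of records in the union output is not guaranteed.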