I\'m using Apache Flink\'s DataSet API. I want to implement a job that writes multiple results into different files.
How can I do that?
You can use HadoopOutputFormat
API in Flink like this:
class IteblogMultipleTextOutputFormat[K, V] extends MultipleTextOutputFormat[K, V] {
override def generateActualKey(key: K, value: V): K =
NullWritable.get().asInstanceOf[K]
override def generateFileNameForKeyValue(key: K, value: V, name: String): String =
key.asInstanceOf[String]
}
and we can using IteblogMultipleTextOutputFormat
as follow:
val multipleTextOutputFormat = new IteblogMultipleTextOutputFormat[String, String]()
val jc = new JobConf()
FileOutputFormat.setOutputPath(jc, new Path("hdfs:///user/iteblog/"))
val format = new HadoopOutputFormat[String, String](multipleTextOutputFormat, jc)
val batch = env.fromCollection(List(("A", "1"), ("A", "2"), ("A", "3"),
("B", "1"), ("B", "2"), ("C", "1"), ("D", "2")))
batch.output(format)
for more information you can see:http://www.iteblog.com/archives/1667