Can Flink write results into multiple files (like Hadoop's MultipleOutputFormat)?

Backend · Open · 2 answers · 1060 views

Asked by 时光说笑 on 2021-01-06 12:39

I'm using Apache Flink's DataSet API. I want to implement a job that writes multiple results into different files.

How can I do that?

2 Answers
  •  臣服心动
    2021-01-06 13:16

    You can use Flink's HadoopOutputFormat wrapper together with Hadoop's MultipleTextOutputFormat, like this:

    import org.apache.hadoop.io.NullWritable
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

    class IteblogMultipleTextOutputFormat[K, V] extends MultipleTextOutputFormat[K, V] {
      // Drop the key from the written record: only the value goes into the file.
      override def generateActualKey(key: K, value: V): K =
        NullWritable.get().asInstanceOf[K]

      // Use the key as the output file name, so each distinct key gets its own file.
      override def generateFileNameForKeyValue(key: K, value: V, name: String): String =
        key.asInstanceOf[String]
    }
    

    and then use IteblogMultipleTextOutputFormat as follows:

    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.hadoop.mapred.HadoopOutputFormat
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapred.{FileOutputFormat, JobConf}

    val env = ExecutionEnvironment.getExecutionEnvironment
    val multipleTextOutputFormat = new IteblogMultipleTextOutputFormat[String, String]()
    val jc = new JobConf()
    FileOutputFormat.setOutputPath(jc, new Path("hdfs:///user/iteblog/"))
    val format = new HadoopOutputFormat[String, String](multipleTextOutputFormat, jc)
    val batch = env.fromCollection(List(("A", "1"), ("A", "2"), ("A", "3"),
      ("B", "1"), ("B", "2"), ("C", "1"), ("D", "2")))
    batch.output(format)
    env.execute()
    
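    To see how the records above fan out, here is a plain-Scala sketch (no Flink or Hadoop needed, and the object name `MultipleOutputSim` is made up for illustration) that groups the sample tuples by key the same way `generateFileNameForKeyValue` would route them, one output file per key:

    ```scala
    // Simulates MultipleTextOutputFormat's routing: the key becomes the
    // file name, and each value becomes one line in that file.
    object MultipleOutputSim {
      def route(records: List[(String, String)]): Map[String, List[String]] =
        records.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

      def main(args: Array[String]): Unit = {
        val records = List(("A", "1"), ("A", "2"), ("A", "3"),
          ("B", "1"), ("B", "2"), ("C", "1"), ("D", "2"))
        route(records).toSeq.sortBy(_._1).foreach { case (file, lines) =>
          println(s"$file -> ${lines.mkString(",")}")
        }
      }
    }
    ```

    With the sample data, this prints one line per key (A, B, C, D), mirroring the four files that would appear under hdfs:///user/iteblog/.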

    For more information, see: http://www.iteblog.com/archives/1667
