Can Flink write results into multiple files (like Hadoop's MultipleOutputFormat)?

Backend · Open · 2 answers · 1060 views

Asked by 时光说笑 on 2021-01-06 12:39

I'm using Apache Flink's DataSet API. I want to implement a job that writes multiple results into different files.

How can I do that?

2 Answers
  •  臣服心动
    2021-01-06 13:16

    You can use Flink's HadoopOutputFormat wrapper together with Hadoop's MultipleTextOutputFormat, like this:

    import org.apache.hadoop.io.NullWritable
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

    class IteblogMultipleTextOutputFormat[K, V] extends MultipleTextOutputFormat[K, V] {
      // Drop the key from the written record: only the value goes into the file.
      override def generateActualKey(key: K, value: V): K =
        NullWritable.get().asInstanceOf[K]

      // Use the key as the output file name, so each distinct key gets its own file.
      override def generateFileNameForKeyValue(key: K, value: V, name: String): String =
        key.asInstanceOf[String]
    }
    

    and then use IteblogMultipleTextOutputFormat as follows:

    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.hadoop.mapred.HadoopOutputFormat
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapred.{FileOutputFormat, JobConf}

    val env = ExecutionEnvironment.getExecutionEnvironment
    val multipleTextOutputFormat = new IteblogMultipleTextOutputFormat[String, String]()
    val jc = new JobConf()
    FileOutputFormat.setOutputPath(jc, new Path("hdfs:///user/iteblog/"))
    val format = new HadoopOutputFormat[String, String](multipleTextOutputFormat, jc)
    val batch = env.fromCollection(List(("A", "1"), ("A", "2"), ("A", "3"),
      ("B", "1"), ("B", "2"), ("C", "1"), ("D", "2")))
    batch.output(format)
    env.execute()
    
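    To see how the records above fan out, here is a plain-Scala sketch (no Flink or Hadoop needed, and the object name `MultipleOutputSim` is made up for illustration) that groups the sample tuples by key the same way `generateFileNameForKeyValue` would route them, one output file per key:

    ```scala
    // Simulates MultipleTextOutputFormat's routing: the key becomes the
    // file name, and each value becomes one line in that file.
    object MultipleOutputSim {
      def route(records: List[(String, String)]): Map[String, List[String]] =
        records.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

      def main(args: Array[String]): Unit = {
        val records = List(("A", "1"), ("A", "2"), ("A", "3"),
          ("B", "1"), ("B", "2"), ("C", "1"), ("D", "2"))
        route(records).toSeq.sortBy(_._1).foreach { case (file, lines) =>
          println(s"$file -> ${lines.mkString(",")}")
        }
      }
    }
    ```

    With the sample data, this prints one line per key (A, B, C, D), mirroring the four files that would appear under hdfs:///user/iteblog/.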

    For more information, see: http://www.iteblog.com/archives/1667
