Setting textinputformat.record.delimiter in Spark

忘了有多久 2020-12-01 15:26

In Spark, it is possible to set some Hadoop configuration settings, e.g.

System.setProperty("spark.hadoop.dfs.replication", "1")

This works. However, the same approach does not seem to take effect for textinputformat.record.delimiter (see the sketch below). Is there a correct or simpler way to set textinputformat.record.delimiter in Spark?

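A hypothetical sketch of that attempt, following the same spark.hadoop. prefix pattern (the "\n\n" delimiter value is only illustrative):

System.setProperty("spark.hadoop.textinputformat.record.delimiter", "\n\n")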
1 Answer
  • 2020-12-01 15:52

    I got this working for plain uncompressed files with the function below.

    import org.apache.hadoop.io.LongWritable
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    
    // Read a text file via the new-API TextInputFormat, passing a Configuration
    // in which the record delimiter has been set explicitly.
    def nlFile(path: String) = {
        val conf = new Configuration
        // Records are split on this delimiter.
        conf.set("textinputformat.record.delimiter", "\n")
        sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
          .map(_._2.toString)  // keep only the record text, dropping the byte-offset key
    }
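
    A sketch of the same approach with the delimiter passed as a parameter, so records separated by e.g. blank lines ("\n\n") can be read; the function name and path are illustrative, and `sc` is assumed to be an existing SparkContext, as above.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    
    // Same pattern as nlFile, but with the record delimiter as an argument.
    def delimitedFile(path: String, delimiter: String) = {
        val conf = new Configuration
        conf.set("textinputformat.record.delimiter", delimiter)
        sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
          .map(_._2.toString)
    }
    
    // e.g. treat blank-line-separated paragraphs as individual records
    val paragraphs = delimitedFile("/path/to/input.txt", "\n\n")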
    