Setting textinputformat.record.delimiter in Spark

忘了有多久 2020-12-01 15:26

In Spark, it is possible to set some Hadoop configuration settings, e.g.

System.setProperty("spark.hadoop.dfs.replication", "1")

This works. However, the same approach does not seem to take effect for textinputformat.record.delimiter (see the sketch below). Is there a correct or simpler way to set textinputformat.record.delimiter in Spark?

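A hypothetical sketch of that attempt, following the same spark.hadoop. prefix pattern (the "\n\n" delimiter value is only illustrative):

System.setProperty("spark.hadoop.textinputformat.record.delimiter", "\n\n")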
1 Answer
  • 2020-12-01 15:52

    I got this working for plain uncompressed files with the function below.

    import org.apache.hadoop.io.LongWritable
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    
    // Read a text file via the new-API TextInputFormat, passing a Configuration
    // in which the record delimiter has been set explicitly.
    def nlFile(path: String) = {
        val conf = new Configuration
        // Records are split on this delimiter.
        conf.set("textinputformat.record.delimiter", "\n")
        sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
          .map(_._2.toString)  // keep only the record text, dropping the byte-offset key
    }
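
    A sketch of the same approach with the delimiter passed as a parameter, so records separated by e.g. blank lines ("\n\n") can be read; the function name and path are illustrative, and `sc` is assumed to be an existing SparkContext, as above.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    
    // Same pattern as nlFile, but with the record delimiter as an argument.
    def delimitedFile(path: String, delimiter: String) = {
        val conf = new Configuration
        conf.set("textinputformat.record.delimiter", delimiter)
        sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
          .map(_._2.toString)
    }
    
    // e.g. treat blank-line-separated paragraphs as individual records
    val paragraphs = delimitedFile("/path/to/input.txt", "\n\n")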
    