Use SparkContext hadoop configuration within RDD methods/closures, like foreachPartition

慢半拍i · 2021-01-05 10:54

I am using Spark to read a bunch of files, elaborating on them and then saving all of them as a Sequence file. What I wanted was to have one sequence file per partition, so I …

4 Answers
  •  广开言路
    2021-01-05 11:14

    You can serialize and deserialize the org.apache.hadoop.conf.Configuration using org.apache.spark.SerializableWritable.

    For example:

    import org.apache.spark.SerializableWritable

    ...

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    // Wrap the (non-serializable) Configuration so it can be shipped to executors
    val serializedConf = new SerializableWritable(hadoopConf)

    // Capture the wrapper in the closure and call .value on the executor side;
    // someFunction is a placeholder for your per-record logic
    rdd.map(record => someFunction(record, serializedConf.value))

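    Tying this back to the question: Hadoop's Configuration is not Java-serializable, so referencing sparkContext.hadoopConfiguration directly inside foreachPartition fails with a "Task not serializable" error, whereas the SerializableWritable wrapper can be captured safely. Below is a minimal sketch of writing one sequence file per partition; the names spark, rdd (assumed to be an RDD[(String, String)]) and outputDir are illustrative assumptions, not from the original post:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{SequenceFile, Text}
    import org.apache.spark.{SerializableWritable, TaskContext}

    // Assumptions: `spark` is an existing SparkSession, `rdd` an RDD[(String, String)],
    // and `outputDir` a writable directory; all three are placeholders.
    val outputDir = "/tmp/seqfile-output"
    val serializedConf = new SerializableWritable(spark.sparkContext.hadoopConfiguration)

    rdd.foreachPartition { iter =>
      // .value rebuilds the Configuration on the executor
      val conf = serializedConf.value
      val path = new Path(s"$outputDir/part-${TaskContext.getPartitionId()}")
      val writer = SequenceFile.createWriter(
        conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(classOf[Text]),
        SequenceFile.Writer.valueClass(classOf[Text]))
      try {
        // Write every record of this partition into its own sequence file
        iter.foreach { case (k, v) => writer.append(new Text(k), new Text(v)) }
      } finally {
        writer.close()
      }
    }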
