How to pass a configuration file hosted in HDFS to a Spark application?

Submitted by 心已入冬 on 2020-06-29 08:03:05

Question


I'm working with Spark Structured Streaming in Scala. I want to pass a config file to my Spark application. This configuration file is hosted in HDFS. For example:

spark_job.conf (HOCON)

spark {
  appName: "",
  master: "",
  shuffle.size: 4 
  etc..
}

kafkaSource {
  servers: "",
  topic: "",
  etc..
}

redisSink {
  host: "",
  port: 999,
  timeout: 2000,
  checkpointLocation: "hdfs location",
  etc..
}

How can I pass it to the Spark application? How can I read this file (hosted in HDFS) in Spark?


Answer 1:


You can read the HOCON config from HDFS in the following way:

import com.typesafe.config.{Config, ConfigFactory}
import java.io.InputStreamReader
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration

// Obtain a handle to HDFS; "hdfs://" resolves against the default filesystem
// configured for the cluster (fs.defaultFS), as long as it is HDFS.
val hdfs: FileSystem = FileSystem.get(new URI("hdfs://"), new Configuration())

// Open the config file on HDFS and wrap the stream in a Reader for Typesafe Config.
val reader = new InputStreamReader(hdfs.open(new Path("/path/to/conf/on/hdfs")))

// Parse the HOCON content into a Config object.
val conf: Config = ConfigFactory.parseReader(reader)
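
Once parsed, the individual settings can be read with the standard Typesafe Config getters. A minimal sketch, assuming the key names from the spark_job.conf shown in the question (adjust them to whatever your actual file contains):

// Key names below are assumed from the example spark_job.conf above.
val appName      = conf.getString("spark.appName")
val kafkaServers = conf.getString("kafkaSource.servers")
val kafkaTopic   = conf.getString("kafkaSource.topic")
val redisHost    = conf.getString("redisSink.host")
val redisPort    = conf.getInt("redisSink.port")
val checkpoint   = conf.getString("redisSink.checkpointLocation")

These values can then be fed into SparkSession.builder(), the Kafka source options, and the Redis sink as usual.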

You can also pass the URI of your namenode to FileSystem.get(new URI("your_uri_here")), and the code will still read your configuration.
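
To make the location of the file itself configurable, a common approach is to pass its HDFS path to spark-submit as an application argument and read it from args inside main. A hedged sketch of that wiring (the object name and argument layout are illustrative, not part of the original answer):

import com.typesafe.config.{Config, ConfigFactory}
import java.io.InputStreamReader
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object StreamingJob {
  def main(args: Array[String]): Unit = {
    // e.g. spark-submit --class StreamingJob app.jar hdfs://namenode:8020/configs/spark_job.conf
    val confPath = args(0)

    // The scheme and authority of the supplied path select the right FileSystem implementation.
    val hdfs = FileSystem.get(new URI(confPath), new Configuration())
    val reader = new InputStreamReader(hdfs.open(new Path(confPath)))
    val conf: Config =
      try ConfigFactory.parseReader(reader)
      finally reader.close()

    // ... build the SparkSession, Kafka source and Redis sink from `conf` ...
  }
}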



Source: https://stackoverflow.com/questions/56021255/how-to-pass-configuration-file-that-hosted-in-hdfs-to-spark-application
