In Scala, if I have the following config:
id = 777
username = stephan
password = DG#%T@RH
The idea is to open a file, transform it into a s
You can put these values in a JSON file (I have named it config.json) as below:
{
"id": "777",
"username": "stephan",
"password": "DG#%T@RH"
}
Now you can store this JSON file at an HDFS location, read it with Spark in your Scala code, and pull out the configuration values as below:
val configData = spark.read.option("multiline", true).json("/tmp/user/config.json")
// collect()(0) returns a Row; getString(0) extracts the actual value
val id = configData.select("id").collect()(0).getString(0)
val username = configData.select("username").collect()(0).getString(0)
val password = configData.select("password").collect()(0).getString(0)
In the first line you need to pass the option multiline = true, because your JSON file spreads each value over a new line. If you don't, the read fails and the resulting DataFrame contains a single _corrupt_record : string column instead of your fields.
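Once the values are extracted, it can help to check that every expected key is actually present before using them. Here is a minimal sketch in plain Scala, using a Map as a stand-in for the DataFrame lookups above (the key names match the config.json example; the validation helper is my own, not part of Spark):

```scala
// stand-in for the values read out of the config DataFrame
val config = Map("id" -> "777", "username" -> "stephan", "password" -> "DG#%T@RH")

// keys the application requires
val required = Seq("id", "username", "password")

// fail fast with a clear message if any key is missing
val missing = required.filterNot(config.contains)
require(missing.isEmpty, s"missing config keys: ${missing.mkString(", ")}")

val id = config("id").toInt
```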
The best way would be to use a .conf file and ConfigFactory instead of doing all the file parsing yourself:
import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

// config.path can be set as a JVM system property, e.g. -Dconfig.path=/etc/myapp
val configPath = System.getProperty("config.path")
val config = ConfigFactory.parseFile(new File(configPath, "myFile.conf"))
config.getString("username")
I usually use scalaz Validation for the parseFile operation in case the file isn't there, but you can simply use a try/catch if you are not familiar with it.
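The standard library's scala.util.Try gives you the same safety without scalaz. A minimal sketch, where `parse` is a placeholder for the real ConfigFactory.parseFile call (which throws if the file is unreadable):

```scala
import java.io.File
import scala.util.{Failure, Success, Try}

// wrap any throwing file-parsing function in a Try
def safeLoad[A](parse: File => A, path: String): Try[A] =
  Try(parse(new File(path)))

// demo with a stand-in parser that just checks the file exists
val result = safeLoad(f => { require(f.exists, s"missing: $f"); f.getName }, "/no/such/file.conf")

result match {
  case Success(name) => println(s"loaded $name")
  case Failure(err)  => println(s"falling back to defaults: ${err.getMessage}")
}
```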
If your Spark version is below 2.2, first convert your JSON file content into a JSON string, i.e. collapse the file content onto a single line, and load that to the HDFS location.
Sample JSON:
{
"planet" : "Earth",
"continent" : "Antarctica"
}
Convert to:
{ "planet" : "Earth", "continent" : "Antarctica"}
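The conversion above can be scripted rather than done by hand. A naive sketch in plain Scala (note this simple trim-and-join approach would break if any string value itself contained a newline):

```scala
// a pretty-printed JSON document, as in the sample above
val multiline = """{
  "planet" : "Earth",
  "continent" : "Antarctica"
}"""

// collapse onto one line: trim each line and join with single spaces
val singleLine = multiline.split("\n").map(_.trim).mkString(" ")
```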
Next, to access the data, create a DataFrame:
val dataDF = spark.read.format("json").load("<HDFS location>")
// collect() takes no arguments; index the returned array instead
val planet = dataDF.select("planet").collect()(0).mkString("")
Hope this helps users on Spark 2.1 and below.