Firstly, I am completely new to Scala and Spark, although I am a bit familiar with PySpark. I am working with an external JSON file which is pretty huge, and I am not allowed to convert it.
I used to parse JSON in Scala with this kind of method:
/** ---------------------------------------
  * Example of a method to parse simple JSON:
  * {
  *   "fields": [
  *     {
  *       "field1": "value",
  *       "field2": "value",
  *       "field3": "value"
  *     }
  *   ]
  * }
  * --------------------------------------- */
import scala.io.Source
import scala.util.parsing.json._ // deprecated API, removed in Scala 2.13

case class OutputData(field1: String, field2: String, field3: String)

def singleMapJsonParser(jsonDataFile: String): List[OutputData] = {
  // Read the whole file into a single string
  val jsonData: String = Source.fromFile(jsonDataFile).getLines().mkString
  // JSON.parseFull returns Option[Any]; the type pattern below is erased at
  // runtime, so @unchecked silences the compiler warning
  JSON.parseFull(jsonData) match {
    case Some(json: Map[String, List[Map[String, String]]] @unchecked) =>
      json("fields").map(v => OutputData(v("field1"), v("field2"), v("field3")))
    case _ => Nil
  }
}
Then you just have to call your SparkContext to transform the List[OutputData] output into an RDD.
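For example, here is a minimal sketch of that last step, assuming a local SparkContext and a placeholder input path "data.json" (both of these are my assumptions, not part of the original post):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Standard local SparkContext boilerplate; adjust the master/app name as needed
val conf = new SparkConf().setAppName("json-to-rdd").setMaster("local[*]")
val sc = new SparkContext(conf)

// parallelize distributes the driver-side List[OutputData] across the cluster
val rdd: RDD[OutputData] = sc.parallelize(singleMapJsonParser("data.json"))
rdd.take(3).foreach(println)

Note that this approach parses the whole file on the driver before distributing it, so for a really huge file it can become a bottleneck; Spark's built-in JSON reader loads the file in a distributed way instead.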