How to convert a JSON string to a DataFrame on Spark

臣服心动 2020-11-27 15:39

I want to convert the string variable below to a DataFrame on Spark.

val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""

I know how to read a JSON file into a DataFrame, but not how to build one from a string variable.

7 Answers
  • 2020-11-27 15:55

    To convert a list of JSON strings into a DataFrame in Spark 2.2:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .master("local")
      .appName("Test")
      .getOrCreate()

    // Two sample JSON records as strings
    val jsonString1 = """{"ID" : "111","NAME":"Arkay","LOC":"Pune"}"""
    val jsonString2 = """{"ID" : "222","NAME":"DineshS","LOC":"PCMC"}"""
    val strList = List(jsonString1, jsonString2)

    // Parallelize the strings into an RDD and let Spark infer the schema
    val rddData = spark.sparkContext.parallelize(strList)
    val resultDF = spark.read.json(rddData)
    resultDF.show()
    

    Result:

    +---+----+-------+
    | ID| LOC|   NAME|
    +---+----+-------+
    |111|Pune|  Arkay|
    |222|PCMC|DineshS|
    +---+----+-------+
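
    If the schema is known up front, inference can be skipped by declaring it explicitly. A minimal sketch reusing rddData from above, assuming the same three string fields:

    import org.apache.spark.sql.types.{StringType, StructType}

    // Declare the columns explicitly instead of inferring them from the data
    val schema = new StructType()
      .add("ID", StringType)
      .add("NAME", StringType)
      .add("LOC", StringType)
    val typedDF = spark.read.schema(schema).json(rddData)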
    
  • 2020-11-27 15:57

    Since the function for reading JSON from an RDD was deprecated in Spark 2.2, here is another option:

    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    import spark.implicits._ // spark is your SparkSession object
    val df = spark.read.json(Seq(jsonStr).toDS)
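
    Since metadata comes back as a single struct column, the nested fields can be pulled out with dot notation (a quick sketch using the df above):

    // Flatten the nested struct into top-level columns
    df.select($"metadata.key", $"metadata.value").show()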
    
  • 2020-11-27 16:06

    The same approach works from PySpark:

    # sc and sqlContext come from the PySpark shell / your Spark setup
    simple_json = '{"results":[{"a":1,"b":2,"c":"name"},{"a":2,"b":5,"c":"foo"}]}'
    rddjson = sc.parallelize([simple_json])
    df = sqlContext.read.json(rddjson)
    

    Reference: https://stackoverflow.com/a/49399359/2187751

  • 2020-11-27 16:07

    Here is an example of how to convert a JSON string to a DataFrame in Java (Spark 2.2+):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;

    String str1 = "{\"_id\":\"123\",\"ITEM\":\"Item 1\",\"CUSTOMER\":\"Billy\",\"AMOUNT\":285.2}";
    String str2 = "{\"_id\":\"124\",\"ITEM\":\"Item 2\",\"CUSTOMER\":\"Sam\",\"AMOUNT\":245.85}";
    List<String> jsonList = new ArrayList<>();
    jsonList.add(str1);
    jsonList.add(str2);
    SparkContext sparkContext = new SparkContext(new SparkConf()
            .setAppName("myApp").setMaster("local"));
    JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);
    SQLContext sqlContext = new SQLContext(sparkContext);
    // Parallelize the JSON strings and read them as a Dataset<Row>
    JavaRDD<String> javaRdd = javaSparkContext.parallelize(jsonList);
    Dataset<Row> data = sqlContext.read().json(javaRdd);
    data.show();
    

    Here is the result:

    +------+--------+------+---+
    |AMOUNT|CUSTOMER|  ITEM|_id|
    +------+--------+------+---+
    | 285.2|   Billy|Item 1|123|
    |245.85|     Sam|Item 2|124|
    +------+--------+------+---+
    
  • 2020-11-27 16:08

    For Spark 2.2+:

    import spark.implicits._
    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val df = spark.read.json(Seq(jsonStr).toDS)
    

    For Spark 2.1.x:

    val events = sc.parallelize("""{"action":"create","timestamp":"2016-01-07T00:01:17Z"}""" :: Nil)    
    val df = sqlContext.read.json(events)
    

    Hint: this uses the sqlContext.read.json(jsonRDD: RDD[String]) overload. There is also sqlContext.read.json(path: String), which reads a JSON file directly.
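
    For completeness, a sketch of the path overload (the path below is just a placeholder):

    // Reads a file of newline-delimited JSON records
    val fileDF = sqlContext.read.json("/path/to/records.json")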

    For older versions:

    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val rdd = sc.parallelize(Seq(jsonStr))
    val df = sqlContext.read.json(rdd)
    
  • 2020-11-27 16:09

    In some cases you may hit an error like "Illegal pattern component: XXX". To work around it, add a timestampFormat option to spark.read. The updated code:

    val spark = SparkSession
              .builder()
              .master("local")
              .appName("Test")
              .getOrCreate()
    import spark.implicits._
    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val df = spark.read.option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ").json(Seq(jsonStr).toDS)
    df.show()
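
    The option only matters once a field is actually read as a timestamp. A minimal sketch with an explicit schema and a made-up event/ts record (the field names and values are illustrative, not from the question):

    import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

    // "ts" is written in the pattern passed via timestampFormat
    val tsJson = """{"event":"login","ts":"2020/11/27 15:39:00 +00:00"}"""
    val schema = new StructType()
      .add("event", StringType)
      .add("ts", TimestampType)
    val tsDF = spark.read
      .schema(schema)
      .option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
      .json(Seq(tsJson).toDS)
    tsDF.show()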
    