Spark JSON text field to RDD

小蘑菇 2021-02-04 16:26

I've got a Cassandra table with a field of type text named snapshot containing JSON objects:

[identifier, timestamp, snapshot]

I understood th

1 answer
  • 2021-02-04 17:04

    Almost there. You just need to pass an RDD[String] containing your JSON documents to the jsonRDD method:

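    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import com.datastax.spark.connector._ // adds cassandraTable to SparkContext

    val conf = new SparkConf().setAppName("signal-aggregation")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Read (identifier, timestamp, snapshot) rows as tuples of strings
    val snapshots = sc.cassandraTable[(String, String, String)]("listener", "snapshots")
    val jsons = snapshots.map(_._3) // third tuple element: the snapshot JSON, as an RDD[String]
    val jsonSchemaRDD = sqlContext.jsonRDD(jsons) // pass the RDD in directly; the schema is inferred
    jsonSchemaRDD.registerTempTable("testjson")
    sqlContext.sql("SELECT * FROM testjson where .... ").collect 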
    

    A quick example with inline JSON strings:

    val stringRDD = sc.parallelize(Seq(""" 
      { "isActive": false,
        "balance": "$1,431.73",
        "picture": "http://placehold.it/32x32",
        "age": 35,
        "eyeColor": "blue"
      }""",
       """{
        "isActive": true,
        "balance": "$2,515.60",
        "picture": "http://placehold.it/32x32",
        "age": 34,
        "eyeColor": "blue"
      }""", 
      """{
        "isActive": false,
        "balance": "$3,765.29",
        "picture": "http://placehold.it/32x32",
        "age": 26,
        "eyeColor": "blue"
      }""")
    )
    sqlContext.jsonRDD(stringRDD).registerTempTable("testjson")
    sqlContext.sql("SELECT age FROM testjson").collect // query the same context the table was registered on
    //res24: Array[org.apache.spark.sql.Row] = Array([35], [34], [26])
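
    On newer Spark versions the same idea looks slightly different: jsonRDD was deprecated in Spark 1.4 in favor of the read.json reader, and registerTempTable was deprecated in Spark 2.0 in favor of createOrReplaceTempView. The following is only a rough sketch, assuming Spark 2.2 or later (where read.json accepts a Dataset[String]). The object name JsonColumnSketch, the helper parseJson, and the local master setting are made up for illustration; in the Cassandra case the input strings would come from the snapshot column instead of an inline Seq.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object JsonColumnSketch {
      // Hypothetical helper: parse one JSON document per string into a
      // DataFrame whose schema is inferred from the data.
      def parseJson(spark: SparkSession, jsons: Seq[String]): DataFrame = {
        import spark.implicits._ // provides toDS() on Seq[String]
        spark.read.json(jsons.toDS())
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-column-sketch")
          .master("local[*]") // assumption: local run, for illustration only
          .getOrCreate()

        val sample = Seq(
          """{"isActive": false, "age": 35, "eyeColor": "blue"}""",
          """{"isActive": true, "age": 34, "eyeColor": "blue"}"""
        )

        // Register the parsed JSON as a temporary view and query it with SQL
        parseJson(spark, sample).createOrReplaceTempView("testjson")
        spark.sql("SELECT age FROM testjson").show()

        spark.stop()
      }
    }
    ```

    The structure is the same as above: get the JSON text as a collection of strings, let Spark infer the schema, then register the result under a name that SQL queries can use.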
    