How can I convert an RDD to a DataFrame in Spark Streaming, not just Spark?
I saw this example, but it requires
Create sqlContext outside foreachRDD. Once you convert the RDD to a DataFrame using sqlContext, you can write it to S3.
For example:
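import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // brings rdd.toDF() into scope

myDstream.foreachRDD { rdd =>
  val df = rdd.toDF()
  // DataFrameWriter has no saveAsTextFile; use save(), and append so later batches do not fail
  df.write.format("json").mode("append").save("s3://iiiii/ttttt.json")
}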
Update:
You can even create sqlContext inside foreachRDD, because the body of foreachRDD executes on the driver.
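A minimal sketch of that variant, assuming Spark 1.5 or later, where SQLContext.getOrCreate is available (on Spark 2.x the usual equivalent is SparkSession.builder.getOrCreate). The Event case class, the StreamToDataFrame object, and the saveAsJson helper are hypothetical names used only for illustration:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

// Hypothetical record type, used only for illustration.
case class Event(id: Long, message: String)

object StreamToDataFrame {
  // Hypothetical helper: writes each micro-batch of `events` as JSON under `outputPath`.
  def saveAsJson(events: DStream[Event], outputPath: String): Unit = {
    events.foreachRDD { (rdd: RDD[Event]) =>
      // This closure body runs on the driver once per batch, so obtaining
      // the SQLContext here is safe. getOrCreate reuses a single instance
      // instead of building a new context for every batch.
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      // Skip empty batches so no empty output is written.
      if (!rdd.isEmpty()) {
        // Append so that later batches do not fail on an existing path.
        rdd.toDF().write.mode("append").json(outputPath)
      }
    }
  }
}
```

Only the code inside RDD operations such as map or filter is shipped to executors; the foreachRDD body itself stays on the driver, which is why the context lookup works there.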
Look at the following answer, which contains a Scala magic cell inside a Python notebook: How to convert Spark Streaming data into Spark DataFrame