Can someone please share how one can convert a DataFrame to an RDD?
Simply:
val rows: RDD[Row] = df.rdd
I was just looking for my answer and found this post.
Jean's answer is absolutely correct. Adding to it: "df.rdd" returns an RDD[Row]. I needed to apply split() once I had the RDD, so I had to convert the RDD[Row] to an RDD[String]:
// Row.toString wraps the values in square brackets, so read the column with getString instead
val opt = spark.sql("select tags from cvs").rdd.map(_.getString(0))
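Here is a minimal sketch of that Row-to-String step followed by split(). It assumes a local SparkSession and a small made-up table with a single comma-separated tags column. The sample data and the names TagsToRdd and splitTags are illustrative and not from the post.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object TagsToRdd {
  // Turn a DataFrame whose first column holds comma-separated tags
  // into an RDD with one element per tag.
  def splitTags(df: DataFrame): RDD[String] = {
    val rows: RDD[Row] = df.rdd                       // DataFrame -> RDD[Row]
    val tags: RDD[String] = rows.map(_.getString(0))  // RDD[Row] -> RDD[String]
    tags.flatMap(_.split(","))                        // split each string into tags
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for demonstration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tags-to-rdd")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the "cvs" table
    Seq("scala,spark", "rdd,dataframe").toDF("tags").createOrReplaceTempView("cvs")

    splitTags(spark.sql("select tags from cvs")).collect().foreach(println)
    spark.stop()
  }
}
```

getString(0) fails on a non-string column and returns null for a null value, so filter or cast first if the column may contain either.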
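Use df.rdd.map(row => ...)
to convert the DataFrame to an RDD when you want to map each row to a different RDD element. (In Spark 1.x, df.map returned an RDD directly; since Spark 2.0 it returns a Dataset, so go through df.rdd first.) For example
df.rdd.map(row => (row(0), row(1)))
gives you a pair RDD where the first column of the DataFrame is the key and the second column is the value. Row indices are zero-based.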
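Here is a sketch of building such a pair RDD with typed getters. It assumes Spark 2.x or later (where the map goes through df.rdd) and a DataFrame whose first column is a String and whose second is an Int. Row indices are zero-based, so the first column is index 0. The column names, sample data, and the names PairRddExample and toPairs are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object PairRddExample {
  // Key each row by its first column (index 0) and use the
  // second column (index 1) as the value.
  def toPairs(df: DataFrame): RDD[(String, Int)] =
    df.rdd.map(row => (row.getString(0), row.getInt(1)))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for demonstration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pair-rdd-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

    // A pair RDD exposes key-based operations such as reduceByKey
    toPairs(df).reduceByKey(_ + _).collect().foreach(println)
    spark.stop()
  }
}
```

Using getString and getInt rather than row(0) and row(1) gives an RDD[(String, Int)] instead of an RDD[(Any, Any)], which keeps later key-based operations type-safe.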