According to Introducing Spark Datasets:
As we look forward to Spark 2.0, we plan some exciting improvements to Datasets, specifically: ... Custom encoders
For those who may be in my situation, I put my answer here, too.
To be specific, I was reading set-typed data from SQLContext, so the original data format was a DataFrame.
val sample = spark.sqlContext.sql("select 1 as a, collect_set(1) as b limit 1")
sample.show()
+---+---+
| a| b|
+---+---+
| 1|[1]|
+---+---+
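It may help to check the schema first: collect_set produces an array&lt;int&gt; column, which is why the row is read back below as a mutable.WrappedArray rather than as a Set. For example (output abbreviated):

// b comes back as array<int>, not as a Scala Set
sample.printSchema()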
Then convert it into an RDD with rdd.map(), reading the array column back as mutable.WrappedArray and calling toSet on it.
import scala.collection.mutable

sample
  .rdd
  .map(r => (r.getInt(0), r.getAs[mutable.WrappedArray[Int]](1).toSet))
  .collect()
  .foreach(println)
Result:
(1,Set(1))
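A slightly more defensive variant (my own sketch, not part of the original answer) is to ask for the column as a Seq[Int] rather than the concrete mutable.WrappedArray, since array columns are only guaranteed to come back as some Seq implementation:

sample
  .rdd
  .map(r => (r.getInt(0), r.getAs[Seq[Int]](1).toSet))  // Seq instead of WrappedArray
  .collect()
  .foreach(println)  // should print (1,Set(1)) as above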