How to display a KeyValueGroupedDataset in Spark?

野性不改 2021-02-03 11:57

I am trying to learn datasets in Spark. One thing I can't figure out is how to display a KeyValueGroupedDataset, as show doesn't work for it. Also, w

1 answer
  •  闹比i (original poster)
    2021-02-03 12:57

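    OK, I got the idea from the examples given here and here. Below is a simple example I wrote. A KeyValueGroupedDataset has no show method of its own, so I use mapGroups to turn it back into an ordinary Dataset, which can then be displayed.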

    val x = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS
    x: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
    
    val g = x.groupByKey(_._1)
    g: org.apache.spark.sql.KeyValueGroupedDataset[String,(String, Int)] = ...
    
    val z = g.mapGroups{case(k, iter) => (k, iter.map(x => x._2).toArray)}
    z: org.apache.spark.sql.Dataset[(String, Array[Int])] = [_1: string, _2: array<int>]
    
    z.show
    +---+--------+
    | _1|      _2|
    +---+--------+
    |  c|[40, 39]|
    |  b|    [33]|
    |  a|[36, 38]|
    +---+--------+
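
    Collecting every group into an array is not the only way to get something displayable. As a rough sketch, assuming a local SparkSession and the same sample data, the other methods on KeyValueGroupedDataset that return a plain Dataset (keys, count, and mapValues followed by reduceGroups) can also be shown directly. The object name GroupedDisplaySketch and the helpers countPerKey and sumPerKey are made up for this illustration; they are not part of Spark.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical names for illustration only; not part of the Spark API.
object GroupedDisplaySketch {

  // Number of rows per key: count() returns a Dataset[(K, Long)].
  def countPerKey(spark: SparkSession, ds: Dataset[(String, Int)]): Dataset[(String, Long)] = {
    import spark.implicits._ // encoder for the String key
    ds.groupByKey(_._1).count()
  }

  // Sum of the Int values per key: mapValues keeps only the value,
  // reduceGroups then folds each group down to a single row.
  def sumPerKey(spark: SparkSession, ds: Dataset[(String, Int)]): Dataset[(String, Int)] = {
    import spark.implicits._ // encoders for the String key and the Int value
    ds.groupByKey(_._1)
      .mapValues(_._2)
      .reduceGroups((a: Int, b: Int) => a + b)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouped-display-sketch")
      .getOrCreate()
    import spark.implicits._

    val x = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS()

    // keys is a Dataset[String] holding the distinct grouping keys.
    x.groupByKey(_._1).keys.show()

    countPerKey(spark, x).show()
    sumPerKey(spark, x).show()

    spark.stop()
  }
}
```

    Unlike mapGroups with toArray, count and reduceGroups do not need to hold a whole group in memory at once, which may matter when groups are large. The row order printed by show is not guaranteed.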
    
