Spark Group By Key to (Key,List) Pair

孤街浪徒 2021-02-02 00:14

I am trying to group some data by key where the value would be a list:

Sample data:

A 1
A 2
B 1
B 2

Expected result:

(A         


        
2 Answers
  • 2021-02-02 00:33

    When you write an anonymous inline function of the form

    ARGS => OPERATION
    

    the entire part before the arrow (=>) is taken as the argument list. So, in the case of

    (k, v) => ...
    

    the compiler takes that to mean a function of two arguments. In your case, however, you have a single argument that happens to be a tuple (here a Tuple2, or pair; more fully, you appear to have a collection of (Any, List[Any]) pairs). There are a couple of ways around this. One form that looks natural, wrapping the pair in an extra set of parentheses to mark it as the single expected argument, is rejected by Scala 2 compilers ("not a legal formal parameter"), because a parameter list cannot destructure a tuple:

    ((x, y)) => ...
    

    Instead, you can write the anonymous function in the form of a partial function that matches on tuples:

    groupedData.map { case (k, v) => (k, v(0)) }  // a case clause needs braces, not parentheses
    

    Finally, you can simply go with a single specified argument, as per your last attempt, but - realising it is a tuple - reference the specific field(s) within the tuple that you need:

    groupedData.map(s => (s._2(0),s._2(1)))  // The key is s._1, and the value list is s._2   
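
    The pattern-matching and field-access forms can be tried without a cluster. The following is a minimal sketch, assuming a local List of (String, List[Int]) pairs as a hypothetical stand-in for groupedData (the real one is presumably a Spark RDD, whose map accepts the same function literals); the object name TupleArgDemo is likewise made up for illustration.

    ```scala
    object TupleArgDemo {
      // Hypothetical stand-in for groupedData: a local list of (key, values) pairs.
      val groupedData: List[(String, List[Int])] =
        List(("A", List(1, 2)), ("B", List(1, 2)))

      // Pattern-matching anonymous function: destructures each tuple into k and v.
      val viaCase: List[(String, Int)] =
        groupedData.map { case (k, v) => (k, v(0)) }

      // Single argument s; the key is s._1 and the value list is s._2.
      val viaFields: List[(String, Int)] =
        groupedData.map(s => (s._1, s._2(0)))

      def main(args: Array[String]): Unit = {
        println(viaCase)   // List((A,1), (B,1))
        println(viaFields) // List((A,1), (B,1))
      }
    }
    ```

    Both expressions produce the same pairs; the case form tends to read better when both tuple fields are used.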
    
  • 2021-02-02 00:47

    You're almost there. Just replace List(_) with _.toList

    data.groupByKey.mapValues(_.toList)
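
    The difference between the two spellings shows up in the types. A minimal sketch on plain Scala collections, assuming an Iterable[Int] as a hypothetical stand-in for the per-key values that groupByKey yields (ToListDemo and its field names are invented for illustration):

    ```scala
    object ToListDemo {
      // Hypothetical stand-in for the values grouped under one key.
      val values: Iterable[Int] = Seq(1, 2)

      // List(_) means x => List(x): the whole Iterable becomes a single element.
      val wrapped: List[Iterable[Int]] = List(values)

      // _.toList means x => x.toList: the elements are copied into a List.
      val converted: List[Int] = values.toList

      def main(args: Array[String]): Unit = {
        println(wrapped.length) // 1
        println(converted)      // List(1, 2)
      }
    }
    ```

    So mapValues(List(_)) would give each key a one-element list holding the original Iterable, while mapValues(_.toList) gives a list of the values themselves.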
    