I am trying to figure out why my groupByKey is returning the following:
[(0, ), (1,
Example:
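from functools import reduce  # a builtin on Python 2, but must be imported on Python 3
from operator import add

# sc is the SparkContext (predefined in the pyspark shell)
r1 = sc.parallelize([('a', 1), ('b', 2)])
r2 = sc.parallelize([('b', 1), ('d', 2)])
# cogroup gives (key, (values_from_r1, values_from_r2)); turn both iterables into lists,
# concatenate them, and store the result as a tuple
r1.cogroup(r2).mapValues(lambda x: tuple(reduce(add, map(list, x)))).collect()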
Result:
[('d', (2,)), ('b', (2, 1)), ('a', (1,))]
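To see what the lambda is doing without a cluster, here is a plain-Python sketch. It does not use Spark at all: `group_by_key`, `cogroup`, and `flatten` are hypothetical stand-ins written for illustration, not PySpark APIs. It only mimics the shape of the data. In PySpark the grouped values are lazy `ResultIterable` objects, which is why printing them shows an object repr instead of the values; here a plain iterator plays that role.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def group_by_key(pairs):
    """Hypothetical stand-in for groupByKey: key -> iterator over its values.

    Values are left as iterators on purpose, so printing one shows an opaque
    object repr until it is materialized with list().
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: iter(values) for key, values in groups.items()}


def cogroup(left, right):
    """Hypothetical stand-in for cogroup: key -> (left values, right values)."""
    left_groups, right_groups = defaultdict(list), defaultdict(list)
    for key, value in left:
        left_groups[key].append(value)
    for key, value in right:
        right_groups[key].append(value)
    keys = set(left_groups) | set(right_groups)
    # A key missing on one side gets an empty list for that side.
    return {key: (left_groups[key], right_groups[key]) for key in keys}


def flatten(groups):
    # Same expression as the mapValues lambda: list each side, concatenate, tuple.
    return tuple(reduce(add, map(list, groups)))


grouped = group_by_key([(0, 'x'), (1, 'y'), (0, 'z')])
print(grouped[0])  # prints something like <list_iterator object at 0x...>
print(sorted((k, list(v)) for k, v in grouped.items()))  # [(0, ['x', 'z']), (1, ['y'])]

r1 = [('a', 1), ('b', 2)]
r2 = [('b', 1), ('d', 2)]
result = {key: flatten(groups) for key, groups in cogroup(r1, r2).items()}
print(sorted(result.items()))  # [('a', (1,)), ('b', (2, 1)), ('d', (2,))]
```

In PySpark itself, the usual way to get readable output from a plain `groupByKey` is to materialize each group, e.g. `rdd.groupByKey().mapValues(list).collect()`.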