How to flatten nested lists in PySpark?

Asked by 旧时难觅i on 2021-01-12 00:54

I have an RDD structure like:

rdd = [[[1],[2],[3]], [[4],[5]], [[6]], [[7],[8],[9],[10]]]

and I want it to become:

rdd = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
1 Answer
    Answered by 说谎 on 2021-01-12 01:19

    You can, for example, use flatMap with a list comprehension:

    rdd.flatMap(lambda xs: [x[0] for x in xs])
    

    or, to make it a little more general (it does not assume single-element inner lists):

    from itertools import chain
    
    rdd.flatMap(lambda xs: chain(*xs)).collect()
    
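The same lambdas can be sanity-checked outside Spark with plain Python (a minimal sketch: `flatMap` applies the function to each RDD element and concatenates the resulting iterables, which a nested list comprehension mimics locally):

```python
from itertools import chain

# Sample nested data, mirroring the RDD's elements
nested = [[[1], [2], [3]], [[4], [5]], [[6]], [[7], [8], [9], [10]]]

# First approach: x[0] assumes every inner list holds exactly one item
flat_specific = [x[0] for xs in nested for x in xs]

# Second approach: chain(*xs) flattens one level regardless of inner lengths
flat_general = [y for xs in nested for y in chain(*xs)]

print(flat_specific)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(flat_general)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Note that the first version silently drops elements if an inner list ever holds more than one item, while the `chain` version keeps them all.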
