PySpark split rows and convert to RDD

北海茫月 2021-01-24 15:26

I have an RDD in which each element has the following format

['979500797', ' 979500797,260973244733,2014-05-0402:05:12,645/01/105/9931,78,645/01/105/993


        
1 Answer
  • 2021-01-24 15:57

    What you need here is flatMap. flatMap takes a function that returns a sequence and concatenates the results.

    df_feat3 = df_feat2.flatMap(lambda (x, y): ((x, v) for v in y.split(';')))
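
    For illustration, here is a minimal sketch of the effect on a made-up pair RDD (the sample values and the SparkContext sc are assumptions, and the tuple-parameter lambda below runs only on Python 2):

    pairs = sc.parallelize([('979500797', 'a;b;c')])
    # one (key, token) pair is emitted per semicolon-separated token
    pairs.flatMap(lambda (x, y): ((x, v) for v in y.split(';'))).collect()
    # [('979500797', 'a'), ('979500797', 'b'), ('979500797', 'c')]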
    

    On a side note, I would avoid using tuple parameters. It is a cool feature, but it is no longer available in Python 3. See PEP 3113.
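
    For example, a Python 3 compatible variant might unpack inside the function body instead (explode_values is just a hypothetical name):

    def explode_values(pair):
        # unpack explicitly; lambdas can no longer take tuple parameters in Python 3
        key, values = pair
        return ((key, v) for v in values.split(';'))

    df_feat3 = df_feat2.flatMap(explode_values)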
