How to skip more than one line of header in an RDD in Spark

攒了一身酷 2021-01-14 16:15

The data in my first RDD looks like this:

1253
545553
12344896
1 2 1
1 43 2
1 46 1
1 53 2

Now the first 3 integers are some counters that I need to bro

3 Answers
  •  悲&欢浪女
    2021-01-14 16:19

    First, grab the header lines with the take() method, as zero323 suggested:

    raw = sc.textFile("file.txt")  # the method is textFile, with a capital F
    headers = raw.take(3)          # list holding the first 3 lines
    

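    Then drop every line that matches one of those header lines:

    final_raw = raw.filter(lambda x: x not in headers)  # headers is a list, so test membership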
    

    and done.
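One caveat with filtering by value: any data line whose text happens to equal a header line gets dropped too. A minimal sketch that skips by position instead, assuming the header is always the first few lines of the file. The names `split_header` and `n_header` are made up for illustration; `take`, `zipWithIndex`, `filter` and `map` are standard RDD methods.

```python
def split_header(rdd, n_header=3):
    """Split an RDD of text lines into (header_lines, data_rdd) by position.

    `split_header` and `n_header` are hypothetical names, not part of Spark.
    """
    # take() returns the first n_header elements to the driver as a list.
    header_lines = rdd.take(n_header)
    # zipWithIndex() pairs each line with its 0-based position: (line, index).
    data_rdd = (
        rdd.zipWithIndex()
        .filter(lambda pair: pair[1] >= n_header)  # keep rows after the header
        .map(lambda pair: pair[0])                 # drop the index again
    )
    return header_lines, data_rdd


# Usage with a live SparkContext `sc`, as in the answer above:
#   raw = sc.textFile("file.txt")
#   headers, final_raw = split_header(raw)
#   counters = [int(h) for h in headers]  # assumes the header lines are integers
```

This costs an extra pass to compute the indices when the file spans more than one partition, but a data row that repeats a counter value is kept.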
