pyspark EOFError after calling map

一向 2021-02-12 08:34

I am new to Spark and PySpark.

I am reading a small CSV file (~40k) into a DataFrame.

from pyspark.sql import functions as F
df = sqlContext.read.forma         


        
2 Answers
  • 2021-02-12 09:09

    Can you try calling map after converting the DataFrame to an RDD? You are applying map directly to a DataFrame and then creating a DataFrame again from the result. The syntax would be:

    df.rdd.map().toDF()
    

    Please let me know if it works. Thanks.

  • 2021-02-12 09:21

    I believe you are running Spark 2.x or above. The code below should create your DataFrame from the CSV file:

    df = spark.read.format("csv").option("header", "true").load("csvfile.csv")
    

    Then you can recode the column like this:

    df = df.withColumn('verified', F.when(df['verified'] == 'Y', 1).otherwise(0))
    

    After that, you can create df2 without using Row and toDF().

    Let me know if this works, or if you are using Spark 1.6. Thanks.
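
The steps in this answer can be put together as a minimal sketch, assuming Spark 2.x or later and a working `pyspark` install. The function names `verified_flag`, `flag_verified`, and `load_and_flag` are illustrative and do not come from the original post; `csvfile.csv` and the `verified` column are taken from the answer. The plain-Python `verified_flag` only restates the intended rule ('Y' becomes 1, everything else 0) so it can be checked without starting Spark.

```python
def verified_flag(value):
    """Plain-Python statement of the rule: 'Y' maps to 1, anything else to 0."""
    return 1 if value == "Y" else 0


def flag_verified(df):
    """Apply the same rule as a column expression, so no Row or toDF() is needed."""
    # Imported lazily so verified_flag can be used without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn("verified", F.when(df["verified"] == "Y", 1).otherwise(0))


def load_and_flag(path="csvfile.csv"):
    """Read a CSV with a header row and recode its 'verified' column."""
    from pyspark.sql import SparkSession

    # getOrCreate() reuses an existing session if one is already running.
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.format("csv").option("header", "true").load(path)
    return flag_verified(df)
```

Because `withColumn` stays inside the DataFrame API, the recoding is evaluated by Spark's SQL engine and no Python function has to be shipped to the workers.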

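
The first answer's `df.rdd.map().toDF()` suggestion might look like the sketch below. The question's own code is truncated, so the per-row logic here is an assumption: it reuses the `verified` column from the second answer, and `convert_record` and `map_via_rdd` are hypothetical names. The row transform is kept as a plain function on a dict so it can be tested without Spark.

```python
def convert_record(record):
    """Return a copy of a row dict with 'verified' recoded to 1 or 0."""
    out = dict(record)  # copy, so the input dict is left untouched
    out["verified"] = 1 if out.get("verified") == "Y" else 0
    return out


def map_via_rdd(df):
    """Apply convert_record through the RDD API, then rebuild a DataFrame."""
    # Imported lazily so convert_record can be used without Spark installed.
    from pyspark.sql import Row

    # DataFrame -> RDD of Row -> dict -> transformed dict -> Row -> DataFrame
    return df.rdd.map(lambda row: Row(**convert_record(row.asDict()))).toDF()
```

Calling `map_via_rdd(df)` on a DataFrame that has a `verified` column returns a new DataFrame; the schema is inferred again by `toDF()`, so column types may differ from the original.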