I am new to Spark and PySpark.
I am reading a small CSV file (~40 KB) into a DataFrame.
from pyspark.sql import functions as F
df = sqlContext.read.format("csv").option("header", "true").load("csvfile.csv")
Can you please try doing the map after converting the DataFrame into an RDD? You are applying a map function to a DataFrame and then creating a DataFrame from the result again. The syntax would be:
df.rdd.map().toDF()
Please let me know if it works. Thanks.
I believe you are running Spark 2.x or above. The code below should create your DataFrame from the CSV:
df = spark.read.format("csv").option("header", "true").load("csvfile.csv")
Then you can apply:
df = df.withColumn('verified', F.when(df['verified'] == 'Y', 1).otherwise(0))
and then you can create df2 without using Row and toDF().
Let me know if this works, or if you are using Spark 1.6. Thanks.