I tried a simple example like:
data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/databricks-datasets/
Because the header contains spaces or tabs, remove them and try again.
1) My example script
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

df = spark.read.csv(r'test.csv', header=True, sep='^')
print("#################################################################")
df.printSchema()  # printSchema() prints directly and returns None
df.createOrReplaceTempView("test")
re = spark.sql("select max_seq from test")
re.show()  # show() also prints directly and returns None
print("################################################################")
2) Input file: here 'max_seq ' contains a trailing space, so we get the exception below
Trx_ID^max_seq ^Trx_Type^Trx_Record_Type^Trx_Date
Traceback (most recent call last):
File "D:/spark-2.1.0-bin-hadoop2.7/bin/test.py", line 14, in <module>
re=spark.sql("select max_seq from test")
File "D:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\session.py", line 541, in sql
File "D:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
File "D:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u"cannot resolve '`max_seq`' given input columns: [Venue_City_Name, Trx_Type, Trx_Booking_Status_Committed, Payment_Reference1, Trx_Date, max_seq , Event_ItemVariable_Name, Amount_CurrentPrice, cinema_screen_count, Payment_IsMyPayment, r
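If you cannot change the file, one workaround (a sketch, assuming the temp view from the script above) is to backtick-quote the identifier so Spark SQL matches the column name verbatim, trailing space included:
# Backticks preserve the exact name, including the trailing space
re = spark.sql("select `max_seq ` from test")
re.show()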
3) Remove the space after the 'max_seq' column and it works fine
Trx_ID^max_seq^Trx_Type^Trx_Record_Type^Trx_Date
17/03/20 12:16:25 INFO DAGScheduler: Job 3 finished: showString at <unknown>:0, took 0.047602 s
17/03/20 12:16:25 INFO CodeGenerator: Code generated in 8.494073 ms
+-------+
|max_seq|
+-------+
|     10|
|     23|
|     22|
|     22|
|    ...|
+-------+
only showing top 20 rows
##############################################################
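Instead of editing the file, you could also rename the offending column right after the load (a sketch, assuming the df from the script above):
# Rename 'max_seq ' (trailing space) to a clean name, then re-register the view
df = df.withColumnRenamed("max_seq ", "max_seq")
df.createOrReplaceTempView("test")
spark.sql("select max_seq from test").show()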
I found the issue: some of the column names contain whitespace before the name itself. So
data = data.select(" timedelta", " shares").rdd.map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()
worked (note: in Spark 2.x you need .rdd before .map, and LabeledPoint comes from pyspark.mllib.regression). I could catch the whitespace using
assert " " not in ''.join(df.columns)
Now I am thinking of a way to remove the white spaces. Any idea is much appreciated!
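One simple way (a sketch, assuming the df loaded above) is to rename every column to its stripped form with toDF:
# toDF takes the full list of new column names, so strip each one
df = df.toDF(*[c.strip() for c in df.columns])
assert " " not in ''.join(df.columns)  # the earlier check now passes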
My input file contained tabs, so removing the stray tabs and spaces from the header made the query return the expected answer.
My example:
saledf = spark.read.csv("SalesLTProduct.txt", header=True, inferSchema=True, sep='\t')
saledf.printSchema()
root
|-- ProductID: string (nullable = true)
|-- Name: string (nullable = true)
|-- ProductNumber: string (nullable = true)
saledf.describe('ProductNumber').show()
+-------+-------------+
|summary|ProductNumber|
+-------+-------------+
| count| 295|
| mean| null|
| stddev| null|
| min| BB-7421|
| max| WB-H098|
+-------+-------------+
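To verify that no column name kept a stray tab or space after the load, a quick check (assuming the saledf above):
# Any name that differs from its stripped form still carries whitespace
bad = [c for c in saledf.columns if c != c.strip()]
print(bad)  # expect [] once the header is clean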