Can't set index of a pandas data frame - getting “KeyError”


I generate a data frame that looks like this (summaryDF):

   accuracy        f1  precision    recall
0     0.494  0.722433   0.722433  0.722433


                      
2 answers

        
                         				            
            
           
            
                              
                
              
              
                
陌清茗 (2021-01-04 21:38)
                                                                       
I guess you and @jezrael misunderstood an example from the pandas docs:

df.set_index(['A', 'B'])


A and B are column names / labels in this example:

In [55]: df = pd.DataFrame(np.random.randint(0, 10, (5,4)), columns=list('ABCD'))

In [56]: df
Out[56]:
   A  B  C  D
0  6  9  7  4
1  5  1  3  4
2  4  4  0  5
3  9  0  9  8
4  6  4  5  7

In [57]: df.set_index(['A','B'])
Out[57]:
     C  D
A B
6 9  7  4
5 1  3  4
4 4  0  5
9 0  9  8
6 4  5  7


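The documentation says the keys argument should be a column label, or a list of column labels / arrays. A flat list of strings is therefore looked up among the column names, which is where the KeyError comes from.

So you were looking for this (note the nested list, which is treated as an array of index values):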

In [58]: df.set_index([['A','B','C','D','E']])
Out[58]:
   A  B  C  D
A  6  9  7  4
B  5  1  3  4
C  4  4  0  5
D  9  0  9  8
E  6  4  5  7


But as @jezrael has suggested, df.index = ['A','B',...] is a faster and more idiomatic method.
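
To make the difference concrete, here is a minimal sketch, assuming a reasonably recent pandas version. The frame and the row labels are made up for illustration. A flat list is looked up as column labels and raises KeyError when none match, while a nested list or a direct assignment to df.index treats the values as the new index.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame, similar in shape to the one above
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('ABCD'))
new_labels = ['v', 'w', 'x', 'y', 'z']  # made-up row labels, not column names

# Flat list: each item is treated as a column label, so the lookup fails
try:
    df.set_index(new_labels)
    flat_list_error = None
except KeyError as exc:
    flat_list_error = exc

# Nested list: the inner list is treated as an array of index values
via_set_index = df.set_index([new_labels])

# Direct assignment (on a copy) produces the same index
via_assignment = df.copy()
via_assignment.index = new_labels

print(type(flat_list_error).__name__)
print(via_set_index.index.tolist())
print(via_assignment.index.tolist())
```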
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
北海茫月 (2021-01-04 21:52)
                                                                       
You need to assign the list to summaryDF.index, provided the list has the same length as the DataFrame:

summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
print (summaryDF)
   accuracy        f1  precision    recall
A     0.494  0.722433   0.722433  0.722433
B     0.290  0.826087   0.826087  0.826087
C     0.274  0.629630   0.629630  0.629630
D     0.278  0.628571   0.628571  0.628571
E     0.288  0.718750   0.718750  0.718750
F     0.740  0.740000   0.740000  0.740000
G     0.698  0.765133   0.765133  0.765133
H     0.582  0.778547   0.778547  0.778547
I     0.682  0.748235   0.748235  0.748235
J     0.574  0.767918   0.767918  0.767918
K     0.398  0.711656   0.711656  0.711656
L     0.530  0.780083   0.780083  0.780083

print (summaryDF.index)
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], dtype='object')
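
The length condition matters: as far as I know, assigning a list of the wrong length raises a ValueError rather than padding or truncating. A small sketch with a made-up three-row frame (the name scores and its values are illustrative only):

```python
import pandas as pd

# Hypothetical frame with three rows
scores = pd.DataFrame({'accuracy': [0.5, 0.3, 0.7], 'f1': [0.7, 0.8, 0.6]})

# Matching length: the assignment works
scores.index = ['A', 'B', 'C']

# Wrong length: pandas refuses the assignment
try:
    scores.index = ['A', 'B']
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
print(scores.index.tolist())  # the earlier, valid index is still in place
```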


Timings:

In [117]: %timeit summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
The slowest run took 6.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 76.2 µs per loop

In [118]: %timeit summaryDF.set_index(pd.Index(['A','B','C', 'D','E','F','G','H','I','J','K','L']))
The slowest run took 6.77 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 227 µs per loop
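
Outside IPython, a rough equivalent of the comparison above can be sketched with the standard timeit module. The frame here is a placeholder with 12 rows, and the absolute numbers will differ by machine and pandas version, so treat the result only as a relative indication.

```python
import timeit

import pandas as pd

labels = list('ABCDEFGHIJKL')
# Placeholder data; only the number of rows matters for this comparison
frame = pd.DataFrame({'accuracy': [0.5] * len(labels)})


def assign_index():
    # Direct assignment mutates the existing frame
    frame.index = labels


def call_set_index():
    # set_index builds and returns a new frame
    return frame.set_index(pd.Index(labels))


n = 1000
t_assign = timeit.timeit(assign_index, number=n) / n
t_set_index = timeit.timeit(call_set_index, number=n) / n

print(f"assignment: {t_assign * 1e6:.1f} µs per call")
print(f"set_index:  {t_set_index * 1e6:.1f} µs per call")
```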


Another solution is to convert the list to a NumPy array:

summaryDF.set_index(np.array(['A','B','C', 'D','E','F','G','H','I','J','K','L']), inplace=True)
print (summaryDF)
   accuracy        f1  precision    recall
A     0.494  0.722433   0.722433  0.722433
B     0.290  0.826087   0.826087  0.826087
C     0.274  0.629630   0.629630  0.629630
D     0.278  0.628571   0.628571  0.628571
E     0.288  0.718750   0.718750  0.718750
F     0.740  0.740000   0.740000  0.740000
G     0.698  0.765133   0.765133  0.765133
H     0.582  0.778547   0.778547  0.778547
I     0.682  0.748235   0.748235  0.748235
J     0.574  0.767918   0.767918  0.767918
K     0.398  0.711656   0.711656  0.711656
L     0.530  0.780083   0.780083  0.780083
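
If you would rather avoid inplace=True, set_index returns a new DataFrame by default, so the result can simply be reassigned. A minimal sketch with placeholder data (the real summaryDF has 12 rows; summary and relabeled are names chosen for this example):

```python
import numpy as np
import pandas as pd

# Placeholder frame with three rows
summary = pd.DataFrame({'accuracy': [0.494, 0.290, 0.274]})

# Without inplace=True the original is left untouched and a new frame is returned
relabeled = summary.set_index(np.array(['A', 'B', 'C']))

print(summary.index.tolist())
print(relabeled.index.tolist())
```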

                                                                        
                                                        
            
            
              
                