Can't set index of a pandas data frame - getting “KeyError”

暗喜 2021-01-04 21:21

I generate a DataFrame (summaryDF) that looks like this:

   accuracy        f1  precision    recall
0     0.494  0.722433   0.722433  0.722433

2 Answers
  • 2021-01-04 21:38

    I guess you and @jezrael misunderstood an example from the pandas docs:

    df.set_index(['A', 'B'])
    

    A and B are column names / labels in this example:

    In [55]: df = pd.DataFrame(np.random.randint(0, 10, (5,4)), columns=list('ABCD'))
    
    In [56]: df
    Out[56]:
       A  B  C  D
    0  6  9  7  4
    1  5  1  3  4
    2  4  4  0  5
    3  9  0  9  8
    4  6  4  5  7
    
    In [57]: df.set_index(['A','B'])
    Out[57]:
         C  D
    A B
    6 9  7  4
    5 1  3  4
    4 4  0  5
    9 0  9  8
    6 4  5  7
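
    A deterministic version of the session above (values copied from Out[56] instead of np.random, so it reruns identically) is a minimal sketch confirming what set_index(['A','B']) does: the existing columns A and B become the two levels of a MultiIndex and are dropped from the columns.

```python
import pandas as pd

# Fixed values (from Out[56]) instead of np.random, so the result is reproducible
df = pd.DataFrame(
    [[6, 9, 7, 4], [5, 1, 3, 4], [4, 4, 0, 5], [9, 0, 9, 8], [6, 4, 5, 7]],
    columns=list("ABCD"),
)

# 'A' and 'B' are existing column labels, so they become index levels
indexed = df.set_index(["A", "B"])

print(type(indexed.index).__name__)  # MultiIndex
print(list(indexed.index.names))     # ['A', 'B']
print(list(indexed.columns))         # ['C', 'D']
```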
    

    The documentation says the keys argument should be a list of column labels / arrays.

    So you were looking for:

    In [58]: df.set_index([['A','B','C','D','E']])
    Out[58]:
       A  B  C  D
    A  6  9  7  4
    B  5  1  3  4
    C  4  4  0  5
    D  9  0  9  8
    E  6  4  5  7
    

    But, as @jezrael suggested, df.index = ['A','B',...] is the faster and more idiomatic method.
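
    A minimal sketch of the difference, using only the first three rows of summaryDF as printed in the second answer (a reduced stand-in, not the asker's full data): a flat list of letters is read as column labels and raises KeyError, while wrapping the letters in an inner list (treated as an array) or assigning to .index both set the labels. The exact KeyError message varies between pandas versions.

```python
import pandas as pd

# Reduced stand-in for summaryDF (first three rows, two columns)
df = pd.DataFrame(
    {"accuracy": [0.494, 0.290, 0.274], "f1": [0.722433, 0.826087, 0.629630]}
)
labels = ["A", "B", "C"]

# 1) Flat list: each element is looked up as a column label -> KeyError
try:
    df.set_index(labels)
except KeyError as err:
    print("KeyError:", err)

# 2) List containing one list: the inner list is used as an array of index values
via_set_index = df.set_index([labels])

# 3) Direct assignment to .index (on a copy, so df itself stays unchanged)
via_assign = df.copy()
via_assign.index = labels

print(via_set_index.index.equals(via_assign.index))  # True
```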

  • 2021-01-04 21:52

    You need to assign a list to summaryDF.index; this works as long as the length of the list equals the number of rows in the DataFrame:

    summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
    print (summaryDF)
       accuracy        f1  precision    recall
    A     0.494  0.722433   0.722433  0.722433
    B     0.290  0.826087   0.826087  0.826087
    C     0.274  0.629630   0.629630  0.629630
    D     0.278  0.628571   0.628571  0.628571
    E     0.288  0.718750   0.718750  0.718750
    F     0.740  0.740000   0.740000  0.740000
    G     0.698  0.765133   0.765133  0.765133
    H     0.582  0.778547   0.778547  0.778547
    I     0.682  0.748235   0.748235  0.748235
    J     0.574  0.767918   0.767918  0.767918
    K     0.398  0.711656   0.711656  0.711656
    L     0.530  0.780083   0.780083  0.780083
    
    print (summaryDF.index)
    Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], dtype='object')
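
    If the number of labels does not match the number of rows, pandas raises a ValueError (length mismatch) instead of assigning. A minimal sketch, with a hypothetical helper letter_labels that builds the labels from string.ascii_uppercase so the count always matches (it assumes at most 26 rows):

```python
import string

import pandas as pd


def letter_labels(n):
    """Hypothetical helper: return the first n uppercase letters as labels."""
    if n > len(string.ascii_uppercase):
        raise ValueError("this sketch only handles up to 26 rows")
    return list(string.ascii_uppercase[:n])


df = pd.DataFrame({"accuracy": [0.494, 0.290, 0.274, 0.278]})

# Too few labels: 4 rows but only 3 labels -> ValueError, index left unchanged
try:
    df.index = ["A", "B", "C"]
except ValueError as err:
    print("ValueError:", err)

# Generating exactly len(df) labels avoids the mismatch
df.index = letter_labels(len(df))
print(list(df.index))  # ['A', 'B', 'C', 'D']
```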
    

    Timings:

    In [117]: %timeit summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
    The slowest run took 6.86 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 76.2 µs per loop
    
    In [118]: %timeit summaryDF.set_index(pd.Index(['A','B','C', 'D','E','F','G','H','I','J','K','L']))
    The slowest run took 6.77 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000 loops, best of 3: 227 µs per loop
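
    These timings come from one machine and an older pandas version, so absolute numbers will differ. A rough sketch for rerunning the comparison with the standard timeit module, on a hypothetical 12-row frame of zeros standing in for summaryDF (no expected timings are shown, since they depend on hardware and pandas version):

```python
import timeit

import numpy as np
import pandas as pd

labels = list("ABCDEFGHIJKL")
# Hypothetical 12-row frame standing in for summaryDF
df = pd.DataFrame(np.zeros((12, 4)), columns=["accuracy", "f1", "precision", "recall"])


def assign_index():
    # Replace the labels on the existing frame
    df.index = labels


def set_index_copy():
    # set_index returns a new DataFrame; df itself is not modified
    return df.set_index(pd.Index(labels))


n = 1_000
t_assign = timeit.timeit(assign_index, number=n)
t_set = timeit.timeit(set_index_copy, number=n)
print(f"index assignment: {t_assign / n * 1e6:.1f} µs per call")
print(f"set_index:        {t_set / n * 1e6:.1f} µs per call")
```

    Part of the gap is that set_index builds and returns a new DataFrame, while assigning to .index only swaps the labels on the existing one.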
    

    Another solution is to convert the list to a numpy array:

    summaryDF.set_index(np.array(['A','B','C', 'D','E','F','G','H','I','J','K','L']), inplace=True)
    print (summaryDF)
       accuracy        f1  precision    recall
    A     0.494  0.722433   0.722433  0.722433
    B     0.290  0.826087   0.826087  0.826087
    C     0.274  0.629630   0.629630  0.629630
    D     0.278  0.628571   0.628571  0.628571
    E     0.288  0.718750   0.718750  0.718750
    F     0.740  0.740000   0.740000  0.740000
    G     0.698  0.765133   0.765133  0.765133
    H     0.582  0.778547   0.778547  0.778547
    I     0.682  0.748235   0.748235  0.748235
    J     0.574  0.767918   0.767918  0.767918
    K     0.398  0.711656   0.711656  0.711656
    L     0.530  0.780083   0.780083  0.780083
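
    With inplace=True, set_index modifies summaryDF directly and returns None, so its result must not be assigned back. A minimal sketch of the two equivalent forms, on a hypothetical 3-row frame:

```python
import numpy as np
import pandas as pd

labels = np.array(["A", "B", "C"])

# Form 1: modify the frame in place; the call itself returns None
df1 = pd.DataFrame({"recall": [0.722433, 0.826087, 0.629630]})
result = df1.set_index(labels, inplace=True)
print(result)  # None
print(list(df1.index))

# Form 2: keep the default inplace=False and rebind the returned copy
df2 = pd.DataFrame({"recall": [0.722433, 0.826087, 0.629630]})
df2 = df2.set_index(labels)
print(df1.equals(df2))  # True
```

    Writing summaryDF = summaryDF.set_index(labels, inplace=True) would replace summaryDF with None, a common follow-up mistake.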
    