I generate a data frame that looks like this (summaryDF):
accuracy f1 precision recall
0 0.494 0.722433 0.722433 0.722433
I guess you and @jezrael misunderstood an example from the pandas docs:
df.set_index(['A', 'B'])
A and B are column names / labels in this example:
In [55]: df = pd.DataFrame(np.random.randint(0, 10, (5,4)), columns=list('ABCD'))
In [56]: df
Out[56]:
A B C D
0 6 9 7 4
1 5 1 3 4
2 4 4 0 5
3 9 0 9 8
4 6 4 5 7
In [57]: df.set_index(['A','B'])
Out[57]:
C D
A B
6 9 7 4
5 1 3 4
4 4 0 5
9 0 9 8
6 4 5 7
The documentation says the argument should be a list of column labels / arrays, so to set new index values you were looking for:
In [58]: df.set_index([['A','B','C','D','E']])
Out[58]:
A B C D
A 6 9 7 4
B 5 1 3 4
C 4 4 0 5
D 9 0 9 8
E 6 4 5 7
But as @jezrael has suggested, df.index = ['A','B',...] is a faster and more idiomatic method.
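To make the difference concrete, here is a minimal sketch. The 3-row frame and the labels 'x', 'y', 'z' are made up for illustration and are not from the question. A flat list of strings is looked up as column labels, while a nested list is treated as an array of new index values:

```python
import pandas as pd

# Hypothetical 3-row frame; values chosen only for illustration
df = pd.DataFrame({'A': [6, 5, 4], 'B': [9, 1, 4], 'C': [7, 3, 0]})

# Flat list of labels: the existing columns A and B become a MultiIndex
by_columns = df.set_index(['A', 'B'])

# Nested list: the inner list is taken as an array of new index values
by_values = df.set_index([['x', 'y', 'z']])

# Direct assignment on a copy produces the same index labels
direct = df.copy()
direct.index = ['x', 'y', 'z']
```

In the first case columns A and B are moved out of the data and into the index. In the other two cases all three columns stay in place and only the row labels change.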
You need to assign a list to summaryDF.index, provided the length of the list is the same as the length of the DataFrame:
summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
print (summaryDF)
accuracy f1 precision recall
A 0.494 0.722433 0.722433 0.722433
B 0.290 0.826087 0.826087 0.826087
C 0.274 0.629630 0.629630 0.629630
D 0.278 0.628571 0.628571 0.628571
E 0.288 0.718750 0.718750 0.718750
F 0.740 0.740000 0.740000 0.740000
G 0.698 0.765133 0.765133 0.765133
H 0.582 0.778547 0.778547 0.778547
I 0.682 0.748235 0.748235 0.748235
J 0.574 0.767918 0.767918 0.767918
K 0.398 0.711656 0.711656 0.711656
L 0.530 0.780083 0.780083 0.780083
print (summaryDF.index)
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], dtype='object')
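The length condition matters: as far as I know, pandas raises a ValueError when the list and the DataFrame have different lengths. A small sketch, using a hypothetical 3-row frame (summary) in place of summaryDF and a list that is deliberately one label short:

```python
import pandas as pd

# Hypothetical 3-row frame standing in for summaryDF
summary = pd.DataFrame({'accuracy': [0.494, 0.290, 0.274]})

labels = ['A', 'B']  # deliberately one label short

try:
    summary.index = labels
except ValueError as err:
    mismatch_error = err  # pandas reports a length mismatch here
else:
    mismatch_error = None

# Checking the length first avoids the exception altogether
if len(labels) == len(summary):
    summary.index = labels
```

Here the guard is false, so the frame keeps its default 0, 1, 2 index.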
Timings:
In [117]: %timeit summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
The slowest run took 6.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 76.2 µs per loop
In [118]: %timeit summaryDF.set_index(pd.Index(['A','B','C', 'D','E','F','G','H','I','J','K','L']))
The slowest run took 6.77 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 227 µs per loop
Another solution is to convert the list to a numpy array:
summaryDF.set_index(np.array(['A','B','C', 'D','E','F','G','H','I','J','K','L']), inplace=True)
print (summaryDF)
accuracy f1 precision recall
A 0.494 0.722433 0.722433 0.722433
B 0.290 0.826087 0.826087 0.826087
C 0.274 0.629630 0.629630 0.629630
D 0.278 0.628571 0.628571 0.628571
E 0.288 0.718750 0.718750 0.718750
F 0.740 0.740000 0.740000 0.740000
G 0.698 0.765133 0.765133 0.765133
H 0.582 0.778547 0.778547 0.778547
I 0.682 0.748235 0.748235 0.748235
J 0.574 0.767918 0.767918 0.767918
K 0.398 0.711656 0.711656 0.711656
L 0.530 0.780083 0.780083 0.780083
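If you would rather not modify the frame in place, set_index without inplace=True returns a new DataFrame and leaves the original untouched. A minimal sketch, again with a hypothetical 3-row frame (summary) rather than the real summaryDF:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row frame standing in for summaryDF
summary = pd.DataFrame({'accuracy': [0.494, 0.290, 0.274],
                        'f1': [0.722433, 0.826087, 0.629630]})

labels = np.array(['A', 'B', 'C'])

# Without inplace=True a new frame is returned; summary keeps its old index
relabelled = summary.set_index(labels)

# Rows can now be selected by label
accuracy_b = relabelled.loc['B', 'accuracy']
```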