Question
I was confused by this, which is very simple, but I didn't immediately find the answer on StackOverflow:
df.set_index('xcol') makes the column 'xcol' become the index (when it is a column of df).
df.reindex(myList), however, takes indexes from outside the dataframe, for example, from a list named myList that we defined somewhere else.
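A minimal sketch of the two calls described above may help. The data, the second column name 'ycol', and the contents of myList are made up for illustration; only set_index and reindex come from the question.

```python
import pandas as pd

# Hypothetical data: 'xcol' is an ordinary column of df.
df = pd.DataFrame({'xcol': ['x', 'y', 'z'], 'ycol': [10, 20, 30]})

# set_index: the values of an existing column become the row labels.
by_col = df.set_index('xcol')
print(by_col)

# reindex: the labels come from outside the dataframe. Each label is
# looked up in the current index (0, 1, 2); a label that is not found
# (5 here) produces a row of NaN.
myList = [2, 0, 5]
by_list = df.reindex(myList)
print(by_list)
```

In this sketch, set_index never looks at the old labels, while reindex relies on them to decide which row goes where.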
I hope this post clarifies it! Additions to this post are also welcome!
Answer 1:
You can see the difference on a simple example. Let's consider this dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)
   a  b
0  1  3
1  2  4
The indexes are then 0 and 1.
If you use set_index with the column 'a', then the indexes are 1 and 2. If you do df.set_index('a').loc[1,'b'], you will get 3.
Now if you use reindex with the same indexes 1 and 2, such as df.reindex([1,2]), you will get 4.0 when you do df.reindex([1,2]).loc[1,'b'].
What happened is that set_index has replaced the previous indexes (0, 1) with (1, 2) (the values from column 'a') without touching the order of the values in column 'b':
df.set_index('a')
   b
a
1  3
2  4
while reindex changes the indexes but keeps the values in column 'b' associated with the indexes in the original df:
df.reindex(df.a.values).drop(columns='a') # here equivalent to df.reindex([1, 2]).drop(columns='a')
     b
1  4.0
2  NaN
# drop(columns='a') is just to not care about column a in my example
Finally, reindex changes the order of the indexes without changing the values of the row associated with each index, while set_index replaces the indexes with the values of a column, without touching the order of the other values in the dataframe.
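That summary can be checked with a short sketch on the same two-row frame used in this answer. The variable names (relabeled, selected, reordered) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# set_index relabels the rows: row order and values are untouched.
relabeled = df.set_index('a')
print(relabeled['b'].tolist())

# reindex selects rows by their existing labels (0 and 1 here).
# Label 1 exists and keeps its row; label 2 does not exist, so that
# row is filled with NaN.
selected = df.reindex([1, 2])
print(selected)

# Reordering labels that do exist moves whole rows; no value changes.
reordered = df.reindex([1, 0])
print(reordered)
```

The NaN row is also why 'b' shows up as 4.0 rather than 4 after reindex: the column has to become float to hold the missing value.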
Answer 2:
Just to add, the undo of set_index would be the reset_index method (more or less):
df = pd.DataFrame({'a': [1, 2],'b': [3, 4]})
print(df)
df.set_index('a', inplace=True)
print(df)
df.reset_index(inplace=True, drop=False)
print(df)
   a  b
0  1  3
1  2  4
   b
a
1  3
2  4
   a  b
0  1  3
1  2  4
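As a small sketch of the same round trip without inplace=True, the snippet below also shows what the drop argument changes. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

indexed = df.set_index('a')

# drop=False (the default) moves the index back into a regular column.
restored = indexed.reset_index()
print(restored.equals(df))

# drop=True discards the old index instead, so column 'a' is lost.
dropped = indexed.reset_index(drop=True)
print(dropped.columns.tolist())
```

The "more or less" matters when the original frame did not have the default 0, 1, ... index: reset_index always produces that default index, so the original labels are not recovered.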
Answer 3:
Besides the great answer from Ben. T, I would like to give one more example of how they differ when you apply reindex and set_index to a shuffled index column:
import pandas as pd
import numpy as np
testdf = pd.DataFrame({'a': [1, 3, 2],'b': [3, 5, 4],'c': [5, 7, 6]})
print(testdf)
print(testdf.set_index(np.random.permutation(testdf.index)))
print(testdf.reindex(np.random.permutation(testdf.index)))
Output:
- With set_index, when the index column (the first column) is shuffled, the order of the other columns is kept intact.
- With reindex, the order of the rows changes according to the shuffle of the index column.
   a  b  c
0  1  3  5
1  3  5  7
2  2  4  6
   a  b  c
1  1  3  5
2  3  5  7
0  2  4  6
   a  b  c
2  2  4  6
1  3  5  7
0  1  3  5
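Because np.random.permutation gives a different order on every run, a reproducible variant can make the comparison easier to follow. The sketch below swaps the random shuffle for a fixed, hypothetical order [2, 0, 1]; everything else mirrors the example above.

```python
import pandas as pd

testdf = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 5, 4], 'c': [5, 7, 6]})

# A fixed "shuffle" of the labels 0, 1, 2 (hypothetical; it stands in
# for np.random.permutation so that the result is reproducible).
shuffled = [2, 0, 1]

# set_index: the new labels are attached to the rows in their current
# order. The list is wrapped in pd.Index because a plain list would be
# read as a list of column names.
relabeled = testdf.set_index(pd.Index(shuffled))
print(relabeled)

# reindex: the rows are rearranged to follow the new label order.
reordered = testdf.reindex(shuffled)
print(reordered)
```

Both results carry the index 2, 0, 1, but label 2 points at the first original row after set_index and at the last original row after reindex.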
Source: https://stackoverflow.com/questions/50741330/difference-between-df-reindex-and-df-set-index-methods-in-pandas