I have two pandas DataFrames, df1 and df2, and I want to filter them so that each one keeps only the rows whose index labels are common to both DataFrames.
I found building a pd.Index from a Python set intersection much faster than both numpy.intersect1d and df1.index.intersection(df2.index). Here is what I used:
common = pd.Index(set(df1.index) & set(df2.index))
df1, df2 = df1.loc[common], df2.loc[common]
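A self-contained version of the set-based approach, run on small hypothetical frames (the three shared dates and their values come from the outputs in this thread; the extra rows are made up). One assumption worth flagging: Python sets have no defined order, so this sketch sorts the shared labels before wrapping them in pd.Index. Note that sorted() orders these 'dd/mm/yyyy' strings as text, not as dates. The pd.Index wrapper also matters, since recent pandas versions reject a bare set as a .loc indexer.

```python
import pandas as pd

# Toy frames: the shared dates and values mirror the thread's outputs;
# the extra rows ("27/11/2000", "01/12/2000") and their values are made up.
df1 = pd.DataFrame(
    {"values1": [0.1, -0.055276, 0.027427, 0.066009]},
    index=["27/11/2000", "28/11/2000", "29/11/2000", "30/11/2000"],
)
df2 = pd.DataFrame(
    {"values2": [-0.026316, 0.015222, -0.024480, 0.2]},
    index=["28/11/2000", "29/11/2000", "30/11/2000", "01/12/2000"],
)

# Intersect the labels as plain Python sets. Set iteration order is arbitrary,
# so sort before wrapping in pd.Index to get a reproducible row order.
common = pd.Index(sorted(set(df1.index) & set(df2.index)))

df1 = df1.loc[common]
df2 = df2.loc[common]
print(df1)
print(df2)
```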
Have you tried something like this?
df1 = df1.loc[[x for x in df1.index if x in df2.index]]
df2 = df2.loc[[x for x in df2.index if x in df1.index]]
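The two list comprehensions above can be run end to end on hypothetical toy frames (extra rows made up). The loop itself is plain Python, but as far as I know each `x in other.index` check is a hash lookup on the Index rather than a linear scan.

```python
import pandas as pd

# Hypothetical toy frames; only the shared dates mirror the thread's data.
df1 = pd.DataFrame(
    {"values1": [0.1, -0.055276, 0.027427, 0.066009]},
    index=["27/11/2000", "28/11/2000", "29/11/2000", "30/11/2000"],
)
df2 = pd.DataFrame(
    {"values2": [-0.026316, 0.015222, -0.024480, 0.2]},
    index=["28/11/2000", "29/11/2000", "30/11/2000", "01/12/2000"],
)

# Keep the rows of df1 whose label also occurs in df2, in df1's order.
df1 = df1.loc[[x for x in df1.index if x in df2.index]]
# df1 is already trimmed here, but it holds exactly the shared labels,
# so filtering df2 against it gives the same result.
df2 = df2.loc[[x for x in df2.index if x in df1.index]]

print(df1.index.tolist())
print(df2.index.tolist())
```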
Use Index.intersection to get the shared labels, then select them with .loc:
In [352]: common = df1.index.intersection(df2.index)
In [353]: df1.loc[common]
Out[353]:
values1
0
28/11/2000 -0.055276
29/11/2000 0.027427
30/11/2000 0.066009
In [354]: df2.loc[common]
Out[354]:
values2
0
28/11/2000 -0.026316
29/11/2000 0.015222
30/11/2000 -0.024480
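If more than two frames need trimming, the same intersection idea can be folded across all of them. This is a sketch with a hypothetical helper, keep_common_rows, and a made-up df3; it only chains Index.intersection and .loc, as in the session above.

```python
from functools import reduce

import pandas as pd


def keep_common_rows(*frames):
    """Hypothetical helper: trim every frame to the index labels they all share."""
    # Fold Index.intersection across all the indexes.
    common = reduce(lambda left, right: left.intersection(right), (f.index for f in frames))
    return [f.loc[common] for f in frames]


# Hypothetical toy frames; df3 is made up to show more than two inputs.
df1 = pd.DataFrame(
    {"values1": [0.1, -0.055276, 0.027427, 0.066009]},
    index=["27/11/2000", "28/11/2000", "29/11/2000", "30/11/2000"],
)
df2 = pd.DataFrame(
    {"values2": [-0.026316, 0.015222, -0.024480, 0.2]},
    index=["28/11/2000", "29/11/2000", "30/11/2000", "01/12/2000"],
)
df3 = pd.DataFrame({"values3": [1.0, 2.0]}, index=["29/11/2000", "30/11/2000"])

df1_common, df2_common = keep_common_rows(df1, df2)
trimmed = keep_common_rows(df1, df2, df3)
print(df1_common)
print([f.index.tolist() for f in trimmed])
```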
Alternatively, build a boolean mask with isin (intersection might be faster, though):
In [286]: df1.loc[df1.index.isin(df2.index)]
Out[286]:
values1
0
28/11/2000 -0.055276
29/11/2000 0.027427
30/11/2000 0.066009
In [287]: df2.loc[df2.index.isin(df1.index)]
Out[287]:
values2
0
28/11/2000 -0.026316
29/11/2000 0.015222
30/11/2000 -0.024480
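A runnable sketch of the isin route on hypothetical toy data; df2's reversed row order is made up to make one property visible. Because the selection is a boolean mask, each frame keeps its own row order, so the two results are not necessarily aligned row by row.

```python
import pandas as pd

# Hypothetical toy frames: df2 lists the shared dates in reverse order on purpose.
df1 = pd.DataFrame(
    {"values1": [0.1, -0.055276, 0.027427, 0.066009]},
    index=["27/11/2000", "28/11/2000", "29/11/2000", "30/11/2000"],
)
df2 = pd.DataFrame(
    {"values2": [0.2, -0.024480, 0.015222, -0.026316]},
    index=["01/12/2000", "30/11/2000", "29/11/2000", "28/11/2000"],
)

# isin returns a boolean NumPy array with one entry per row of the calling index.
mask1 = df1.index.isin(df2.index)
mask2 = df2.index.isin(df1.index)
print(mask1)

# Boolean selection keeps each frame's own row order.
df1_common = df1.loc[mask1]
df2_common = df2.loc[mask2]
print(df1_common)
print(df2_common)
```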
You can use Index.intersection + DataFrame.loc:
idx = df1.index.intersection(df2.index)
print(idx)
Index(['28/11/2000', '29/11/2000', '30/11/2000'], dtype='object')
Alternative solution with numpy.intersect1d:
import numpy as np

idx = np.intersect1d(df1.index, df2.index)
print(idx)
['28/11/2000' '29/11/2000' '30/11/2000']
df1 = df1.loc[idx]
print(df1)
values1
28/11/2000 -0.055276
29/11/2000 0.027427
30/11/2000 0.066009
df2 = df2.loc[idx]
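One detail about numpy.intersect1d: it returns the sorted, unique common values. With 'dd/mm/yyyy' strings that sort is textual, so once the shared labels cross a month boundary they are no longer in date order. The sketch below uses made-up frames to show this, and parses the labels with pd.to_datetime (assuming every label matches the %d/%m/%Y format) to get chronological order instead.

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose shared labels straddle a month boundary.
df1 = pd.DataFrame({"values1": [1.0, 2.0, 3.0]},
                   index=["29/11/2000", "30/11/2000", "01/12/2000"])
df2 = pd.DataFrame({"values2": [4.0, 5.0, 6.0]},
                   index=["30/11/2000", "01/12/2000", "02/12/2000"])

# Sorted as text: "01/12/2000" comes before "30/11/2000".
idx = np.intersect1d(df1.index, df2.index)
print(idx)

# Index.intersection does not sort by default (sort=False).
print(df1.index.intersection(df2.index))

# Parsing to datetimes first makes the sorted result chronological.
dates1 = pd.to_datetime(df1.index, format="%d/%m/%Y")
dates2 = pd.to_datetime(df2.index, format="%d/%m/%Y")
common_dates = np.intersect1d(dates1, dates2)
print(pd.DatetimeIndex(common_dates).strftime("%d/%m/%Y").tolist())
```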
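Alternatively, use reindex + dropna (this assumes the frames hold no NaN values of their own, because dropna removes those rows too):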
df1.reindex(df2.index).dropna()
Out[21]:
values1
28/11/2000 -0.055276
29/11/2000 0.027427
30/11/2000 0.066009
df2.reindex(df1.index).dropna()
Out[22]:
values2
28/11/2000 -0.026316
29/11/2000 0.015222
30/11/2000 -0.024480
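The NaN caveat of reindex + dropna can be shown with hypothetical data in which df1 already has a missing value on a shared label. dropna cannot tell that genuine NaN apart from the NaN rows reindex inserts, so the row disappears, while selecting the intersection with .loc keeps it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df1 has a genuine NaN on the shared label "28/11/2000",
# and df2 has a label ("01/12/2000") that df1 lacks.
df1 = pd.DataFrame({"values1": [0.5, np.nan, 0.027427]},
                   index=["27/11/2000", "28/11/2000", "29/11/2000"])
df2 = pd.DataFrame({"values2": [-0.026316, 0.015222, 0.3]},
                   index=["28/11/2000", "29/11/2000", "01/12/2000"])

# reindex adds an all-NaN row for "01/12/2000"; dropna removes it,
# but it also removes the genuine NaN row for "28/11/2000".
via_dropna = df1.reindex(df2.index).dropna()

# Selecting the shared labels directly keeps that row.
via_loc = df1.loc[df1.index.intersection(df2.index)]

print(via_dropna)
print(via_loc)
```

Also keep in mind that reindex raises a ValueError when the frame being reindexed has duplicate index labels, whereas the .loc and isin approaches accept them.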