Question
My DataFrame df is 3020x4. I'd like to remove a 20x4 subset, df1, from it. In other words, I just want the difference, whose shape should be 3000x4. I tried the code below, but it did not work: it returned exactly df. Could you please help? Thanks.
new_df = df.drop(df1)
Answer 1:
As you seem unable to post a representative example, I will demonstrate one approach using merge with the parameter indicator=True:
So generate some data:
In [116]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237
Take a subset:
In [118]:
df_subset = df.iloc[2:3]
df_subset
Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084
Now perform a left merge with indicator=True. Since no on argument is given, merge joins on all columns the two frames share (a, b and c). The indicator adds a _merge column showing whether each row is left_only, both, or right_only (the latter cannot appear in a left merge like this one). We then filter the merged df to keep only the left_only rows:
In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new
Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
For comparison, here is the merged df before filtering:
In [122]:
df.merge(df_subset, how='left', indicator=True)
Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
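The session above can be condensed into a small reusable sketch. The helper name anti_join and the tiny integer frames are hypothetical, chosen only so the result is deterministic; the steps are the same left merge with indicator=True and filter on _merge, plus dropping the helper column at the end.

```python
import pandas as pd


def anti_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` that do not occur as a row in `right`.

    With no `on` argument, merge joins on every column the frames share.
    """
    merged = left.merge(right, how="left", indicator=True)
    # keep rows found only in `left`, then discard the helper column
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})
df_subset = df.iloc[2:3]  # the single row where a == 3

result = anti_join(df, df_subset)
print(result)
```

Two points worth keeping in mind: the match is on values, so every row of df equal to some row of the subset is removed (duplicates included), and merge returns a fresh 0..n-1 index, so the result's labels line up with df's only when df itself has a default integer index, as here.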
Answer 2:
The pandas cheat sheet also suggests the following technique:
adf[~adf.x1.isin(bdf.x1)]
where x1 is the column being compared and adf is the DataFrame from which the rows whose x1 value also appears in bdf are removed.
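A minimal sketch of the cheat-sheet idiom, using hypothetical frames adf and bdf with a key column x1. Comparing a single column drops every row of adf whose x1 value occurs anywhere in bdf. To compare whole rows instead, one option (my own assumption, not part of the cheat sheet) is to turn the shared columns into a MultiIndex and use Index.isin on it.

```python
import pandas as pd

# Hypothetical frames; x1 is the key column being compared
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": [True, False, True]})

# Rows of adf whose x1 value does not appear in bdf.x1
only_in_adf = adf[~adf.x1.isin(bdf.x1)]
print(only_in_adf)

# Whole-row variant for frames with identical columns (hypothetical data)
df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 10]})
df1 = pd.DataFrame({"a": [1], "b": [20]})
cols = list(df.columns)
# Build a MultiIndex of (a, b) tuples for each frame and test membership
mask = df.set_index(cols).index.isin(df1.set_index(cols).index)
new_df = df[~mask]
print(new_df)

# Comparing only column "a" would also drop the (1, 10) row
by_a_only = df[~df.a.isin(df1.a)]
print(by_a_only)
```

The last comparison shows why the single-column form answers a slightly different question than the OP's: it removes rows by key, not by full row identity.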
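The OP's specific question can also be solved by passing df1's index labels, rather than df1 itself, to drop, which expects labels (this relies on df1 having been taken from df with its original index intact):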
new_df = df.drop(df1.index)
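A sketch reproducing the OP's shapes with random data; the data and the use of sample() to pick df1 are assumptions for illustration. Because df1 is selected from df without resetting its index, df1.index holds labels that exist in df, and DataFrame.drop removes exactly those rows. If df1 had been rebuilt or had its index reset, its labels would no longer point at the right rows, and a value-based approach such as merge or isin would be needed instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# 3020x4 frame standing in for the OP's df
df = pd.DataFrame(rng.standard_normal((3020, 4)), columns=list("abcd"))
# 20 distinct rows; sample() keeps the original index labels
df1 = df.sample(n=20, random_state=0)

# drop() expects index labels, so pass df1.index rather than df1 itself
new_df = df.drop(df1.index)
print(new_df.shape)  # (3000, 4)
```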
Source: https://stackoverflow.com/questions/39408109/how-to-remove-a-subset-of-a-data-frame-in-python