How to remove a subset of a data frame in Python?

这一生的挚爱 提交于 2020-08-22 12:12:06

问题


My dataframe df is 3020x4. I'd like to remove a subset df1 20x4 out of the original. In other words, I just want to get the difference whose shape is 3000x4. I tried the below but it did not work. It returned exactly df. Would you please help? Thanks.

new_df = df.drop(df1)

回答1:


As you seem to be unable to post a representative example I will demonstrate one approach using merge with param indicator=True:

So generate some data:

In [116]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237

take a subset:

In [118]:
df_subset=df.iloc[2:3]
df_subset

Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084

now perform a left merge with param indicator=True this will add _merge column which indicates whether the row is left_only, both or right_only (the latter won't appear in this example) and we filter the merged df to show only left_only:

In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new

Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only

here is the original merged df:

In [122]:
df.merge(df_subset, how='left', indicator=True)

Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only



回答2:


The pandas cheat sheet suggests also the following technique

adf[~adf.x1.isin(bdf.x1)]

where x1 is the column being compared, adf is the dataframe from which the corresponding rows appearing in dataframe bdf are taken out.

The particular question asked by the OP can also be solved by

new_df = df.drop(df1.index)


来源:https://stackoverflow.com/questions/39408109/how-to-remove-a-subset-of-a-data-frame-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!