pandas get rows which are NOT in other dataframe

后端未结

关注

 13  871

春和景丽

I\'ve two pandas data frames which have some rows in common.

Suppose dataframe2 is a subset of dataframe1.

How can I get the rows of dataframe1 which

相关标签:

13条回答

猫巷女王i

2020-11-22 02:48

Suppose you have two dataframes, df_1 and df_2 having multiple fields(column_names) and you want to find the only those entries in df_1 that are not in df_2 on the basis of some fields(e.g. fields_x, fields_y), follow the following steps.

Step1.Add a column key1 and key2 to df_1 and df_2 respectively.

Step2.Merge the dataframes as shown below. field_x and field_y are our desired columns.

Step3.Select only those rows from df_1 where key1 is not equal to key2.

Step4.Drop key1 and key2.

This method will solve your problem and works fast even with big data sets. I have tried it for dataframes with more than 1,000,000 rows.

df_1['key1'] = 1 df_2['key2'] = 1 df_1 = pd.merge(df_1, df_2, on=['field_x', 'field_y'], how = 'left') df_1 = df_1[~(df_1.key2 == df_1.key1)] df_1 = df_1.drop(['key1','key2'], axis=1)

0 讨论(0)

发布评论:

提交评论

加载中...

[愿得一人]

2020-11-22 02:48

Here is another way of solving this:

df1[~df1.index.isin(df1.merge(df2, how='inner', on=['col1', 'col2']).index)]

Or:

df1.loc[df1.index.difference(df1.merge(df2, how='inner', on=['col1', 'col2']).index)]

0 讨论(0)

发布评论:

提交评论

加载中...

孤独总比滥情好

2020-11-22 02:48

My way of doing this involves adding a new column that is unique to one dataframe and using this to choose whether to keep an entry

df2[col3] = 1 df1 = pd.merge(df_1, df_2, on=['field_x', 'field_y'], how = 'outer') df1['Empt'].fillna(0, inplace=True)

This makes it so every entry in df1 has a code - 0 if it is unique to df1, 1 if it is in both dataFrames. You then use this to restrict to what you want

answer = nonuni[nonuni['Empt'] == 0]

0 讨论(0)

发布评论:

提交评论

加载中...

星月不相逢

2020-11-22 02:50

a bit late, but it might be worth checking the "indicator" parameter of pd.merge.

See this other question for an example: Compare PandaS DataFrames and return rows that are missing from the first one

0 讨论(0)

发布评论:

提交评论

加载中...

执笔经年

2020-11-22 02:52

How about this:

df1 = pandas.DataFrame(data = {'col1' : [1, 2, 3, 4, 5], 'col2' : [10, 11, 12, 13, 14]}) df2 = pandas.DataFrame(data = {'col1' : [1, 2, 3], 'col2' : [10, 11, 12]}) records_df2 = set([tuple(row) for row in df2.values]) in_df2_mask = np.array([tuple(row) in records_df2 for row in df1.values]) result = df1[~in_df2_mask]

0 讨论(0)

发布评论:

提交评论

加载中...

独厮守ぢ

2020-11-22 02:52

extract the dissimilar rows using the merge function
df = df.merge(same.drop_duplicates(), on=['col1','col2'], how='left', indicator=True)
save the dissimilar rows in CSV
df[df['_merge'] == 'left_only'].to_csv('output.csv')

0 讨论(0)

发布评论:

提交评论

加载中...

1 2 3 下一页

验证码

看不清?

提交回复