How to implement 'in' and 'not in' for Pandas dataframe

Anonymous (unverified), submitted on 2019-12-03 02:08:02

Question:

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list with the required values. Here's the scenario:

    import pandas as pd

    df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
    countries = ['UK', 'China']

    # pseudo-code: df[df['countries'] not in countries]

My current way of doing this is as follows:

    df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
    countries = pd.DataFrame({'countries': ['UK', 'China'], 'matched': True})

    # IN
    df.merge(countries, how='inner', on='countries')

    # NOT IN
    not_in = df.merge(countries, how='left', on='countries')
    not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?

Answer 1:

You can use pd.Series.isin.

For "IN" use: something.isin(somewhere)

Or for "NOT IN": ~something.isin(somewhere)

As a worked example:

    >>> df
      countries
    0        US
    1        UK
    2   Germany
    3     China
    >>> countries
    ['UK', 'China']
    >>> df.countries.isin(countries)
    0    False
    1     True
    2    False
    3     True
    Name: countries, dtype: bool
    >>> df[df.countries.isin(countries)]
      countries
    1        UK
    3     China
    >>> df[~df.countries.isin(countries)]
      countries
    0        US
    2   Germany
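For a copy-and-paste version of the session above, here is a minimal sketch as a plain script. The variable names mask, in_rows and not_in_rows are illustrative and not part of the original answer; the mask is built once and reused for both filters.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Build the boolean mask once and reuse it for both filters.
mask = df['countries'].isin(countries)

in_rows = df[mask]        # SQL: WHERE countries IN ('UK', 'China')
not_in_rows = df[~mask]   # SQL: WHERE countries NOT IN ('UK', 'China')

print(in_rows)
print(not_in_rows)
```

The ~ operator inverts the boolean Series element by element, which is why it stands in for NOT here.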


Answer 2:

I usually do generic filtering over rows like this:

    criterion = lambda row: row['countries'] not in countries
    not_in = df[df.apply(criterion, axis=1)]
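As a rough check, the row-wise filter can be run on the sample data from the question and compared against the isin-based filter. This is only a sketch; the names not_in_apply and not_in_isin are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Row-wise filter: the function is called once per row (axis=1).
criterion = lambda row: row['countries'] not in countries
not_in_apply = df[df.apply(criterion, axis=1)]

# Vectorized filter on the same column, for comparison.
not_in_isin = df[~df['countries'].isin(countries)]

# Both approaches should select the same rows.
print(not_in_apply.equals(not_in_isin))
```

Because apply with axis=1 calls a Python function for every row, it tends to be slower than isin on large frames. Its advantage is flexibility: the criterion can look at several columns of the row at once.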


Answer 3:

An alternative solution that uses the .query() method:

    In [5]: df.query("countries in @countries")
    Out[5]:
      countries
    1        UK
    3     China

    In [6]: df.query("countries not in @countries")
    Out[6]:
      countries
    0        US
    2   Germany
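The session above can be reproduced with the following sketch. It assumes, as in the question, that the column and the Python list are both named countries; the result names in_rows and not_in_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Inside the expression, a bare name refers to a column,
# while the '@' prefix refers to a Python variable in scope.
in_rows = df.query("countries in @countries")
not_in_rows = df.query("countries not in @countries")

print(in_rows)
print(not_in_rows)
```

The @ prefix is what keeps the column and the list apart even though they share a name.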


Answer 4:

I wanted to filter out the dfbc rows whose BUSINESS_ID also appears in the BUSINESS_ID column of dfProfilesBusIds.

Finally got it working:

    dfbc = dfbc[~dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID'])]
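To make the two-frame case concrete, here is a small sketch. The sample values and the NAME column are invented for illustration; only dfbc, dfProfilesBusIds and BUSINESS_ID come from the answer. The mask is negated with ~, which selects the same rows as comparing the isin result with False.

```python
import pandas as pd

# Hypothetical sample data, made up for this example.
dfbc = pd.DataFrame({'BUSINESS_ID': [101, 102, 103, 104],
                     'NAME': ['a', 'b', 'c', 'd']})
dfProfilesBusIds = pd.DataFrame({'BUSINESS_ID': [102, 104, 999]})

# Keep only the rows whose BUSINESS_ID does not appear in the other frame.
# Series.isin compares values only, so the other frame's index is irrelevant.
dfbc = dfbc[~dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID'])]

print(dfbc)
```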


Answer 5:

    df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
    countries = ['UK', 'China']

To implement IN:

df[df.countries.isin(countries)]

To implement NOT IN as IN over the remaining countries (this requires import numpy as np):

    df[df.countries.isin([x for x in np.unique(df.countries) if x not in countries])]
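The complement approach can be checked against a negated isin on the sample data. The sketch below spells out the intermediate list; the names rest, not_in_complement and not_in_negated are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Values present in the column but absent from the exclusion list.
rest = [x for x in np.unique(df.countries) if x not in countries]

not_in_complement = df[df.countries.isin(rest)]
not_in_negated = df[~df.countries.isin(countries)]

print(rest)
print(not_in_complement.equals(not_in_negated))
```

On this data both filters select the same rows. The negated form skips the extra pass needed to build the complement list.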

