可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
How can I achieve the equivalents of SQL's IN
and NOT IN
?
I have a list with the required values. Here's the scenario:
df = pd.DataFrame({'countries':['US','UK','Germany','China']}) countries = ['UK','China'] # pseudo-code: df[df['countries'] not in countries]
My current way of doing this is as follows:
df = pd.DataFrame({'countries':['US','UK','Germany','China']}) countries = pd.DataFrame({'countries':['UK','China'], 'matched':True}) # IN df.merge(countries,how='inner',on='countries') # NOT IN not_in = df.merge(countries,how='left',on='countries') not_in = not_in[pd.isnull(not_in['matched'])]
But this seems like a horrible kludge. Can anyone improve on it?
回答1:
You can use pd.Series.isin
.
For "IN" use: something.isin(somewhere)
Or for "NOT IN": ~something.isin(somewhere)
As a worked example:
>>> df countries 0 US 1 UK 2 Germany 3 China >>> countries ['UK', 'China'] >>> df.countries.isin(countries) 0 False 1 True 2 False 3 True Name: countries, dtype: bool >>> df[df.countries.isin(countries)] countries 1 UK 3 China >>> df[~df.countries.isin(countries)] countries 0 US 2 Germany
回答2:
I've been usually doing generic filtering over rows like this:
criterion = lambda row: row['countries'] not in countries not_in = df[df.apply(criterion, axis=1)]
回答3:
Alternative solution that uses .query() method:
In [5]: df.query("countries in @countries") Out[5]: countries 1 UK 3 China In [6]: df.query("countries not in @countries") Out[6]: countries 0 US 2 Germany
回答4:
I wanted to filter out dfbc rows that had a BUSINESS_ID that was also in the BUSINESS_ID of dfProfilesBusIds
Finally got it working:
dfbc = dfbc[(dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID']) == False)]
回答5:
df = pd.DataFrame({'countries':['US','UK','Germany','China']}) countries = ['UK','China']
implement in:
df[df.countries.isin(countries)]
implement not in as in of rest countries:
df[df.countries.isin([x for x in np.unique(df.countries) if x not in countries])]