I'm trying to remove a group of columns from a dataset. All of the variables to remove end with the text "prefix".
I did manage to "collect" them into a group.
For the sake of completeness:
In [306]: df
Out[306]:
   prefixcol1  col2prefix  col3prefix  colN
0           1           1           1     1
1           2           2           2     2
2           3           3           3     3
In [307]: df.loc[:, ~df.columns.str.contains('prefix$')]
Out[307]:
   prefixcol1  colN
0           1     1
1           2     2
2           3     3
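Or another variant (DataFrame.select was deprecated in pandas 0.21 and removed in 1.0, so this only works on older versions):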
In [388]: df.select(lambda x: re.search(r'prefix$', str(x)) is None, axis=1)
Out[388]:
   prefixcol1  colN
0           1     1
1           2     2
2           3     3
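The same selection can be sketched without DataFrame.select by building a boolean mask from the column names and passing it to .loc. This is only a minimal sketch: the sample frame is rebuilt here from the output shown above, and the names keep and result are made up for illustration.

```python
import re

import pandas as pd

# Sample frame rebuilt from the transcript above (values are illustrative).
df = pd.DataFrame({
    "prefixcol1": [1, 2, 3],
    "col2prefix": [1, 2, 3],
    "col3prefix": [1, 2, 3],
    "colN": [1, 2, 3],
})

# One boolean per column: True when the name does NOT end with "prefix".
keep = [re.search(r"prefix$", str(name)) is None for name in df.columns]

# Select only the columns flagged True; the original frame is left untouched.
result = df.loc[:, keep]
print(result)
```

The mask has one entry per column, so .loc keeps prefixcol1 and colN, matching the output of the select call.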
Using filter with a regex (this pattern excludes any column whose name contains "prefix" anywhere, not only at the end):
df.filter(regex=r'^((?!prefix).)*$')
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(2, 6),
                  columns=['oneprefix', 'one',
                           'twoprefix', 'two',
                           'threeprefix', 'three'])
df.filter(regex=r'^((?!prefix).)*$')
where df is the DataFrame constructed above.
All are about the same
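If only names that end in "prefix" should be dropped, the lookahead in the filter pattern can be anchored to the end of the name. The sketch below compares the two patterns on a small made-up frame; the column prefixtwo is an assumption added purely to show where they differ.

```python
import pandas as pd

# Made-up frame: "prefixtwo" contains "prefix" but does not end with it.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["oneprefix", "one", "prefixtwo", "two"],
)

# Keeps names that do not contain "prefix" anywhere.
contains_free = df.filter(regex=r"^((?!prefix).)*$")

# Keeps names that do not END with "prefix".
suffix_free = df.filter(regex=r"^(?!.*prefix$)")

print(list(contains_free.columns))
print(list(suffix_free.columns))
```

The first pattern drops prefixtwo as well, while the anchored one keeps it and only removes oneprefix.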
df2 = df.loc[:, ~df.columns.str.endswith('prefix')]
I think you need:
not_prefix_cols = [col for col in df.columns if 'prefix' not in col]
df[not_prefix_cols]
But it is better to use endswith, so that only names ending in "prefix" are excluded:
not_prefix_cols = [col for col in df.columns if not col.endswith('prefix')]
print(df[not_prefix_cols])
Sample:
import pandas as pd
df = pd.DataFrame({'prefixone': pd.Series([1, 2, 3, 4]),
                   'twoprefix': pd.Series([20, 30, 40, 50]),
                   'two1prefix': pd.Series([20, 30, 40, 50])})
print(df)
   prefixone  two1prefix  twoprefix
0          1          20         20
1          2          30         30
2          3          40         40
3          4          50         50
not_prefix_cols = [col for col in df.columns if not col.endswith('prefix')]
print(df[not_prefix_cols])
   prefixone
0          1
1          2
2          3
3          4
df2 = df.drop([col for col in df.columns if 'prefix' in col], axis=1)
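The drop call above removes every column whose name contains "prefix", which would also remove a column such as prefixone. A variant that matches the question more closely (names ending in "prefix") might look like the sketch below; the sample frame is copied from the earlier answer and to_drop is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "prefixone": [1, 2, 3, 4],
    "twoprefix": [20, 30, 40, 50],
    "two1prefix": [20, 30, 40, 50],
})

# Collect only the names that END with "prefix".
to_drop = [col for col in df.columns if col.endswith("prefix")]

# drop returns a new frame; df itself is not modified.
df2 = df.drop(columns=to_drop)
print(df2)
```

Here only prefixone survives, since the other two names end with the suffix.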