How to select rows that do not start with some str in pandas?

I want to select the rows whose values do not start with certain strings. For example, I have a pandas DataFrame `df`, and I want to select the rows whose values do not start with t or c.
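For anyone who wants to run the answers, here is a minimal sketch of a frame that fits the question. The column name `col` and the index labels come from the outputs shown in the answers; the rows starting with t and c are assumed for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical sample data: 'mext1' and 'okl1' appear in the answers' outputs,
# the other two values are assumed so that there is something to filter out.
df = pd.DataFrame({"col": ["text1", "mext1", "cext1", "okl1"]})

# The rows the question is after: everything not starting with 't' or 'c'
wanted = df.loc[[1, 3]]
print(wanted)
```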

5 Answers

伪装坚强ぢ

2020-12-29 05:15
You can use str.startswith and negate it.
```
df[~df['col'].str.startswith('t') &
   ~df['col'].str.startswith('c')]

     col
1  mext1
3   okl1
```
Or the better option, with multiple characters in a tuple as per @Ted Petrou:
```
df[~df['col'].str.startswith(('t','c'))]

     col
1  mext1
3   okl1
```
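One thing to watch with the negated `startswith` is missing values: for a missing entry the result is itself missing, and that does not negate cleanly into a boolean mask. A minimal sketch using the `na` argument of `str.startswith`, on made-up data with one `None` in it:

```python
import pandas as pd

# Hypothetical data with a missing value at index 2
df = pd.DataFrame({"col": ["text1", "mext1", None, "cext1", "okl1"]})

# na=False: missing values count as "does not start with", so they are kept
kept_missing = df[~df["col"].str.startswith(("t", "c"), na=False)]

# na=True: missing values count as "starts with", so they are dropped
dropped_missing = df[~df["col"].str.startswith(("t", "c"), na=True)]
```

Which of the two is right depends on whether rows with missing values should survive the filter.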
小蘑菇

2020-12-29 05:17
Just another alternative in case you prefer regex:
```
df1[df1.col.str.contains('^[^tc]')]
```
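A small caveat with the regex version: `^[^tc]` requires a first character to exist, so empty strings are filtered out, whereas the negated `startswith` keeps them. A quick sketch on made-up data (the empty string at index 2 is there only to show the difference):

```python
import pandas as pd

# Hypothetical data including an empty string
df = pd.DataFrame({"col": ["text1", "mext1", "", "cext1", "okl1"]})

# Regex: the first character must exist and must not be 't' or 'c'
by_regex = df[df["col"].str.contains(r"^[^tc]")]

# Negated startswith: an empty string starts with neither letter, so it stays
by_startswith = df[~df["col"].str.startswith(("t", "c"))]
```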

一向

2020-12-29 05:27

option 1
use str.match and a negative lookahead

df[df.col.str.match('^(?![tc])')]

option 2
within query

df.query('col.str[0] not in list("tc")')

option 3
numpy broadcasting

df[~(df.col.str[0].to_numpy()[:, None] == ['t', 'c']).any(axis=1)]

     col
1  mext1
3   okl1
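To make the broadcasting option less cryptic, here is a sketch of the intermediate steps on assumed sample data. The variable names are mine and purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col": ["text1", "mext1", "cext1", "okl1"]})

# First character of every value as a plain NumPy array, shape (4,)
first = df["col"].str[0].to_numpy()

# Adding an axis gives shape (4, 1); comparing against 2 letters broadcasts to (4, 2),
# one column per letter
letters = np.array(["t", "c"], dtype=object)
matches = first[:, None] == letters

# A row is excluded if its first character equals any of the letters
starts_with_t_or_c = matches.any(axis=1)
result = df[~starts_with_t_or_c]
```

The mask has to be negated at the end, since `any(axis=1)` marks the rows that do start with one of the letters.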

time testing

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from string import ascii_lowercase
from timeit import timeit

def ted(df):
    return df[~df.col.str.get(0).isin(['t', 'c'])]

def adele(df):
    return df[~df['col'].str.startswith(('t', 'c'))]

def yohanes(df):
    return df[df.col.str.contains('^[^tc]')]

def pir1(df):
    return df[df.col.str.match('^(?![tc])')]

def pir2(df):
    return df.query('col.str[0] not in list("tc")')

def pir3(df):
    # compare each first character against both letters, then negate the mask
    return df[~(df.col.str[0].to_numpy()[:, None] == ['t', 'c']).any(axis=1)]

functions = pd.Index(['ted', 'adele', 'yohanes', 'pir1', 'pir2', 'pir3'], name='Method')
lengths = pd.Index([10, 100, 1000, 5000, 10000], name='Length')
# float dtype so the timings can be plotted directly
results = pd.DataFrame(index=lengths, columns=functions, dtype=float)

for i in lengths:
    a = np.random.choice(list(ascii_lowercase), i)
    df = pd.DataFrame(dict(col=a))
    for j in functions:
        # set_value was removed from pandas; .at does the same scalar assignment
        results.at[i, j] = timeit(
            '{}(df)'.format(j),
            'from __main__ import df, {}'.format(j),
            number=1000
        )

fig, axes = plt.subplots(3, 1, figsize=(8, 12))
results.plot(ax=axes[0], title='All Methods')
results.drop(columns='pir2').plot(ax=axes[1], title='Drop `pir2`')
results[['ted', 'adele', 'pir3']].plot(ax=axes[2], title='Just the fast ones')
fig.tight_layout()
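Timings are only comparable if the methods select the same rows, so a sanity check is worth running first. A minimal sketch, assuming random single-letter data like in the timing loop; the `masks` dictionary and its keys are hypothetical names, and the `query` variant is left out to keep the check simple.

```python
from string import ascii_lowercase

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame({"col": rng.choice(list(ascii_lowercase), 1000)})

letters = np.array(["t", "c"], dtype=object)

# One boolean mask per approach; True means "keep the row"
masks = {
    "isin": ~df["col"].str.get(0).isin(["t", "c"]),
    "startswith": ~df["col"].str.startswith(("t", "c")),
    "contains": df["col"].str.contains(r"^[^tc]"),
    "match": df["col"].str.match(r"^(?![tc])"),
    "broadcast": ~(df["col"].str[0].to_numpy()[:, None] == letters).any(axis=1),
}

reference = df[masks["startswith"]]
all_agree = all(df[mask].equals(reference) for mask in masks.values())
print(all_agree)
```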


后悔当初

2020-12-29 05:33
You can use the apply method.

Taking your question as an example, the code looks like this:
```
df[df['col'].apply(lambda x: x[0] not in ['t', 'c'])]
```
I think apply is a more general and flexible method.
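As a sketch of that flexibility: the lambda can be swapped for a named function that also copes with empty strings and missing values, where `x[0]` would raise. The helper name `keep` and the sample data are assumptions for illustration.

```python
import pandas as pd

def keep(value, prefixes=("t", "c")):
    """Return True unless value is a string starting with one of the prefixes."""
    # Non-strings (e.g. missing values) and empty strings are kept
    return not (isinstance(value, str) and value.startswith(prefixes))

# Hypothetical data with an empty string and a missing value
df = pd.DataFrame({"col": ["text1", "mext1", "", None, "cext1", "okl1"]})

result = df[df["col"].apply(keep)]
```

The trade-off is speed: `apply` calls the Python function once per row.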
后悔当初

2020-12-29 05:36
You can use the str accessor to get string functionality. The get method can grab a given index of the string.
```
df[~df.col.str.get(0).isin(['t', 'c'])]

     col
1  mext1
3   okl1
```
Looks like you can use startswith as well with a tuple (and not a list) of the values you want to exclude.
```
df[~df.col.str.startswith(('t', 'c'))]
```
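The two approaches differ once the prefixes are longer than one character: `str.get(0)` only looks at the first character, while `startswith` compares whole prefixes. A small sketch, assuming the prefixes arrive as a list and are converted to a tuple first (the data and the `prefixes` name are made up):

```python
import pandas as pd

# Hypothetical data; 'tall' starts with 't' but not with 'te'
df = pd.DataFrame({"col": ["text1", "mext1", "cext1", "okl1", "tall"]})

prefixes = ["te", "ce"]  # hypothetical list of multi-character prefixes

# Convert the list to a tuple before handing it to startswith
result = df[~df["col"].str.startswith(tuple(prefixes))]
```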