How to select rows that do not start with some str in pandas?

别跟我提以往 2020-12-29 04:34

I want to select the rows whose values do not start with certain strings. For example, I have a pandas df, and I want to select the data that do not start with t, a

5 answers
  • 2020-12-29 05:15

    You can use str.startswith and negate it.

        df[~df['col'].str.startswith('t') & 
           ~df['col'].str.startswith('c')]
    
    col
    1   mext1
    3   okl1
    

    Or the better option, with multiple characters in a tuple as per @Ted Petrou:

    df[~df['col'].str.startswith(('t','c'))]
    
        col
    1   mext1
    3   okl1
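
The question does not show the data, so the following is a minimal, self-contained sketch. The sample values are an assumption, picked so that the filter reproduces the output above.

```python
import pandas as pd

# Hypothetical sample data (not given in the question), chosen so that
# rows 1 and 3 survive the filter, matching the output shown above.
df = pd.DataFrame({'col': ['text1', 'mext1', 'cext1', 'okl1']})

# str.startswith accepts a tuple of prefixes; ~ negates the boolean mask.
mask = df['col'].str.startswith(('t', 'c'))
result = df[~mask]
print(result)
```

Keeping the mask in its own variable makes it easy to inspect which rows were rejected (`df[mask]`).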
    
  • 2020-12-29 05:17

    Just another alternative in case you prefer regex:

    df[df.col.str.contains(r'^[^tc]')]
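
One caveat with the regex route, sketched under the assumption that the column may hold missing values or empty strings (the sample data is hypothetical): on object-dtype columns `str.contains` can propagate missing values into the mask unless `na` is given, and `^[^tc]` needs a first character to match, so empty strings are excluded as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value and an empty string.
df = pd.DataFrame({'col': ['text1', 'mext1', np.nan, 'cext1', '', 'okl1']})

# na=False treats missing values as "no match", so the mask is purely boolean.
# '^[^tc]' requires one character that is neither 't' nor 'c' at the start,
# which also rejects the empty string.
mask = df['col'].str.contains(r'^[^tc]', na=False)
result = df[mask]
print(result)
```

Whether dropping missing and empty values is desirable depends on the data; pass `na=True` instead to keep the missing ones.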
    
  • 2020-12-29 05:27

    Option 1
    Use str.match with a negative lookahead:

    df[df.col.str.match('^(?![tc])')]
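
The lookahead also extends to prefixes longer than one character. A minimal sketch, assuming hypothetical prefixes `'te'` and `'c.'` and hypothetical sample data; `re.escape` keeps characters such as `.` literal:

```python
import re
import pandas as pd

# Hypothetical data; dtype=object so the str methods use Python's re module.
df = pd.DataFrame({'col': ['text1', 'mext1', 'c.xt1', 'cext1', 'okl1']}, dtype=object)
prefixes = ['te', 'c.']  # hypothetical prefixes to exclude

# Build '(?!te|c\.)': a negative lookahead over the escaped prefixes.
# str.match anchors at the start of each string, so no '^' is needed.
pattern = '(?!' + '|'.join(re.escape(p) for p in prefixes) + ')'
result = df[df['col'].str.match(pattern)]
print(result)
```

Note that `'cext1'` is kept: without the escape, `c.` would match any character after `c` and wrongly drop it.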
    

    Option 2
    Within query:

    df.query('col.str[0] not in list("tc")')
    

    Option 3
    NumPy broadcasting:

    df[~(df.col.str[0].to_numpy()[:, None] == ['t', 'c']).any(axis=1)]
    

         col
    1  mext1
    3   okl1
    

    Time testing

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from string import ascii_lowercase
    from timeit import timeit

    def ted(df):
        return df[~df.col.str.get(0).isin(['t', 'c'])]

    def adele(df):
        return df[~df['col'].str.startswith(('t', 'c'))]

    def yohanes(df):
        return df[df.col.str.contains('^[^tc]')]

    def pir1(df):
        return df[df.col.str.match('^(?![tc])')]

    def pir2(df):
        return df.query('col.str[0] not in list("tc")')

    def pir3(df):
        # Compare the first characters against both prefixes via broadcasting,
        # then keep the rows that match neither.
        first = df.col.str[0].to_numpy()
        return df[~(first[:, None] == ['t', 'c']).any(axis=1)]

    functions = pd.Index(['ted', 'adele', 'yohanes', 'pir1', 'pir2', 'pir3'], name='Method')
    lengths = pd.Index([10, 100, 1000, 5000, 10000], name='Length')
    # float dtype so that results.plot() sees numeric data
    results = pd.DataFrame(index=lengths, columns=functions, dtype=float)

    for i in lengths:
        a = np.random.choice(list(ascii_lowercase), i)
        df = pd.DataFrame(dict(col=a))
        for j in functions:
            # .at replaces set_value, which was removed in pandas 1.0
            results.at[i, j] = timeit(
                '{}(df)'.format(j),
                'from __main__ import df, {}'.format(j),
                number=1000
            )

    fig, axes = plt.subplots(3, 1, figsize=(8, 12))
    results.plot(ax=axes[0], title='All Methods')
    results.drop(columns='pir2').plot(ax=axes[1], title='Drop `pir2`')
    results[['ted', 'adele', 'pir3']].plot(ax=axes[2], title='Just the fast ones')
    fig.tight_layout()
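
Before comparing speed, it helps to confirm that the candidates return the same rows. A minimal sketch on a hypothetical random column like the one in the timing loop; the dictionary keys and the seed are assumptions for illustration, and only some of the approaches are checked:

```python
import numpy as np
import pandas as pd
from string import ascii_lowercase

# Hypothetical test column: 1000 random single lowercase letters.
rng = np.random.default_rng(0)
df = pd.DataFrame({'col': rng.choice(list(ascii_lowercase), 1000)})

# First characters as a NumPy array, for the broadcasting variant.
first = df.col.str[0].to_numpy()

candidates = {
    'isin': df[~df.col.str.get(0).isin(['t', 'c'])],
    'startswith': df[~df.col.str.startswith(('t', 'c'))],
    'contains': df[df.col.str.contains(r'^[^tc]')],
    'broadcast': df[~(first[:, None] == ['t', 'c']).any(axis=1)],
}

# DataFrame.equals compares index, values and dtypes.
reference = candidates['startswith']
all_equal = all(reference.equals(other) for other in candidates.values())
print(all_equal)
```

A check like this catches a forgotten `~` or a missing `return`, which a timing run alone would not reveal.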
    

  • 2020-12-29 05:33

    You can use the apply method.

    Taking your question as an example, the code looks like this:

    df[df['col'].apply(lambda x: x[0] not in ['t', 'c'])]
    

    I think apply is a more general and flexible method.
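
One thing to watch with `apply`: `x[0]` raises on an empty string or a missing value. A guarded variant might look like the sketch below; the helper name `keep` and the sample data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with an empty string and a missing value.
df = pd.DataFrame({'col': ['text1', 'mext1', '', np.nan, 'okl1']}, dtype=object)

def keep(value):
    # Non-strings (such as NaN) are dropped; the empty string is kept
    # because it starts with neither prefix.
    return isinstance(value, str) and not value.startswith(('t', 'c'))

result = df[df['col'].apply(keep)]
print(result)
```

The flexibility comes at a cost: the function runs once per row in Python, so it tends to be slower than the vectorised `str` methods on large frames.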

  • 2020-12-29 05:36

    You can use the str accessor to get string functionality. The get method can grab a given index of the string.

    df[~df.col.str.get(0).isin(['t', 'c'])]
    
         col
    1  mext1
    3   okl1
    

    Looks like you can use startswith as well with a tuple (and not a list) of the values you want to exclude.

    df[~df.col.str.startswith(('t', 'c'))]
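
How `str.get(0)` treats edge cases matters: it yields NaN for missing values and for empty strings, `isin` then returns False for them, and after negation those rows are kept. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with an empty string and a missing value.
df = pd.DataFrame({'col': ['text1', 'mext1', '', np.nan, 'cext1']})

# First character of each value; NaN where there is no first character.
first = df['col'].str.get(0)

# isin is False for NaN, so the negated mask keeps rows 2 and 3.
result = df[~first.isin(['t', 'c'])]
print(result)
```

If such rows should be dropped instead, combine the mask with `first.notna()`.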
    