How to apply string methods to multiple columns of a dataframe

生来不讨喜 2021-02-06 02:05

I have a dataframe with multiple string columns. I want to use a string method that is valid for a series on multiple columns of the dataframe. Something like this is what I w

3 answers
  • 2021-02-06 02:11

    You can use a dictionary comprehension and feed it to the pd.DataFrame constructor:

    res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
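
One thing to watch with the constructor approach: a frame built from a dict of lists gets a fresh 0..n-1 index, so a non-default index on df is lost unless it is passed back in. A minimal sketch, assuming a small sample frame (the data and the index labels are made up for illustration, since the question's own frame is not shown in full):

```python
import pandas as pd

# Hypothetical sample data; the question's own frame is not shown in full.
df = pd.DataFrame({"A": ["123f", "456f"], "B": ["789f", "901f"]}, index=["x", "y"])

# Strip trailing 'f' from every column with a plain list comprehension,
# passing index=df.index so the original row labels are kept.
res = pd.DataFrame(
    {col: [x.rstrip("f") for x in df[col]] for col in df},
    index=df.index,
)

print(res)
```

Without index=df.index the values would be the same, but the rows would be labelled 0 and 1 instead of x and y.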
    

    In the benchmark below, the Pandas str methods are slower than a plain list comprehension. The regex approach is slower still, but easier to extend. As always, you should test with your own data.

    # Benchmarking on Python 3.6.0, Pandas 0.19.2
    
    def jez1(df):
        return df.apply(lambda x: x.str.rstrip('f'))
    
    def jez2(df):
        return df.applymap(lambda x: x.rstrip('f'))
    
    def jpp(df):
        return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
    
    def user3483203(df):
        return df.replace(r'f$', '', regex=True)
    
    df = pd.concat([df]*10000)
    
    %timeit jez1(df)         # 33.1 ms per loop
    %timeit jez2(df)         # 29.9 ms per loop
    %timeit jpp(df)          # 13.2 ms per loop
    %timeit user3483203(df)  # 42.9 ms per loop
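
The %timeit lines above need IPython. A rough equivalent with the standard timeit module might look like the sketch below; the sample data and the labels in the dict are assumptions made for illustration, and the timings will differ by machine and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical sample data, repeated to get a frame worth timing.
small = pd.DataFrame({"A": ["123f", "456f"], "B": ["789f", "901f"]})
df = pd.concat([small] * 10000, ignore_index=True)

# DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map;
# pick whichever this pandas version provides.
elementwise = getattr(pd.DataFrame, "map", None) or pd.DataFrame.applymap

candidates = {
    "apply + str.rstrip": lambda d: d.apply(lambda s: s.str.rstrip("f")),
    "elementwise rstrip": lambda d: elementwise(d, lambda x: x.rstrip("f")),
    "list comprehension": lambda d: pd.DataFrame(
        {col: [x.rstrip("f") for x in d[col]] for col in d}
    ),
    "regex replace": lambda d: d.replace(r"f$", "", regex=True),
}

# Check the approaches agree on this data before comparing their speed.
expected = candidates["list comprehension"](df).to_dict("list")
results = {name: func(df).to_dict("list") == expected for name, func in candidates.items()}

# Best of 3 runs of 5 calls each, reported per call in milliseconds.
timings = {
    name: min(timeit.repeat(lambda f=func: f(df), number=5, repeat=3)) / 5 * 1000
    for name, func in candidates.items()
}
for name, ms in timings.items():
    print(f"{name:20s} {ms:8.2f} ms  same result: {results[name]}")
```

Comparing the outputs first guards against timing a variant that silently does something different on your data.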
    
  • 2021-02-06 02:20

    You can mimic the behavior of rstrip using replace with regex=True, which can be applied to the entire DataFrame:

    df.replace(r'f$', '', regex=True)
    

         A    B
    0  123  789
    1  456  901
    

    Since rstrip takes a sequence of characters to strip, you can easily extend this:

    df.replace(r'[abc]+$', '', regex=True)
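
Two details are worth checking when swapping rstrip for a regex. The pattern r'f$' removes only one trailing f, whereas rstrip('f') removes the whole run, so r'f+$' is the closer match. Also, a regex replace leaves non-string cells alone. A small sketch on made-up data (the frame below is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data: one value ends in two f's, and column n is numeric.
df = pd.DataFrame({"A": ["123ff", "456f"], "n": [1, 2]})

one_f = df.replace(r"f$", "", regex=True)    # drops a single trailing 'f'
all_f = df.replace(r"f+$", "", regex=True)   # drops the whole run, like rstrip('f')

print(one_f["A"].tolist())  # ['123f', '456']
print(all_f["A"].tolist())  # ['123', '456']
print(all_f["n"].tolist())  # [1, 2], numeric column untouched
```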
    
  • 2021-02-06 02:21

    The rstrip string method works on a Series, so you can use apply to run it on each column:

    df = df.apply(lambda x: x.str.rstrip('f'))
    

    Or reshape the DataFrame into a single Series with stack, strip, and then unstack:

    df = df.stack().str.rstrip('f').unstack()
    

    Or use applymap:

    df = df.applymap(lambda x: x.rstrip('f'))
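
On recent pandas, applymap emits a deprecation warning: it was deprecated in pandas 2.1 in favor of DataFrame.map. Below is a sketch that should work on either side of that change. The strip_f helper and the sample data are made up for illustration, and the isinstance guard is an addition so that missing or numeric cells pass through instead of raising AttributeError.

```python
import pandas as pd

def strip_f(value):
    """Strip trailing 'f' from strings; return anything else unchanged."""
    return value.rstrip("f") if isinstance(value, str) else value

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = getattr(pd.DataFrame, "map", None) or pd.DataFrame.applymap

# Hypothetical data with a missing value and a numeric column.
df = pd.DataFrame({"A": ["123f", None], "B": ["789f", "901f"], "n": [1, 2]})
out = elementwise(df, strip_f)
print(out)
```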
    

    Finally, if you need to apply the function to only some columns:

    # list the columns to process
    cols = ['A']
    # any one of the following lines is enough
    df[cols] = df[cols].apply(lambda x: x.str.rstrip('f'))
    df[cols] = df[cols].stack().str.rstrip('f').unstack()
    df[cols] = df[cols].applymap(lambda x: x.rstrip('f'))
    