Remove last two characters from column names of all the columns in Dataframe - Pandas

花落未央 2021-01-19 01:40

I am joining two dataframes (a, b) that have identical column names on the user ID key, and while joining I had to give suffix characters, in order for it to get

2 Answers
  • 2021-01-19 02:17

    This snippet should get the job done:

    df.columns = [str(x)[:-2] for x in df.columns]
    

    Edit: this is a better way to do it. Note that rename returns a new DataFrame, so assign the result back:

    df = df.rename(columns=lambda x: str(x)[:-2])
    

    In both cases, all we're doing is iterating over the column names and applying a function to each one. Here, the function converts the name to a string and keeps everything except the last two characters.

    I'm sure there are a few other ways you could do this.
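
Because the slice trims every column name, it would also shorten columns that never received a suffix, such as the join key. A minimal sketch of a more defensive variant follows; the frame, the "user_id" key, and the "_x" suffix are hypothetical stand-ins, not taken from the question.

```python
import pandas as pd

# Hypothetical frame: "user_id" is the join key, the other columns carry a "_x" suffix.
df = pd.DataFrame({"user_id": [1, 2], "age_x": [30, 41], "city_x": ["Oslo", "Lima"]})

SUFFIX = "_x"  # assumed suffix; use whatever was passed when joining


def drop_suffix(name):
    """Return the column name without SUFFIX; other names are left untouched."""
    name = str(name)
    return name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name


# rename returns a new DataFrame, so the result is assigned back.
df = df.rename(columns=drop_suffix)
print(list(df.columns))  # ['user_id', 'age', 'city']
```

Checking for the suffix before slicing keeps "user_id" intact, whereas a blanket `[:-2]` would turn it into "user_".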

  • 2021-01-19 02:17

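    You could use str.rstrip, like so. Keep in mind that rstrip removes any trailing characters from the given set rather than a literal suffix, so it is only safe when no base column name ends in "_" or "1".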

    In [214]: import functools as ft
         ...: import numpy as np
         ...: import pandas as pd
    
    In [215]: f = ft.partial(np.random.choice, 5, 3)  # f() draws 3 random integers from 0..4
    
    In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})
    
    In [226]: df
    Out[226]:
       a  b  c  a_1  b_1  c_1
    0  4  2  0    2    3    2
    1  0  0  3    2    1    1
    2  4  0  4    4    4    3
    
    In [227]: df.columns = df.columns.str.rstrip('_1')
    
    In [228]: df
    Out[228]:
       a  b  c  a  b  c
    0  4  2  0  2  3  2
    1  0  0  3  2  1  1
    2  4  0  4  4  4  3
    

    However, if you need something more flexible (albeit probably a bit slower), you can use str.extract, which takes a regex with a capture group and lets you select the part of each column name you would like to keep

    In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})
    
    In [217]: df
    Out[217]:
       a_0  b_0  c_0  a_1  b_1  c_1  a_2  b_2  c_2
    0    0    1    0    2    2    4    0    0    3
    1    0    0    3    1    4    2    4    3    2
    2    2    0    1    0    0    2    2    2    1
    
    In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]
    
    In [224]: df
    Out[224]:
    0  a  b  c  a  b  c  a  b  c
    0  1  1  0  0  0  2  1  1  2
    1  1  0  1  0  1  2  0  4  1
    2  1  3  1  3  4  2  0  1  1
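
One caveat with str.extract is that a name which does not match the pattern comes back as a missing value. A possible alternative, sketched below, is to delete the numeric suffix with str.replace so that non-matching names pass through unchanged. The "user_id" column and the data are hypothetical.

```python
import pandas as pd

# Hypothetical frame: "user_id" has no numeric suffix, the rest do.
df = pd.DataFrame({"user_id": [1, 2], "a_0": [5, 6], "a_1": [7, 8], "b_10": [9, 0]})

# extract: "user_id" does not match "_<digits>", so its entry is missing (NaN).
extracted = df.columns.str.extract(r"(.*)_\d+", expand=False)

# replace: strip a trailing "_<digits>" group; other names are left as they are.
cleaned = df.columns.str.replace(r"_\d+$", "", regex=True)

df.columns = cleaned
print(list(df.columns))  # ['user_id', 'a', 'a', 'b']
```

As in the output above, the resulting frame has duplicate column names, so selecting `df["a"]` returns a DataFrame rather than a single Series.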
    

    Idea to use df.columns.str came from this answer
