I am joining the two dataframes (a,b) with identical columns / column names using the user ID key and while joining, I had to give suffix characters, in order for it to get
This snippet should get the job done :
df.columns = pd.Index(map(lambda x : str(x)[:-2], df.columns))
Edit : This is a better way to do it
df.rename(columns = lambda x : str(x)[:-2])
In both cases, all we're doing is iterating through the columns and apply some function. In this case, the function converts something into a string and takes everything up until the last two characters.
I'm sure there are a few other ways you could do this.
You could use str.rstrip like so
In [214]: import functools as ft
In [215]: f = ft.partial(np.random.choice, *[5, 3])
In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})
In [226]: df
Out[226]:
a b c a_1 b_1 c_1
0 4 2 0 2 3 2
1 0 0 3 2 1 1
2 4 0 4 4 4 3
In [227]: df.columns = df.columns.str.rstrip('_1')
In [228]: df
Out[228]:
a b c a b c
0 4 2 0 2 3 2
1 0 0 3 2 1 1
2 4 0 4 4 4 3
However if you need something more flexible (albeit probably a bit slower), you can use str.extract which, with the power of regexes, will allow you to select which part of the column name you would like to keep
In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})
In [217]: df
Out[217]:
a_0 b_0 c_0 a_1 b_1 c_1 a_2 b_2 c_2
0 0 1 0 2 2 4 0 0 3
1 0 0 3 1 4 2 4 3 2
2 2 0 1 0 0 2 2 2 1
In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]
In [224]: df
Out[224]:
0 a b c a b c a b c
0 1 1 0 0 0 2 1 1 2
1 1 0 1 0 1 2 0 4 1
2 1 3 1 3 4 2 0 1 1
Idea to use df.columns.str
came from this answer