I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new co
fillna
both columns together sum(1)
to add themreplace('', np.nan)
df.fillna('').sum(1).replace('', np.nan)
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 NaN
dtype: object
Use fillna on one column with the fill values being the other column:
df['foodstuff'].fillna(df['type'])
The resulting output:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 None
you can use the combine method with a lambda
:
df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)
(a or "")
returns ""
if a is None
then the same logic is applied on the concatenation (where the result would be None
if the concatenation is an empty string).
You can always fill the empty string in the new column with None
import numpy as np
df['new_col'].replace(r'^\s*$', np.nan, regex=True, inplace=True)
Complete code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})
df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')
df['new_col'].replace(r'^\s*$', np.nan, regex=True, inplace=True)
df
output:
foodstuff type new_col
0 apple-martini None apple-martini
1 apple-pie None apple-pie
2 None strawberry-tart strawberry-tart
3 None dessert dessert
4 None None NaN
You can replace the non zero values with column names like
df1= df.replace(1, pd.Series(df.columns, df.columns))
Replace 0's with empty string and then merge the columns like below
f = f.replace(0, '') f['new'] = f.First+f.Second+f.Three+f.Four
Refer the full code below.
import pandas as pd
df = pd.DataFrame({'Second':[0,1,0,0],'First':[1,0,0,0],'Three':[0,0,1,0],'Four':[0,0,0,1], 'cl': ['3D', 'Wireless','Accounting','cisco']})
df2=pd.DataFrame({'pi':['Accounting','cisco','3D','Wireless']})
df1= df.replace(1, pd.Series(df.columns, df.columns))
f = pd.merge(df1,df2,how='right',left_on=['cl'],right_on=['pi'])
f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four
df1:
In [3]: df1
Out[3]:
Second First Three Four cl
0 0 First 0 0 3D
1 Second 0 0 0 Wireless
2 0 0 Three 0 Accounting
3 0 0 0 Four cisco
df2:
In [4]: df2
Out[4]:
pi
0 Accounting
1 cisco
2 3D
3 Wireless
Final df will be:
In [2]: f
Out[2]:
Second First Three Four cl pi new
0 First 3D 3D First
1 Second Wireless Wireless Second
2 Three Accounting Accounting Three
3 Four cisco cisco Four
We can make this problem even more complete and have a universal solution for this type of problem.
The key things in there are that we wish to join a group of columns together but just ignore NaN
s.
Here is my answer:
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None],
'type':[None, None, 'strawberry-tart', 'dessert', None],
'type1':[98324, None, None, 'banan', None],
'type2':[3, None, 'strawberry-tart', np.nan, None]})
df=df.fillna("NAN")
df=df.astype('str')
df["output"] = df[['foodstuff', 'type', 'type1', 'type2']].agg(', '.join, axis=1)
df['output'] = df['output'].str.replace('NAN, ', '')
df['output'] = df['output'].str.replace(', NAN', '')