pandas combine two columns with null values

醉梦人生 2021-02-03 22:10

I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new co

6 answers
  • 2021-02-03 22:32
    • fillna('') on both columns together
    • sum(1) to concatenate them row by row
    • replace('', np.nan) to turn rows that are still empty back into NaN

    df.fillna('').sum(1).replace('', np.nan)
    
    0      apple-martini
    1          apple-pie
    2    strawberry-tart
    3            dessert
    4                NaN
    dtype: object
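
    The three steps can also be traced one at a time. The sketch below is an illustration rather than the answerer's code: it rebuilds the sample frame used elsewhere in this thread and swaps sum(1) for agg(''.join, axis=1), which concatenates the cells of each row explicitly. The intermediate names filled, joined and result are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data as used elsewhere in this thread
df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Step 1: turn every missing cell into an empty string
filled = df.fillna('')
# Step 2: concatenate the cells of each row (stand-in for sum(1))
joined = filled.agg(''.join, axis=1)
# Step 3: rows that were empty in both columns become NaN again
result = joined.replace('', np.nan)

print(result)
```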
    
  • 2021-02-03 22:39

    Use fillna on one column with the fill values being the other column:

    df['foodstuff'].fillna(df['type'])
    

    The resulting output:

    0      apple-martini
    1          apple-pie
    2    strawberry-tart
    3            dessert
    4               None
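
    Series.combine_first expresses the same idea as a coalesce: keep the caller's value and fall back to the other Series where it is missing. A minimal sketch, assuming the two-column frame from the question (rebuilt here so the snippet runs on its own; new_col is an illustrative column name):

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Keep 'foodstuff' where it is present, otherwise fall back to 'type'
df['new_col'] = df['foodstuff'].combine_first(df['type'])

print(df['new_col'])
```

    Like fillna, this picks one value per row rather than concatenating, so it suits data where at most one of the two columns is filled.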
    
  • 2021-02-03 22:40

    You can use the combine method with a lambda:

    df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)
    

    (a or "") evaluates to "" when a is None, and the same trick is applied to the concatenated result, which becomes None if it is an empty string. This relies on missing values being None: NaN is truthy, so (a or "") would leave a NaN unchanged.
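
    If the columns may hold NaN instead of None, the lambda can be replaced by a small function that tests for missing values with pd.isna. A sketch under that assumption; concat_or_none and the two sample Series are made up for illustration:

```python
import numpy as np
import pandas as pd

def concat_or_none(a, b):
    """Concatenate two cells, treating None/NaN as empty; None if both are missing."""
    left = "" if pd.isna(a) else a
    right = "" if pd.isna(b) else b
    return (left + right) or None

# Hypothetical sample data mixing None and NaN as missing markers
foodstuff = pd.Series(['apple-martini', np.nan, None], dtype=object)
kind = pd.Series([None, 'dessert', np.nan], dtype=object)

combined = foodstuff.combine(kind, concat_or_none)
print(combined)
```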

  • 2021-02-03 22:44

    You can always replace the empty strings in the new column with NaN:

    import numpy as np
    
    # Turn empty or whitespace-only strings into NaN
    df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)
    

    Complete code:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                       'type': [None, None, 'strawberry-tart', 'dessert', None]})
    
    # Concatenate the two columns, treating missing values as empty strings
    df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')
    
    # Rows that are empty in both columns become NaN
    df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)
    
    df
    

    output:

           foodstuff             type          new_col
    0  apple-martini             None    apple-martini
    1      apple-pie             None        apple-pie
    2           None  strawberry-tart  strawberry-tart
    3           None          dessert          dessert
    4           None             None              NaN
    
  • 2021-02-03 22:46
    1. You can replace the non-zero values with their column names:

      df1= df.replace(1, pd.Series(df.columns, df.columns))

    2. Replace the 0s with empty strings and then concatenate the columns:

      f = f.replace(0, '')
      f['new'] = f.First+f.Second+f.Three+f.Four

    See the full code below.

    import pandas as pd

    df = pd.DataFrame({'Second': [0, 1, 0, 0], 'First': [1, 0, 0, 0],
                       'Three': [0, 0, 1, 0], 'Four': [0, 0, 0, 1],
                       'cl': ['3D', 'Wireless', 'Accounting', 'cisco']})
    df2 = pd.DataFrame({'pi': ['Accounting', 'cisco', '3D', 'Wireless']})
    # Replace each 1 with the name of the column it appears in
    df1 = df.replace(1, pd.Series(df.columns, df.columns))
    f = pd.merge(df1, df2, how='right', left_on=['cl'], right_on=['pi'])
    # Blank out the remaining 0s, then concatenate the flag columns
    f = f.replace(0, '')
    f['new'] = f.First + f.Second + f.Three + f.Four
    

    df1:

    In [3]: df1
    Out[3]: 
       Second  First  Three  Four          cl
    0       0  First      0     0          3D
    1  Second      0      0     0    Wireless
    2       0      0  Three     0  Accounting
    3       0      0      0  Four       cisco
    

    df2:

    In [4]: df2
    Out[4]: 
               pi
    0  Accounting
    1       cisco
    2          3D
    3    Wireless
    

    Final df will be:

    In [2]: f
    Out[2]: 
       Second  First  Three  Four          cl          pi     new
    0          First                       3D          3D   First
    1  Second                        Wireless    Wireless  Second
    2                 Three        Accounting  Accounting   Three
    3                        Four       cisco       cisco    Four
    
  • We can generalize the problem and give a universal solution for this kind of task.

    The key point is that we want to join a group of columns together while ignoring NaNs.

    Here is my answer:

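    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                       'type': [None, None, 'strawberry-tart', 'dessert', None],
                       'type1': [98324, None, None, 'banan', None],
                       'type2': [3, None, 'strawberry-tart', np.nan, None]})
    

    # Mark missing cells with a placeholder, then join every row as strings
    df = df.fillna("NAN")
    df = df.astype('str')
    df["output"] = df[['foodstuff', 'type', 'type1', 'type2']].agg(', '.join, axis=1)
    # Strip the placeholder together with its separator
    df['output'] = df['output'].str.replace('NAN, ', '', regex=False)
    df['output'] = df['output'].str.replace(', NAN', '', regex=False)
    # A row with no values at all is left as the bare placeholder; turn it into NaN
    df['output'] = df['output'].replace('NAN', np.nan)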
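
    A variant that avoids the placeholder string is to drop the missing cells from each row before joining. The sketch below is one way to do that, not the only one; join_non_null and its sep parameter are hypothetical names, and the frame is the four-column one from this answer, rebuilt so the snippet is self-contained:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
    'type1': [98324, None, None, 'banan', None],
    'type2': [3, None, 'strawberry-tart', np.nan, None],
})

def join_non_null(row, sep=', '):
    """Join the non-missing cells of one row; return NaN if the row is empty."""
    parts = [str(value) for value in row if pd.notna(value)]
    return sep.join(parts) if parts else np.nan

cols = ['foodstuff', 'type', 'type1', 'type2']
# Apply the helper row by row; the source columns are left untouched
df['output'] = df[cols].apply(join_non_null, axis=1)

print(df['output'])
```

    Because the source columns are never overwritten, df keeps its real missing values instead of the "NAN" strings.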
    
