pandas: combine two columns in a DataFrame

感情败类 2020-11-30 07:42

I have a pandas DataFrame that has multiple columns in it:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
foo           


        
5 Answers
  • 2020-11-30 08:00

    Try this:

    pandas.concat([df['foo'].dropna(), df['bar'].dropna()]).reindex_like(df)
    

    If you want that data to become the new column bar, just assign the result to df['bar'].
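
As a rough illustration of the one-liner above, here is a minimal sketch on made-up data. The sample values and the `combined` name are assumptions for the example, not part of the original question. It relies on each row holding a value in only one of the two columns and on the index being unique; if both columns had a value in the same row, the concatenated Series would carry duplicate index labels and the reindex step would fail.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every row has a value in exactly one column.
df = pd.DataFrame(
    {"foo": [1.0, np.nan, 3.0], "bar": [np.nan, 2.0, np.nan]},
    index=pd.to_datetime(["2012-05-11", "2012-05-12", "2012-05-13"]),
)

# Stack the non-null values of both columns into one Series,
# then put the rows back in the order of the original index.
combined = pd.concat([df["foo"].dropna(), df["bar"].dropna()]).reindex_like(df)

# Assign the result back so that "bar" holds the merged data.
df["bar"] = combined
print(df["bar"].tolist())  # [1.0, 2.0, 3.0]
```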

  • 2020-11-30 08:00

    Another option is to use the .apply() method on the frame. You can reassign a column while deferring to the data that is already there:

    import pandas as pd
    import numpy as np
    
    # get your data into a dataframe
    
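    # replace content in "bar" with "foo" if "bar" is null
    # (NaN never compares equal to anything, so test with pd.isnull instead of ==)
    df["bar"] = df.apply(lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1)
    
    # note: if your missing values are marked differently (e.g. an empty string), test for that value instead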
    
  • 2020-11-30 08:06

    More modern pandas versions (since at least 0.12) have the combine_first() and update() methods for DataFrame and Series objects. For example, if your DataFrame were called df, you would do:

    df['bar'] = df['bar'].combine_first(df['foo'])
    

    which fills only the NaN values of the bar column with the matching values from foo. combine_first() returns a new Series rather than changing bar in place, so the result has to be assigned back. To overwrite values in bar with the non-NaN values in foo, use the update() method, which does modify its caller in place.
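
To make the difference between the two methods concrete, here is a small sketch on invented data. The sample values and the names `filled` and `overwritten` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"foo": [10.0, 20.0, np.nan], "bar": [1.0, np.nan, 3.0]})

# combine_first returns a new Series: bar's own values win,
# and only the gaps in bar are filled from foo.
filled = df["bar"].combine_first(df["foo"])
print(filled.tolist())  # [1.0, 20.0, 3.0]

# update changes its caller in place: every non-NaN value in foo
# overwrites the value at the same index label in the copy of bar.
overwritten = df["bar"].copy()
overwritten.update(df["foo"])
print(overwritten.tolist())  # [10.0, 20.0, 3.0]
```

The copy is there only so that both results can be compared side by side without touching the original column.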

  • 2020-11-30 08:08

    You can use fillna directly and assign the result to the column 'bar':

    df['bar'] = df['bar'].fillna(df['foo'])
    del df['foo']
    

    General example:

    import pandas as pd
    # creating the table with two missing values in column 'a'
    df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=[1, 2])
    df2 = pd.DataFrame({'b': [5, 6]}, index=[3, 4])
    dftot = pd.concat((df1, df2))
    print(dftot)
    # creating the dataframe to fill the missing values
    # (fillna aligns on index and column labels, so reuse dftot's index)
    filldf = pd.DataFrame({'a': [7, 7, 7, 7]}, index=dftot.index)
    
    # filling
    print(dftot.fillna(filldf))
    
  • 2020-11-30 08:17

    You can do this using numpy too.

    df['bar'] = np.where(pd.isnull(df['bar']), df['foo'], df['bar'])
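
As a quick sanity check, the sketch below runs the np.where approach next to fillna and combine_first on invented data and compares the results. The sample values and the `via_*` names are assumptions for illustration. Note that np.where returns a plain NumPy array, so it is wrapped in a Series with the original index before comparing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "bar" has one gap that "foo" can fill.
df = pd.DataFrame({"foo": [10.0, 20.0, 30.0], "bar": [1.0, np.nan, 3.0]})

# np.where picks foo where bar is null and bar everywhere else.
via_where = pd.Series(
    np.where(pd.isnull(df["bar"]), df["foo"], df["bar"]), index=df.index
)

# The same fill expressed with the pandas-native methods.
via_fillna = df["bar"].fillna(df["foo"])
via_combine = df["bar"].combine_first(df["foo"])

print(via_where.tolist())              # [1.0, 20.0, 3.0]
print(via_where.equals(via_fillna))    # True
print(via_fillna.equals(via_combine))  # True
```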
