Conditionally filling blank values in Pandas dataframes

误落风尘 2021-01-22 20:41

I have a dataframe that looks like the following (some other columns have been dropped):

    memberID    shipping_country    
    264991      
    264991         


3 Answers
  • 2021-01-22 21:00

    For the following sample dataframe (I added a memberID group that only contains '' in the shipping_country column):

       memberID shipping_country
    0    264991                 
    1    264991           Canada
    2       100              USA
    3      5000                 
    4      5000               UK
    5        54                 
    

    This should work for you. It also has the property that if a memberID group contains only empty strings ('') in shipping_country, those values are kept in the output df:

    import numpy as np
    df['shipping_country'] = df.replace('', np.nan).groupby('memberID')['shipping_country'].transform('first').fillna('')
    

    Yields:

       memberID shipping_country
    0    264991           Canada
    1    264991           Canada
    2       100              USA
    3      5000               UK
    4      5000               UK
    5        54                 
    

    If you would like to leave the empty strings '' as NaN in the output df, then just remove the fillna(''), leaving:

    df['shipping_country'] = df.replace('',np.nan).groupby('memberID')['shipping_country'].transform('first')
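    A self-contained sketch of this approach, rebuilding the sample frame from the table above as a `pd.DataFrame`. Storing the blanks as empty strings `''` is an assumption about how the data was loaded:

```python
import numpy as np
import pandas as pd

# Sample frame from the table above; blank countries stored as ''
df = pd.DataFrame({
    'memberID': [264991, 264991, 100, 5000, 5000, 54],
    'shipping_country': ['', 'Canada', 'USA', '', 'UK', ''],
})

# Treat '' as missing, then broadcast each group's first non-null value to all its rows
filled = (df.replace('', np.nan)
            .groupby('memberID')['shipping_country']
            .transform('first'))

# fillna('') keeps the all-blank group (memberID 54) as ''; skip it to keep NaN instead
result = filled.fillna('')

print(result.tolist())  # ['Canada', 'Canada', 'USA', 'UK', 'UK', '']
```

    Because `transform('first')` broadcasts a single value per group, every row of a member ends up with the same country, even if that member originally had more than one distinct non-blank value.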
    
  • 2021-01-22 21:07

    You can use chained groupbys, one with forward fill and one with backfill:

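    import numpy as np
    
    # replace blank values with NaN first:
    df['shipping_country'] = df['shipping_country'].replace('', np.nan)
    
    # forward-fill, then back-fill, within each memberID group
    df['shipping_country'] = (df.groupby('memberID')['shipping_country'].ffill()
                                .groupby(df['memberID']).bfill())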
    
       memberID shipping_country
    0    264991           Canada
    1    264991           Canada
    2       100              USA
    3      5000               UK
    4      5000               UK
    

    This method will also allow a group made up of all NaN to remain NaN:

    >>> df
       memberID shipping_country
    0    264991                 
    1    264991           Canada
    2       100              USA
    3      5000                 
    4      5000               UK
    5         1                 
    6         1                 
    
    df['shipping_country'] = df['shipping_country'].replace('', np.nan)
    
    df['shipping_country'] = (df.groupby('memberID')['shipping_country'].ffill()
                                .groupby(df['memberID']).bfill())
    
       memberID shipping_country
    0    264991           Canada
    1    264991           Canada
    2       100              USA
    3      5000               UK
    4      5000               UK
    5         1              NaN
    6         1              NaN
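    A runnable sketch of the same chained ffill/bfill idea on this answer's second sample frame. It groups the `shipping_country` Series by `df['memberID']` rather than chaining on the DataFrame. In recent pandas versions `DataFrameGroupBy.ffill()` does not return the grouping column, so a second `.groupby('memberID')` on its result would fail. It also drops the `iloc[::-1]` reversal, which is not needed when both fills run within each group:

```python
import numpy as np
import pandas as pd

# Sample frame from this answer, including an all-blank group (memberID 1)
df = pd.DataFrame({
    'memberID': [264991, 264991, 100, 5000, 5000, 1, 1],
    'shipping_country': ['', 'Canada', 'USA', '', 'UK', '', ''],
})

# ffill/bfill only treat NaN as missing, so convert blank strings first
country = df['shipping_country'].replace('', np.nan)

# Forward-fill within each memberID, then back-fill within each memberID.
# Grouping the intermediate Series by df['memberID'] aligns on the row index.
df['shipping_country'] = (country.groupby(df['memberID']).ffill()
                                 .groupby(df['memberID']).bfill())

print(df)
```

    Row order is preserved, and memberID 1 stays NaN because its group has no value to copy from.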
    
  • 2021-01-22 21:12

    You can use GroupBy + ffill / bfill:

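    import numpy as np
    
    def filler(x):
        return x.ffill().bfill()
    
    # ffill/bfill only fill NaN, so convert blank strings first
    df['shipping_country'] = df['shipping_country'].replace('', np.nan)
    res = df.groupby('memberID')['shipping_country'].transform(filler)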
    

    A custom function is needed because pandas has no single method that forward-fills and then back-fills.
    
    This also handles the case where every value for a given memberID is NaN; those values remain NaN.
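    A sketch of `filler` on a small hypothetical frame (the memberID values 7 and 1 are made up for illustration). It uses `transform` so the result keeps the original row index. It also computes `transform('first')` from the first answer alongside it, to show where the two approaches differ:

```python
import numpy as np
import pandas as pd

def filler(x):
    # Forward-fill, then back-fill, one group's values
    return x.ffill().bfill()

# Hypothetical sample: memberID 7 has two different countries, memberID 1 is all-blank
df = pd.DataFrame({
    'memberID': [7, 7, 7, 1, 1],
    'shipping_country': ['Canada', '', 'USA', '', ''],
})
country = df['shipping_country'].replace('', np.nan)

# transform keeps the original row index, so the result can be assigned back directly
ffill_bfill = country.groupby(df['memberID']).transform(filler)
# For comparison: 'first' broadcasts one value to the whole group
first_only = country.groupby(df['memberID']).transform('first')

print(ffill_bfill.tolist())  # ['Canada', 'Canada', 'USA', nan, nan]
print(first_only.tolist())   # ['Canada', 'Canada', 'Canada', nan, nan]
```

    The two agree when each member has at most one distinct non-blank country. When a group holds different values, `ffill().bfill()` fills each blank from the closest preceding non-blank value in the group (or, for leading blanks, the closest following one). In contrast, `'first'` overwrites the whole group with a single value.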
