pandas-python dataframe update a column

后端 未结 3 1412
隐瞒了意图╮
隐瞒了意图╮ 2021-01-21 23:52

Say I have a list BRANDS that contains brand names:

BRANDS = [\'Samsung\', \'Apple\', \'Nike\', .....]

Dataframe A has following structure

相关标签:
3条回答
  • 2021-01-21 23:58

    You can achieve the result you are after by writing a simple function. You can then use .apply() with a lambda function to generate your desired column.

    def contains_any(s, arr):
        for item in arr:
            if item in s: return item
        return np.nan
    df['brand_name'] = df['product'].apply(lambda x: match_substring(x, product_map))
    
    0 讨论(0)
  • 2021-01-22 00:14

    Vectorized solution:

    In [168]: x = df.item_title.str.split(expand=True)
    
    In [169]: df['brand_name'] = \
                  df['brand_name'].fillna(x[x.isin(BRANDS)]
                                             .ffill(axis=1)
                                             .bfill(axis=1)
                                             .iloc[:, 0])
    
    In [170]: df
    Out[170]:
       row     item_title brand_name
    0    1       Apple 6S      Apple
    1    2  Nike BB Shoes       Nike
    2    3     Samsung TV    Samsung
    3    4      Used bike        NaN
    
    0 讨论(0)
  • 2021-01-22 00:19

    One approach is to use apply():

    import pandas as pd
    BRANDS = ['Samsung', 'Apple', 'Nike']
    
    def get_brand_name(row):
        if ~pd.isnull(row['brand_name']):
            # don't do anything if brand_name is not null
            return row['brand_name']
    
        item_title = row['item_title']
        title_words = map(str.title, item_title.split())
        for tw in title_words:
            if tw in BRANDS:
                # return first 'match'
                return tw
        # default return None
        return None
    
    df['brand_name'] = df.apply(lambda x: get_brand_name(x), axis=1)
    print(df)
    #   row     item_title brand_name
    #0    1       Apple 6S      Apple
    #1    2  Nike BB Shoes       Nike
    #2    3     Samsung TV    Samsung
    #3    4      Used bike       None
    

    Notes

    • I converted the tokenized title to title-case using str.title() because that's how you defined BRANDS.
    • If you have a lot of brands, it's recommended to use a set instead of a list because lookups will be faster. However, this won't work if you care about order.
    0 讨论(0)
提交回复
热议问题