Populating Pandas DataFrame column based on dictionary of regex

清歌不尽 2021-01-20 13:43

I have a dataframe like the following:

    GE    GO
1   AD    Weiss
2   KI    Ruby
3   OH    Port
4   ER    Rose
5   KI    Rose
6   JJ    Weiss
7   OH    7UP

2 Answers
  •  礼貌的吻别
    2021-01-20 14:06

    You can do it this way:

    In [253]: df['OUT'] = df[['GO']].replace({'GO':Dic}, regex=True)
    
    In [254]: df
    Out[254]:
        GE     GO   OUT
    1   AD  Weiss  Beer
    2   KI   Ruby  Beer
    3   OH   Port  Wine
    4   ER   Rose  Wine
    5   KI   Rose  Wine
    6   JJ  Weiss  Beer
    7   OH    7UP  Soda
    8   AD    7UP  Soda
    9   OP   Coke  Soda
    10  JJ  Stout  Beer
    
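    The question does not show the contents of `Dic`, so the mapping below is a hypothetical stand-in chosen to reproduce the OUT column above; the rows are copied from the answer's output. One detail matters here: with `regex=True`, `replace()` substitutes only the part of each string that the pattern matches (like `re.sub`). Anchoring each pattern with `^...$` is one way to make sure the whole cell becomes the label.

```python
import pandas as pd

# Rows copied from the answer's output (index starting at 1 as shown there).
df = pd.DataFrame(
    {
        "GE": ["AD", "KI", "OH", "ER", "KI", "JJ", "OH", "AD", "OP", "JJ"],
        "GO": ["Weiss", "Ruby", "Port", "Rose", "Rose", "Weiss",
               "7UP", "7UP", "Coke", "Stout"],
    },
    index=range(1, 11),
)

# Hypothetical pattern -> label mapping; the question's actual Dic is not shown.
# The ^...$ anchors make each pattern match the entire cell, so the replacement
# swaps the whole value instead of only a matching substring.
Dic = {
    r"^(?:Weiss|Ruby|Stout)$": "Beer",
    r"^(?:Port|Rose)$": "Wine",
    r"^(?:7UP|Coke)$": "Soda",
}

# Series-level replace; equivalent here to df[['GO']].replace({'GO': Dic}, regex=True).
df["OUT"] = df["GO"].replace(Dic, regex=True)
print(df)

# Without anchors, only the matched part of the string is replaced.
print(pd.Series(["Weiss"]).replace({"eis": "Beer"}, regex=True).iloc[0])  # WBeers
```

    The last line shows why the anchors matter: an unanchored `eis` pattern turns `Weiss` into `WBeers` rather than `Beer`.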

    Interesting observation: in older pandas versions, the Series.map() method was almost always faster than DataFrame.replace() and Series.str.replace(). replace() has improved in pandas 0.19.2: in the timings below, on a 100,000-row frame, both replace() variants beat map():

    In [267]: df = pd.concat([df] * 10**4, ignore_index=True)
    
    In [268]: %timeit df.GO.map(lambda x: next(Dic[k] for k in Dic if re.search(k, x)))
    1 loop, best of 3: 1.57 s per loop
    
    In [269]: %timeit df[['GO']].replace({'GO':Dic}, regex=True)
    1 loop, best of 3: 895 ms per loop
    
    In [270]: %timeit df.GO.replace(Dic, regex=True)
    1 loop, best of 3: 876 ms per loop
    
    In [271]: df.shape
    Out[271]: (100000, 2)
    
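    The map() one-liner timed above labels each value with the dict value of the first pattern that `re.search` finds, but because `next()` is called without a default it raises `StopIteration` for any value that matches none of the patterns. Below is a minimal sketch that keeps the same first-match semantics, passes a default to `next()`, and compiles the patterns once up front. The `Dic` contents, the helper name `first_label`, and the `"Unknown"` default are all hypothetical.

```python
import re

import pandas as pd

# Hypothetical pattern -> label mapping (the question's actual Dic is not shown).
Dic = {
    r"Weiss|Ruby|Stout": "Beer",
    r"Port|Rose": "Wine",
    r"7UP|Coke": "Soda",
}

# Compile each pattern once, keeping the dict's insertion order so the
# first matching pattern wins, as in the original generator expression.
COMPILED = [(re.compile(pattern), label) for pattern, label in Dic.items()]


def first_label(text, default="Unknown"):
    """Return the label of the first pattern found anywhere in text, else default."""
    return next((label for rx, label in COMPILED if rx.search(text)), default)


s = pd.Series(["Weiss", "Port", "Coke", "Lager"])
print(s.map(first_label).tolist())  # ['Beer', 'Wine', 'Soda', 'Unknown']
```

    Unlike replace(regex=True), this returns the whole label even when a pattern matches only part of the cell, so the patterns need no anchors. Which approach is faster depends on the pandas version, as the timings above illustrate.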
