Create bigrams from a column in a pandas DataFrame

你的背包 2021-01-15 08:21

I have this test table in a pandas DataFrame:

   Leaf_category_id  session_id  product_id
0               111           1         987
3               111                


        
2 answers
  •  北海茫月
    2021-01-15 09:14

    We pull the values out of product_id, build bigrams from adjacent items, sort each bigram so that the same pair in either order is treated as one bigram, count them to get the frequency, and then populate a data frame.

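    from collections import Counter

    import pandas as pd

    # assuming your data frame is called 'df' and each product_id value
    # is a sequence (e.g. a list) of product IDs

    # adjacent pairs within each row's sequence
    bigrams = [list(zip(x, x[1:])) for x in df.product_id.values.tolist()]
    # sort each pair so (a, b) and (b, a) count as the same bigram
    bigram_set = [tuple(sorted(xx)) for x in bigrams for xx in x]
    freq_dict = Counter(bigram_set)
    # .items() yields (bigram, count) pairs
    df_freq = pd.DataFrame([list(f) for f in freq_dict.items()], columns=['bigram', 'freq'])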
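
To try the approach end to end, here is a minimal sketch on made-up data. The sample values and the helper name `bigram_frequencies` are hypothetical and not from the question. It also assumes that each `product_id` entry is already a list of product IDs, which may not match the real table.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data: each product_id entry is a list of product IDs.
df = pd.DataFrame({
    "session_id": [1, 2, 3],
    "product_id": [[987, 988, 989], [988, 987], [989]],
})


def bigram_frequencies(sequences):
    """Count unordered adjacent pairs across all sequences."""
    counts = Counter(
        tuple(sorted(pair))              # (a, b) and (b, a) become the same key
        for seq in sequences
        for pair in zip(seq, seq[1:])    # adjacent items; empty for length < 2
    )
    # One row per distinct bigram, with its count.
    rows = [[bigram, freq] for bigram, freq in counts.items()]
    return pd.DataFrame(rows, columns=["bigram", "freq"])


df_freq = bigram_frequencies(df["product_id"].tolist())
print(df_freq)
```

With this sample, the pair (987, 988) occurs in two rows (once in reverse order) and is counted twice, while the single-item row contributes no bigrams.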
    
