i have this test table in pandas dataframe
Leaf_category_id session_id product_id
0 111 1 987
3 111
We are going to pull out the values from product_id
, create bigrams
that are sorted and thus deduplicated, and count them to get the frequency, and then populate a data frame.
from collections import Counter
# assuming your data frame is called 'df'
bigrams = [list(zip(x,x[1:])) for x in df.product_id.values.tolist()]
bigram_set = [tuple(sorted(xx) for x in bigrams for xx in x]
freq_dict = Counter(bigram_set)
df_freq = pd.DataFrame([list(f) for f in freq_dict], columns=['bigram','freq'])