Merging dataframes

Submitted by 自作多情 on 2019-12-13 03:55:58

Question


I have been struggling with this problem all day. I have two dataframes as follows:

Dataframe 1 - Billboards

Dataframe 2

I would like to merge Dataframe 2 with Dataframe 1 on the song title, to end up with a dataframe that has SongId, Song, Rank and Year. The problem is that the titles are not stored consistently. For example, a song in Billboards can be "macarena bayside boys mix" while the same song in Dataframe 2 is just "macarena". I would like to match the songs by similarity instead of by exact title.


Answer 1:


I think you need to compute a similarity measure between the song titles in df1 and df2. I gave it a try by computing the TF-IDF cosine similarity between every pair of titles, using a made-up song list.

from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer(min_df=1)

Song1 = ["macarena bayside boys mix", "cant you hear my heart beat", "crying in the chapel", "you were on my mind"]
Song2 = ["cause im a man", "macarena", "beat from my heart"]

dist_dict = {}   # best similarity score found so far for each song in Song1
match_dict = {}  # best-matching song from Song2 for each song in Song1
for i in Song1:
    for j in Song2:
        # TF-IDF rows are L2-normalised, so their dot product is the cosine similarity
        tfidf = vect.fit_transform([i, j])
        similarity = (tfidf @ tfidf.T).toarray()[0, 1]
        # Keep j if it is the first candidate or beats the previous best
        if i not in dist_dict or dist_dict[i] < similarity:
            dist_dict[i] = similarity
            match_dict[i] = j

Once you have the best match for each song, you can look up its song ID in df2.
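As a rough sketch of that last step, the snippet below scores all titles at once and attaches the best-matching SongId to each row of df1. The column layout (Song, Rank, Year in df1; SongId, Song in df2), the sample rows, and the 0.3 cutoff are all assumptions made for illustration, not taken from the question's data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample data; the real column layout is an assumption.
df1 = pd.DataFrame({
    "Song": ["macarena bayside boys mix", "you were on my mind"],
    "Rank": [1, 2],
    "Year": [1996, 1965],
})
df2 = pd.DataFrame({
    "SongId": [101, 102],
    "Song": ["macarena", "cause im a man"],
})

# Fit a single vocabulary over every title so both sides share one feature space.
vect = TfidfVectorizer().fit(pd.concat([df1["Song"], df2["Song"]]))
tfidf1 = vect.transform(df1["Song"])
tfidf2 = vect.transform(df2["Song"])

# Rows are L2-normalised by default, so this product is the cosine similarity
# between each df1 title (rows) and each df2 title (columns).
sim = (tfidf1 @ tfidf2.T).toarray()

best = sim.argmax(axis=1)      # position of the best df2 match per df1 row
best_score = sim.max(axis=1)   # similarity of that match

THRESHOLD = 0.3  # hypothetical cutoff; tune it on real data

matched = df1.copy()
matched["SongId"] = df2["SongId"].to_numpy()[best]
matched["Score"] = best_score

# Drop rows whose best match is too weak to trust.
matched = matched[matched["Score"] >= THRESHOLD]
result = matched[["SongId", "Song", "Rank", "Year"]]
print(result)
```

Without a cutoff, every row of df1 would receive some SongId, even when no title in df2 shares a single word with it, so some threshold is probably needed in practice.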




Answer 2:


The easiest way to do it:

1. Make "Song" the index column in both dataframes:

df1.set_index('Song', inplace=True)
df2.set_index('Song', inplace=True)

2. Use join:

joined = df1.join(df2, how='inner')
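For illustration, here is the same join on a small made-up pair of dataframes (the column names and rows are assumptions). It also shows the limitation raised in the question: an inner join on the title only keeps songs whose titles are identical in both dataframes, so "macarena bayside boys mix" does not pair with "macarena".

```python
import pandas as pd

# Hypothetical sample data; the real column layout is an assumption.
df1 = pd.DataFrame({
    "Song": ["macarena bayside boys mix", "you were on my mind"],
    "Rank": [1, 2],
    "Year": [1996, 1965],
})
df2 = pd.DataFrame({
    "SongId": [101, 102],
    "Song": ["macarena", "you were on my mind"],
})

# Index both frames by title, inner-join, then restore Song as a column.
joined = (
    df1.set_index("Song")
       .join(df2.set_index("Song"), how="inner")
       .reset_index()
)

# The same exact-match join written with merge, without touching the index.
merged = df1.merge(df2, on="Song", how="inner")

print(joined)
```

Only "you were on my mind" survives here, because it is the only title spelled the same way in both frames.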



Source: https://stackoverflow.com/questions/50570217/merging-dataframes
