Compare two Dataframes of sentences and return a third one

Submitted by 百般思念 on 2020-04-30 06:33:06

Question


I want to compare two long DataFrame columns of sentences and return a third one. A snapshot of the expected result is shown below.

My first approach was long-winded and only worked for single instances; it failed when I applied it to the DataFrame. It can be found in a previous question.

The logic is: for each word in c1, the new value is 1 if the word also appears in c2, and 0 if it appears only in c1.


sentences = tra_df['Sent1']
context = tra_df['Sent2']
# First row, for example:
# sentences[0] = "I am completely happy with the plan you have laid out today"
# context[0] = 'the plan you have laid out today'
# Expected result, one flag per word of sentences[0] (12 words):
# c3 = ['0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1']


Answer 1:


As I understand your question, here is a solution.

def get_common_words(c1, c2):
    # One flag per word in c1, initially 0
    res = [0]*len(c1.split())
    for idx, existing_word in enumerate(c1.split()):
        # Set the flag to 1 when the word also appears in c2
        if existing_word in c2.split():
            res[idx] = 1
    return res

get_common_words(c1, c2)

If you want to make it work for a pandas DataFrame:

def get_common_words_df(row):
    c1 = row['Sent1']
    c2 = row['Sent2']
    return get_common_words(c1, c2)


df['sent3'] = df.apply(get_common_words_df, axis=1)

This can be optimized considerably, for example by not calling c2.split() again for every word of c1.
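
One possible optimization, as a rough sketch rather than a definitive version: split c2 once, keep its words in a set, and build the flags with a list comprehension. The name common_word_flags and the lists sent1 and sent2 are hypothetical and only stand in for the Sent1 and Sent2 columns from the question.

```python
def common_word_flags(c1, c2):
    # Split c2 once and keep its words in a set for fast membership tests.
    c2_words = set(c2.split())
    # One flag per word of c1: 1 if the word also occurs in c2, else 0.
    return [1 if word in c2_words else 0 for word in c1.split()]


# Plain lists stand in for the Sent1 and Sent2 columns (illustrative data
# taken from the example sentence pair in the question).
sent1 = ["I am completely happy with the plan you have laid out today"]
sent2 = ["the plan you have laid out today"]

# Walk both "columns" in parallel and compute one flag list per row.
sent3 = [common_word_flags(a, b) for a, b in zip(sent1, sent2)]
```

With a DataFrame, the same comprehension should work on the columns directly, assuming they contain only strings: df['sent3'] = [common_word_flags(a, b) for a, b in zip(df['Sent1'], df['Sent2'])]. This avoids the per-row overhead of apply with axis=1.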



Source: https://stackoverflow.com/questions/61326907/compare-two-dataframes-of-sentences-and-return-a-third-one
