Remove uni-grams from a list of bi-grams

北海茫月 2021-01-22 00:26

I have managed to create 2 lists from text documents. The first is my bi-gram list:

keywords = ['nike shoes', 'nike clothing', 'nike black', 'nike white']


2 answers
  • 2021-01-22 00:35

    Assuming you have the two lists (`keywords` and `stops`), this will do what you want:

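    new_keywords = []
    
    for k in keywords:
        cleaned = k
        # Remove every stop word found in the bi-gram (substring match)
        for s in stops:
            if s in cleaned:
                cleaned = cleaned.replace(s, "")
    
        # Append exactly once per bi-gram, whether or not anything was removed
        new_keywords.append(cleaned)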
    

    This creates a list like the one you posted:

    ['nike shoes', 'nike ', 'nike ', 'nike ']
    

    To eliminate the duplicates, do this:

    new_keywords = list(set(new_keywords))
    

    So the final list looks like this:

    ['nike shoes', 'nike ']
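`set()` does not preserve insertion order, so the order of the final list above is not guaranteed. A minimal order-preserving sketch, assuming Python 3.7+ (where dicts keep insertion order); it also strips the trailing space that `str.replace` leaves behind. The input literal is simply the intermediate list shown above:

```python
new_keywords = ['nike shoes', 'nike ', 'nike ', 'nike ']

# Strip the trailing space left by str.replace, then drop duplicates.
# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
new_keywords = list(dict.fromkeys(k.strip() for k in new_keywords))

print(new_keywords)  # ['nike shoes', 'nike']
```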
    


  • 2021-01-22 00:51

    You can do it in steps. First define a helper function:

    def removeStop(bigram, stops):
        # Keep only the words of the bi-gram that are not in the stop list
        return ' '.join(w for w in bigram.split() if w not in stops)
    

    And then:

    # keywords: the bi-gram list; stops: the second (stop-word) list
    new_keywords = [removeStop(k, stops) for k in keywords]
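A self-contained sketch of this word-level approach. The question's second list is not shown, so the `stops` value below is a hypothetical stand-in inferred from the first answer's output; `keywords` is the bi-gram list from the question. Because the helper compares whole words after `split()`, it leaves no trailing space and, unlike `str.replace`, does not cut a stop word out of the middle of a longer word (the stop word `'hoe'` in the last lines is also hypothetical, chosen only to show the difference):

```python
def removeStop(bigram, stops):
    # Keep only the words of the bi-gram that are not in the stop list
    return ' '.join(w for w in bigram.split() if w not in stops)

keywords = ['nike shoes', 'nike clothing', 'nike black', 'nike white']
stops = ['clothing', 'black', 'white']  # hypothetical: the real second list is not shown

cleaned = [removeStop(k, stops) for k in keywords]
print(cleaned)  # ['nike shoes', 'nike', 'nike', 'nike']

# Whole-word matching vs. substring replacement, using a hypothetical stop word
print(removeStop('nike shoes', ['hoe']))  # nike shoes
print('nike shoes'.replace('hoe', ''))    # nike ss
```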
    