How do I remove duplicate words from a list in python without using sets?

渐次进展 2021-02-11 04:24

I have the following Python code, which almost works for me (I'm SO close!). I have a text file from one of Shakespeare's plays that I'm opening. Original text file:

"Bu

  • 2021-02-11 04:53

    Use plain old lists. This is almost certainly less efficient than Counter, because each `not in` check scans the whole list.

    fname = input("Enter file name: ")

    words = []
    with open(fname) as fhand:
        for line in fhand:
            line = line.strip()
            # These checks are probably not needed; uncomment them
            # if the text is wrapped in double quotes.
            #if line.startswith('"'):
            #    line = line[1:]
            #if line.endswith('"'):
            #    line = line[:-1]
            words.extend(line.split())

    # Keep one lowercase copy of each word, using a plain list (no set).
    unique_words = []
    for word in words:
        if word.lower() not in unique_words:
            unique_words.append(word.lower())

    print(words)
    unique_words.sort()
    print(unique_words)
    

    This always checks against the lowercase version of each word, so the same word with different capitalization (for example "But" and "but") is not counted as two different words.
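The same list-only, case-insensitive check can be wrapped in a small function so it can be tried on a list directly, without reading a file. This is a minimal sketch; the name `unique_words_ci` is an illustrative choice, not part of any library.

```python
def unique_words_ci(words):
    """Return the distinct words, lowercased and sorted, using only a list.

    `unique_words_ci` is a hypothetical helper name. Each `in` test scans
    the list, so this is O(n^2) overall, which is fine for a short text.
    """
    unique = []
    for word in words:
        lowered = word.lower()
        if lowered not in unique:
            unique.append(lowered)
    unique.sort()
    return unique


print(unique_words_ci(['But', 'soft', 'but', 'Soft', 'the', 'sun', 'The']))
# ['but', 'soft', 'sun', 'the']
```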

    I added checks (commented out above) to strip the double quotes at the start and end of the text. If those quotes are not present in the actual file, these lines can be ignored.
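If the quote checks are needed, note that they run on every line, not only the first and last. Here is a sketch with the checks enabled and the file replaced by a list of lines (iterating over an open file also yields lines). The helper `words_from_lines` and the two sample lines are made up for illustration; they are not the actual file.

```python
def words_from_lines(lines):
    """Split lines into words, dropping a leading/trailing double quote per line.

    `words_from_lines` is a hypothetical helper; `lines` can be any iterable
    of strings, such as a list or an open file object.
    """
    words = []
    for line in lines:
        line = line.strip()
        if line.startswith('"'):   # opening quote of the passage
            line = line[1:]
        if line.endswith('"'):     # closing quote of the passage
            line = line[:-1]
        words.extend(line.split())
    return words


# Illustrative sample lines, not the real file contents.
sample = ['"But soft what light\n', 'Who is already sick and pale with grief"\n']
print(words_from_lines(sample))
```

With a real file, `with open(fname) as fhand: words = words_from_lines(fhand)` should behave the same way.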

  • 2021-02-11 04:59

    A good alternative to using a set is to use a dictionary. The collections module contains a class called Counter, which is a specialized dictionary for counting the number of times each of its keys is seen. Using it, you could do something like this:

    from collections import Counter
    
    wordlist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and',
                'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is',
                'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun',
                'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
    
    newlist = sorted(Counter(wordlist), 
                     key=lambda w: w.lower())  # case insensitive sort
    print(newlist)
    

    Output:

    ['already', 'and', 'Arise', 'breaks', 'But', 'east', 'envious', 'fair',
     'grief', 'is', 'It', 'Juliet', 'kill', 'light', 'moon', 'pale', 'sick',
     'soft', 'sun', 'the', 'through', 'what', 'Who', 'window', 'with', 'yonder']
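To illustrate the dictionary idea a bit further: `Counter` also keeps the counts, and a plain `dict` can drop duplicates while keeping the original order. This is a short sketch on a made-up sample list; the order-preserving behaviour of `dict.fromkeys` assumes Python 3.7 or later.

```python
from collections import Counter

# Made-up sample list with repeated words.
words = ['But', 'soft', 'the', 'sun', 'and', 'the', 'sun', 'and']

# Counter maps each distinct word to how many times it was seen.
counts = Counter(words)
print(counts['the'], counts['sun'], counts['But'])  # 2 2 1

# A plain dict also removes duplicates without a set: dict keys are unique
# and (since Python 3.7) keep first-insertion order.
unique_in_order = list(dict.fromkeys(words))
print(unique_in_order)  # ['But', 'soft', 'the', 'sun', 'and']
```

Unlike the list-based answer, neither of these is case-insensitive: "But" and "but" stay as separate keys unless the words are lowercased first.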
    