How to return unique words from the text file using Python

遇见更好的自我 2021-01-04 23:45

How do I return all the unique words from a text file using Python? For example:

I am not a robot

I am a human

Should return:

9 answers
  • 2021-01-04 23:53

    Iterate over the lines in the file, split each line into words, and use a set to keep only the unique words.

    from itertools import chain
    
    def unique_words(lines):
        # Split each line on whitespace and collect all the words in one set
        return set(chain.from_iterable(line.split() for line in lines))
    

    Then do the following to read all the unique words from a file and print them:

    filename = "input.txt"  # hypothetical path; point this at your own file
    with open(filename, 'r') as f:
        print(unique_words(f))
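
    Because unique_words only needs an iterable of lines, it can be tried without a real file. A minimal sketch that uses io.StringIO to stand in for an open file (the sample text is the question's example); sorted() is only there to make the printed order predictable, since a set has no defined order:

```python
from io import StringIO
from itertools import chain


def unique_words(lines):
    # Split every line on whitespace and flatten all the words into one set
    return set(chain.from_iterable(line.split() for line in lines))


# StringIO behaves like an open text file: iterating over it yields lines
fake_file = StringIO("I am not a robot\nI am a human\n")
print(sorted(unique_words(fake_file)))  # ['I', 'a', 'am', 'human', 'not', 'robot']
```

    Note that the comparison is case-sensitive, so "I" and "i" would be counted as two different words.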
    
  • 2021-01-05 00:01
    text = "I am not a robot\n I am a human"
    words = text.split()  # split() with no argument splits on any whitespace
    print(list(set(words)))  # Python 3 print(); the order of a set is arbitrary
    
  • 2021-01-05 00:01

    Use a set. You don't need to import anything to do this.

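    file_name = "input.txt"  # hypothetical path; point this at your own file
    # Open the file (the with block closes it again automatically)
    with open(file_name, 'r') as my_file:
        # Read the whole file into one string
        text = my_file.read()
    # Split the text into words on whitespace
    words = text.split()
    # A set only stores each word once, so it keeps only the unique words
    unique_words = set(words)
    # You can then print the set as a whole or loop through it
    for word in unique_words:
        print(word)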
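
    The same read/split/set steps can be wrapped in a reusable function. A minimal sketch, assuming a UTF-8 text file; the function name read_unique_words and the temporary sample file are illustrative only. Sorting the set gives a stable, alphabetical output order:

```python
import os
import tempfile


def read_unique_words(path):
    """Return the unique whitespace-separated words in a file, sorted."""
    # The with block closes the file even if reading fails
    with open(path, "r", encoding="utf-8") as f:
        words = f.read().split()
    # A set drops duplicates; sorted() turns it into an ordered list
    return sorted(set(words))


# Demo: write the question's example to a temporary file and read it back
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("I am not a robot\nI am a human\n")
    sample_path = tmp.name

try:
    for word in read_unique_words(sample_path):
        print(word)
finally:
    os.remove(sample_path)  # clean up the temporary file
```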
    
  • 2021-01-05 00:11

    This looks like a typical job for the collections module:

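    ...
    import collections
    
    # wordlist is assumed to be the list of words built in the elided part above
    d = collections.OrderedDict()
    for word in wordlist:
        d[word] = None  # each word becomes a key once; first-seen order is kept
    # use this instead if you also want to count the words:
    # for word in wordlist: d[word] = d.get(word, 0) + 1
    for k in d.keys():
        print(k)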
    

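    You could also use a collections.Counter(), which would also count the elements you feed in. On Python versions before 3.7, though, the order of the words would be lost (since 3.7, Counter remembers insertion order like a regular dict). I added a commented-out line for counting while keeping the order.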
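
    The Counter variant mentioned above might look like this. A minimal sketch that counts the words of the question's example; the last part also shows how to get the words that occur exactly once, in case "unique" is meant in that sense (an interpretation, not something the question states):

```python
from collections import Counter

text = "I am not a robot\nI am a human"
counts = Counter(text.split())

# Every distinct word with how often it appears, most frequent first
print(counts.most_common())

# Only the words that appear exactly once
once = [word for word, n in counts.items() if n == 1]
print(once)
```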

  • 2021-01-05 00:11
    try:
        with open("gridlex.txt", mode="r", encoding="utf-8") as india:
            for data in india:
                # Print the number of characters on each line (including the trailing newline)
                print("no of chars", len(data))
    except IOError:
        print("sorry")
    
  • 2021-01-05 00:16

    Using a regex and a set:

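    import re
    
    text = "I am not a robot\nI am a human"  # sample text from the question
    # \w+ matches runs of letters, digits and underscores; lower() ignores case
    words = re.findall(r'\w+', text.lower())
    uniq_words = set(words)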
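
    Compared with split(), the regex also strips punctuation, and lower() merges "I" and "i". A small sketch of that difference (the sample sentence is made up for illustration). One caveat: \w matches only letters, digits and underscore, so an apostrophe splits "don't" into "don" and "t":

```python
import re

text = "I am a robot. Am I? I don't know."

# split() keeps punctuation attached and is case-sensitive
split_words = set(text.split())
# The regex keeps only word characters, after lowercasing everything
regex_words = set(re.findall(r"\w+", text.lower()))

print(sorted(split_words))
print(sorted(regex_words))
```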
    

    Another way is to create a dict and insert the words as keys:

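    dict_word = {}
    # doc is assumed to be a list of lines, e.g. from f.readlines()
    for line in doc:
        frase = line.split()  # split() with no argument also drops the trailing "\n"
        for palavra in frase:
            if palavra not in dict_word:
                dict_word[palavra] = 1
    print(list(dict_word.keys()))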
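
    Since Python 3.7 a regular dict keeps insertion order, so the dict-keys approach can also be written with dict.fromkeys, which keeps each word in the order it first appears (unlike a set). A minimal sketch, with the question's example lines standing in for the file contents:

```python
lines = ["I am not a robot\n", "I am a human\n"]

# dict.fromkeys keeps only the first occurrence of each key, in order
unique_in_order = list(dict.fromkeys(word for line in lines for word in line.split()))
print(unique_in_order)  # ['I', 'am', 'not', 'a', 'robot', 'human']
```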
    