How do I return all the unique words from a text file using Python? For example:
I am not a robot
I am a human
Should return:
Simply iterate over the lines in the file and use a set to keep only the unique words.
from itertools import chain
def unique_words(lines):
    # Split each line on whitespace and collect all words into one set
    return set(chain(*(line.split() for line in lines if line)))
Then do the following to read all unique words from a file and print them:
with open(filename, 'r') as f:
    print(unique_words(f))
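As a quick check, the function above accepts any iterable of lines, not only an open file. Here is a minimal sketch using the two lines from the question. The `sample` list is only for illustration, and the function is repeated (with `chain.from_iterable`, which behaves the same as unpacking with `*`) so the snippet runs on its own:

```python
from itertools import chain

def unique_words(lines):
    # Split every line on whitespace and merge all words into one set
    return set(chain.from_iterable(line.split() for line in lines))

# Hypothetical sample data taken from the question
sample = ["I am not a robot\n", "I am a human\n"]

# Sets are unordered, so sort before printing to get a stable result
print(sorted(unique_words(sample)))
# ['I', 'a', 'am', 'human', 'not', 'robot']
```

Note that this comparison is case-sensitive, so "I" and "i" would count as two different words.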
string = "I am not a robot\n I am a human"
list_str = string.split()
print(list(set(list_str)))
Use a set. You don't need to import anything to do this.
#Open the file and read it (the with block closes the file afterwards)
with open(file_Name, 'r') as my_File:
    read_File = my_File.read()
#Split the words
words = read_File.split()
#Using a set will only save the unique words
unique_words = set(words)
#You can then print the set as a whole or loop through the set etc
for word in unique_words:
    print(word)
This seems to be a typical application for a collection:
...
import collections
d = collections.OrderedDict()
for word in wordlist: d[word] = None
# use this if you also want to count the words:
# for word in wordlist: d[word] = d.get(word, 0) + 1
for k in d.keys(): print(k)
You could also use collections.Counter(), which also counts the elements you feed in. On Python versions before 3.7 the order of the words gets lost, though. I added a commented-out line that counts the words while keeping their order.
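For illustration, a minimal sketch of the Counter variant. It assumes Python 3.7 or later, where a Counter remembers the order in which keys were first inserted, and it uses a hypothetical `wordlist` built from the sample text in the question:

```python
from collections import Counter

# Hypothetical word list built from the sample text in the question
wordlist = "I am not a robot\nI am a human".split()

# Counter maps each unique word to the number of times it occurs
counts = Counter(wordlist)

# Iterating over the Counter yields each unique word once
for word, n in counts.items():
    print(word, n)
```

If only the unique words are needed, `list(counts)` gives them without the counts.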
try:
    with open("gridlex.txt", mode="r", encoding="utf-8") as india:
        for data in india:
            # data is already a string; chr() only accepts integers
            print("no of chars", len(data))
except IOError:
    print("sorry")
Using Regex and Set:
import re
words = re.findall(r'\w+', text.lower())
uniq_words = set(words)
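A minimal, self-contained sketch of the regex approach, assuming a hypothetical `text` string based on the question's sample with some punctuation added to show the difference from a plain `split()`:

```python
import re

# Hypothetical sample text based on the question, with punctuation added
text = "I am not a robot.\nI am a human!"

# \w+ matches runs of letters, digits, and underscores, so punctuation is dropped
words = re.findall(r"\w+", text.lower())
uniq_words = set(words)

# Sets are unordered, so sort before printing to get a stable result
print(sorted(uniq_words))
# ['a', 'am', 'human', 'i', 'not', 'robot']
```

Because the text is lowercased first, "I" is stored as "i"; drop the `.lower()` call if case should be preserved.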
Another way is to create a dict and insert the words as keys:
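dict_word = {}
# doc is assumed to be a list of lines
for i in range(len(doc)):
    frase = doc[i].split(" ")
    for palavra in frase:
        # only the first occurrence of each word adds a key
        if palavra not in dict_word:
            dict_word[palavra] = 1
print(dict_word.keys())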