I have the following Python code which almost works for me (I'm SO close!). I have a text file from one of Shakespeare's plays that I'm opening. Original text file:
"Bu
Use plain old lists. This is almost certainly not as efficient as Counter.
fname = raw_input("Enter file name: ")
Words = []
with open(fname) as fhand:
    for line in fhand:
        line = line.strip()
        # lines probably not needed
        #if line.startswith('"'):
        #    line = line[1:]
        #if line.endswith('"'):
        #    line = line[:-1]
        Words.extend(line.split())
UniqueWords = []
for word in Words:
    if word.lower() not in UniqueWords:
        UniqueWords.append(word.lower())
print Words
UniqueWords.sort()
print UniqueWords
This always compares against the lowercase version of each word, so the same word in a different case (for example "The" and "the") is not counted as two different words.
I also included commented-out checks that strip a double quote from the start and end of each line. If the quotes are not present in the actual file, these lines can be disregarded.
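Since the list-based version is noted above as less efficient than Counter, here is a minimal sketch of the same idea using a set (and, if counts are wanted, collections.Counter). It is written for Python 3, unlike the Python 2 code above, and the function names unique_words and word_counts and the sample lines are made up for illustration; they are not from the original question.

```python
from collections import Counter


def unique_words(lines):
    """Return the sorted, lowercased unique words from an iterable of lines."""
    seen = set()  # set membership tests are O(1) on average, unlike a list
    for line in lines:
        # split() with no arguments splits on any run of whitespace
        for word in line.split():
            seen.add(word.lower())
    return sorted(seen)


def word_counts(lines):
    """Return a Counter mapping each lowercased word to its frequency."""
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in line.split())
    return counts


# Hypothetical sample data standing in for the lines of a text file
sample = ["The quick brown fox", "the lazy dog", "Fox and DOG"]
print(unique_words(sample))
print(word_counts(sample).most_common(3))
```

To use it with a real file, pass the open file object directly, e.g. `with open(fname) as fhand: print(unique_words(fhand))`, since iterating over a file yields its lines.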