Python: Finding the word that shows up the most?

安稳与你 submitted on 2019-12-19 11:36:08

Question


I'm trying to get my program to report the word that shows up the most in a text file. For example, if I type "Hello I like pie because they are like so good" the program should print out "like occurred the most." I get this error when executing Option 3: KeyError: 'h'

#Prompt the user to enter a block of text.
done = False
textInput = ""
while(done == False):
    nextInput= input()
    if nextInput== "EOF":
        break
    else:
        textInput += nextInput

#Prompt the user to select an option from the Text Analyzer Menu.
print("Welcome to the Text Analyzer Menu! Select an option by typing a number"
    "\n1. shortest word"
    "\n2. longest word"
    "\n3. most common word"
    "\n4. left-column secret message!"
    "\n5. fifth-words secret message!"
    "\n6. word count"
    "\n7. quit")

#Set option to 0.
option = 0

#Use the 'while' to keep looping until the user types in Option 7.
while option !=7:
    option = int(input())

#The error occurs in this specific section of the code.
#If the user selects Option 3,
    elif option == 3:
        word_counter = {}
        for word in textInput:
            if word in textInput:
                word_counter[word] += 1
            else:
                word_counter[word] = 1

        print("The word that showed up the most was: ", word)

Answer 1:


I think you may want to do:

for word in textInput.split():
  ...

Currently, you are iterating over every character in textInput, which is why the failing key is a single letter like 'h'. To iterate over every word, first split the string into a list of words. By default .split() splits on whitespace, but you can change this by passing a delimiter to split().


Also, you need to check if the word is in your dictionary, not in your original string. So try:

if word in word_counter:
  ...

Then, to find the entry with the highest occurrences:

highest_word = ""
highest_value = 0

for k,v in word_counter.items():
  if v > highest_value:
    highest_value = v
    highest_word = k

Then, just print out the value of highest_word and highest_value.
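If you don't need the explicit loop, the same lookup can probably be done with the built-in max(). This is only a sketch with a small hand-written dictionary standing in for word_counter:

```python
# Stand-in for the dictionary built by the counting loop.
word_counter = {"Hello": 1, "I": 1, "like": 2, "pie": 1}

# max() walks the keys and compares them by their stored count.
highest_word = max(word_counter, key=word_counter.get)
highest_value = word_counter[highest_word]

print(highest_word, highest_value)  # like 2
```

Like the loop, this reports only one word when several are tied for first place.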


To keep track of ties, just keep a list of the highest words. If we find a higher occurrence, clear the list and continue rebuilding. Here is the full program so far:

textInput = "He likes eating because he likes eating"
word_counter = {}
for word in textInput.split():
  if word in word_counter:
    word_counter[word] += 1
  else:
    word_counter[word] = 1


highest_words = []
highest_value = 0

for k,v in word_counter.items():
  # if we find a new value, create a new list,
  # add the entry and update the highest value
  if v > highest_value:
    highest_words = []
    highest_words.append(k)
    highest_value = v
  # else if the value is the same, add it
  elif v == highest_value:
    highest_words.append(k)

# print out the highest words (Python 3 syntax, matching the question's code)
for word in highest_words:
  print(word)



Answer 2:


Instead of rolling your own counter, a better idea is to use the Counter class from the collections module.

>>> text = 'blah and stuff and things and stuff'
>>> from collections import Counter
>>> c = Counter(text.split())
>>> c.most_common()
[('and', 3), ('stuff', 2), ('blah', 1), ('things', 1)]
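For the question as asked, only the top entry is needed. A short sketch, assuming the same sample string, of pulling it out with most_common(1):

```python
from collections import Counter

text = "blah and stuff and things and stuff"
counts = Counter(text.split())

# most_common(1) returns a one-element list of (word, count) pairs.
word, hits = counts.most_common(1)[0]

print(word, hits)  # and 3
```

Keep in mind that most_common(1) returns a single entry even when several words share the top count.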

Also, as a general code style thing, please avoid adding comments like this:

#Set option to 0.
option = 0

It makes your code less readable, not more.




Answer 3:


The original answer is certainly correct, but you may want to keep in mind that it will not show you 'ties for first'. A sentence like

A life in the present is a present itself.

will only reveal either 'a' or 'present' to be the number one hit (assuming words are compared case-insensitively). In fact, since dictionaries were unordered before Python 3.7, the result you see may not even be the first word that's repeated multiple times.
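One caveat: for 'a' and 'present' to actually tie, the words have to be normalized first, because a plain split() treats 'A' and 'a' as different keys and keeps the trailing period on 'itself.'. A rough sketch, assuming lowercasing and stripping surrounding punctuation is the normalization you want:

```python
import string
from collections import Counter

sentence = "A life in the present is a present itself."

# Raw split: 'A' and 'a' are counted separately.
raw = Counter(sentence.split())

# Lowercase each word and strip punctuation from its ends.
words = [w.strip(string.punctuation).lower() for w in sentence.split()]
counts = Counter(words)

print(raw["a"], raw["A"])             # 1 1
print(counts["a"], counts["present"])  # 2 2
```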

If you need to report on multiples, might I suggest the following:

1) Use your current method of key-value pairs for 'word':'hits'.
2) Determine the greatest value for 'hits'.
3) Check for the number of values that equal the greatest number of hits, and add those keys to a list.
4) Iterate through the list to display the words with the greatest number of hits.

For example:

greatestNumber = 0
# establish the highest number for wordCounter.values()
# (wordCounter is the word -> hits dictionary built earlier)
for hits in wordCounter.values():
    if hits > greatestNumber:
        greatestNumber = hits

topWords = []
# find the keys that are paired to that value and add them to a list
# we COULD just print them as we iterate, but I would argue that this
# makes this function do too much
for word in wordCounter.keys():
    if wordCounter[word] == greatestNumber:
        topWords.append(word)

# now reveal the results (Python 3 print syntax)
print("The words that showed up the most, with %d hits:" % greatestNumber)
for word in topWords:
    print(word)

Depending on Python 2.7 or Python 3, your mileage (and syntax) may vary. But ideally - IMHO - you'd first want to determine the greatest number of hits and then just go back and add the relevant entries to a new list.
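The two-pass idea (find the greatest count, then collect every word that has it) can also be sketched more compactly in Python 3. The function name top_words is made up for this illustration:

```python
def top_words(word_counter):
    """Return (greatest_count, words_with_that_count) for a word -> hits dict."""
    if not word_counter:
        return 0, []
    # Pass 1: the greatest number of hits.
    greatest = max(word_counter.values())
    # Pass 2: every word whose count equals that number.
    return greatest, [w for w, hits in word_counter.items() if hits == greatest]


hits, words = top_words({"he": 2, "likes": 2, "eating": 2, "because": 1})
print(hits, words)  # 2 ['he', 'likes', 'eating']
```

Returning the list instead of printing inside the function keeps the counting and the display separate, in line with the comment in the code above.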

EDIT -- you should probably just go with collections.Counter as suggested in a different answer. I didn't even know that was something Python came prepared to do. Don't accept my answer unless you really have to write your own counter; the standard library already covers this.




Answer 4:


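With Python 3.4+ you can use statistics.mode. Note that before Python 3.8 it raises StatisticsError when several values tie for most common; from 3.8 on it returns the first one encountered: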

>>> from statistics import mode
>>> mode('Hello I like pie because they are like so good'.split())
'like'
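If ties matter, the same module has statistics.multimode, which returns every most-common value. This sketch assumes Python 3.8 or newer, since multimode does not exist in earlier versions:

```python
from statistics import multimode  # requires Python 3.8+

words = "He likes eating because he likes eating".split()

# multimode returns all values sharing the highest count,
# in the order they were first encountered.
tied = multimode(words)

print(tied)  # ['likes', 'eating']
```

'He' and 'he' are counted separately here because no case normalization is applied.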



Answer 5:


I'm not too familiar with Python, but in your last print statement, shouldn't you have a %s?

i.e.: print("The word that showed up the most was: %s" % word)
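For what it's worth, print() accepts several arguments and joins them with a space, so the %s is optional rather than required. A quick sketch of three ways that should produce the same line (the value of word is just a placeholder):

```python
word = "like"  # placeholder for the most common word

# %-formatting and an f-string build the same string.
with_percent = "The word that showed up the most was: %s" % word
with_fstring = f"The word that showed up the most was: {word}"

# print() also accepts several arguments and separates them with a space.
print("The word that showed up the most was:", word)
print(with_percent)
print(with_fstring)
```

Whichever form is used, the variable passed in has to hold the most common word; in the question's code, word is simply the last item the loop visited.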



Source: https://stackoverflow.com/questions/17644975/python-finding-the-word-that-shows-up-the-most
