问题
I am using Python 3.3
I need to create two lists, one for the unique words and the other for the frequencies of the word.
I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.
I have the design in text but am uncertain how to implement it in Python.
The methods I have found so far use either Counter
or dictionaries which we have not learned. I have already created the list from the file containing all the words but do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.
Here's the basic design:
original list = ["the", "car",....]
newlst = []
frequency = []
for word in the original list
if word not in newlst:
newlst.append(word)
set frequency = 1
else
increase the frequency
sort newlst based on frequency list
回答1:
use this
from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
回答2:
You can use
from collections import Counter
It supports Python 2.7,read more information here
1.
>>>c = Counter('abracadabra')
>>>c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
use dict
>>>d={1:'one', 2:'one', 3:'two'}
>>>c = Counter(d.values())
[('one', 2), ('two', 1)]
But, You have to read the file first, and converted to dict.
2. it's the python docs example,use re and Counter
# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
回答3:
words = file("test.txt", "r").read().split() #read the words into a list.
uniqWords = sorted(set(words)) #remove duplicate words and sort
for word in uniqWords:
print words.count(word), word
回答4:
You can use reduce() - A functional way.
words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})
returns:
{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
回答5:
Yet another solution with another algorithm without using collections:
def countWords(A):
dic={}
for x in A:
if not x in dic: #Python 2.7: if not dic.has_key(x):
dic[x] = A.count(x)
return dic
dic = countWords(['apple','egg','apple','banana','egg','apple'])
sorted_items=sorted(dic.items()) # if you want it sorted
回答6:
One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:
list1 = [] #this is your original list of words
list2 = [] #this is a new list
for word in list1:
if word in list2:
list2.index(word)[1] += 1
else:
list2.append([word,0])
Or, more efficiently:
for word in list1:
try:
list2.index(word)[1] += 1
except:
list2.append([word,0])
This would be less efficient than using a dictionary, but it uses more basic concepts.
回答7:
Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.
# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
freq[word] = word_list.count(word) / float(len(word_list))
freq will end up with the frequency of each word in the list you already have.
You need float
in there to convert one of the integers to a float, so the resulting value will be a float.
Edit:
If you can't use a dict or set, here is another less efficient way:
# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
if word not in unique_words:
unique_words += [word]
word_frequencies = []
for word in unique_words:
word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
print(unique_words[i] + ": " + word_frequencies[i])
The indicies of unique_words
and word_frequencies
will match.
回答8:
The ideal way is to use a dictionary that maps a word to it's count. But if you can't use that, you might want to use 2 lists - 1 storing the words, and the other one storing counts of words. Note that order of words and counts matters here. Implementing this would be hard and not very efficient.
回答9:
Pandas answer:
import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()
If you wanted it in ascending order instead, it is as simple as:
pd.Series(original_list).value_counts().sort_values(ascending=True)
回答10:
Try this:
words = []
freqs = []
for line in sorted(original list): #takes all the lines in a text and sorts them
line = line.rstrip() #strips them of their spaces
if line not in words: #checks to see if line is in words
words.append(line) #if not it adds it to the end words
freqs.append(1) #and adds 1 to the end of freqs
else:
index = words.index(line) #if it is it will find where in words
freqs[index] += 1 #and use the to change add 1 to the matching index in freqs
回答11:
Here is code support your question is_char() check for validate string count those strings alone, Hashmap is dictionary in python
def is_word(word):
cnt =0
for c in word:
if 'a' <= c <='z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
cnt +=1
if cnt==len(word):
return True
return False
def words_freq(s):
d={}
for i in s.split():
if is_word(i):
if i in d:
d[i] +=1
else:
d[i] = 1
return d
print(words_freq('the the sky$ is blue not green'))
回答12:
the best thing to do is :
def wordListToFreqDict(wordlist):
wordfreq = [wordlist.count(p) for p in wordlist]
return dict(zip(wordlist, wordfreq))
then try to :
wordListToFreqDict(originallist)
来源:https://stackoverflow.com/questions/20510768/count-frequency-of-words-in-a-list-and-sort-by-frequency