Question
This is what I have right now, thanks to Pavel Anossov. I am trying to turn the word frequencies it prints into rows of asterisks.
from collections import Counter

def candidateWord():
    with open("sample.txt", 'r') as f:
        text = f.read()
    # Lower-case, split on whitespace, strip punctuation/digits from both ends of each token
    words = [w.strip('!,.?1234567890-=@#$%^&*()_+') for w in text.lower().split()]
    counter = Counter(words)
    # Print "word count" pairs, most frequent first
    print("\n".join("{} {}".format(*p) for p in counter.most_common()))

candidateWord()
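To experiment without a sample.txt file, the counting step can be sketched as a function that takes the text directly. The name count_words and the sample string are hypothetical. Unlike the code above, this sketch drops tokens that become empty after stripping: a bare "123" or "--" strips down to "" and would otherwise be counted as a word.

```python
from collections import Counter

# Characters stripped from both ends of each token, as in the question's code
STRIP_CHARS = '!,.?1234567890-=@#$%^&*()_+'

def count_words(text):
    """Hypothetical helper: count lower-cased words in an in-memory string."""
    words = (w.strip(STRIP_CHARS) for w in text.lower().split())
    # Skip tokens made up entirely of stripped characters (they become "")
    return Counter(w for w in words if w)

# Made-up sample text
sample = "How are you? How am I? I am good, how are you... 123 hbjkdfd!"
for word, count in count_words(sample).most_common():
    print(word, count)
```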
This is the output I get right now:
how 3
i 2
am 2
are 2
you 2
good 1
hbjkdfd 1
The formula I want to use: if the most frequent word occurs M times and the current word occurs N times, the number of asterisks printed is:
(50 * N) / M
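As a worked example of the formula with the counts above: M = 3 (for "how"), so a word seen N = 2 times gets (50 * 2) / 3 ≈ 33.3 asterisks, which has to become a whole number. The sketch below (asterisks_for is a hypothetical name) truncates with integer division; round() would be an alternative choice.

```python
def asterisks_for(n, m, width=50):
    """Hypothetical helper: asterisks for a word seen n times when the
    most frequent word is seen m times, i.e. (width * N) / M."""
    # Integer division truncates; the most frequent word gets exactly `width`
    return (width * n) // m

# Counts from the question's output; "how" is the most frequent word (M = 3)
counts = [("how", 3), ("i", 2), ("am", 2), ("are", 2), ("you", 2), ("good", 1), ("hbjkdfd", 1)]
m = max(c for _, c in counts)
for word, n in counts:
    print(word, asterisks_for(n, m), '*' * asterisks_for(n, m))
```

With these counts, "how" gets 50 asterisks, each word seen twice gets 33, and each word seen once gets 16.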
Answer 1:
I'll put the asterisks on the left, so the variable-length words don't have to be padded to line up:
...
counter = Counter(words)
max_freq = counter.most_common()[0][1]
for word, freq in sorted(counter.most_common(), key=lambda p: (-p[1], p[0])):
    number_of_asterisks = (50 * freq) // max_freq  # (50 * N) / M
    asterisks = '*' * number_of_asterisks  # the (50*N)/M asterisks
    print('{:>50} {}'.format(asterisks, word))
The :>50 format spec means "right-align in a 50-character-wide field", i.e. left-pad with spaces to 50 characters.

- counter.most_common() returns a list of (word, frequency) pairs, sorted by frequency, most common first.
- counter.most_common()[0][1] is the second element of the first pair, i.e. the maximum frequency (M in your formula).
- We loop over counter.most_common() sorted by descending frequency first, then alphabetically by word.
- number_of_asterisks is calculated by your formula. We use integer division (//) to get an integer result.
- We repeat an asterisk number_of_asterisks times and store the result in asterisks.
- We print asterisks and word. The asterisks are right-aligned in a 50-character-wide column, so every word starts in the same column.
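Putting Answer 1 together, here is a self-contained sketch that takes the text as a string instead of reading sample.txt and returns the lines instead of printing them. The name histogram_lines and the sample string are hypothetical. It sorts counter.items() (the same pairs most_common() returns), drops tokens that strip down to "", and guards against an empty counter, where most_common()[0] would raise IndexError.

```python
from collections import Counter

STRIP_CHARS = '!,.?1234567890-=@#$%^&*()_+'

def histogram_lines(text, width=50):
    """Hypothetical helper: Answer 1's histogram as a list of strings."""
    words = [w.strip(STRIP_CHARS) for w in text.lower().split()]
    counter = Counter(w for w in words if w)
    if not counter:
        return []  # nothing to plot; most_common()[0] would raise IndexError
    max_freq = counter.most_common(1)[0][1]  # M: count of the most frequent word
    lines = []
    # Sort by descending frequency, then alphabetically to break ties
    for word, freq in sorted(counter.items(), key=lambda p: (-p[1], p[0])):
        asterisks = '*' * ((width * freq) // max_freq)  # (50 * N) / M
        # Right-align the bar in a `width`-wide column so the words line up
        lines.append('{:>{w}} {}'.format(asterisks, word, w=width))
    return lines

for line in histogram_lines("How are you? How am I? I am good, how are you... hbjkdfd!"):
    print(line)
```

For this sample the first line is 50 asterisks followed by "how", and the words seen twice (am, are, i, you) follow in alphabetical order, each with 33 asterisks padded to the same column.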
Answer 2:
The code:
from collections import Counter

def candidateWord():
    with open("sample.txt", 'r') as f:
        text = f.read()
    words = [w.strip('!,.?1234567890-=@#$%^&*()_+') for w in text.lower().split()]
    counter = Counter(words)
    # I added the code below...
    columns = 80          # total width of each output line
    n_occurrences = 10    # how many of the most common words to plot
    to_plot = counter.most_common(n_occurrences)
    labels, values = zip(*to_plot)
    label_width = max(map(len, labels))
    data_width = columns - label_width - 1  # leave one column for the '|' separator
    plot_format = '{:%d}|{:%d}' % (label_width, data_width)
    max_value = float(max(values))
    for label, value in zip(labels, values):
        # Scale each bar so the most frequent word fills the whole data column
        v = int(value / max_value * data_width)
        print(plot_format.format(label, '*' * v))

candidateWord()
outputs:
the |***************************************************************************
and |**********************************************
of |******************************************
to |***************************
a |************************
in |********************
that|******************
i |****************
was |*************
it |**********
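Answer 2 differs from Answer 1 in scaling the bars to whatever width is left after the longest label, rather than to a fixed 50 characters. Below is a sketch of just that part as a function over any Counter; plot_lines is a hypothetical name, and columns=80 and n=10 mirror the answer's values. It swaps in label.ljust() for the answer's generated format string and integer arithmetic for float(max(values)), which sidesteps float rounding; unlike the answer, it does not pad the bars with trailing spaces.

```python
from collections import Counter

def plot_lines(counter, columns=80, n=10):
    """Hypothetical helper: Answer 2's bar chart as a list of strings."""
    to_plot = counter.most_common(n)
    if not to_plot:
        return []  # zip(*[]) would leave nothing to unpack into labels, values
    labels, values = zip(*to_plot)
    label_width = max(map(len, labels))
    data_width = columns - label_width - 1  # one column for the '|' separator
    max_value = max(values)
    lines = []
    for label, value in zip(labels, values):
        # Longest bar fills data_width; the others are scaled proportionally
        bar = '*' * (value * data_width // max_value)
        # ljust pads the label so every '|' lands in the same column
        lines.append(label.ljust(label_width) + '|' + bar)
    return lines

counts = Counter({"the": 10, "and": 6, "that": 3})
print("\n".join(plot_lines(counts)))
```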
Source: https://stackoverflow.com/questions/15735406/converting-word-frequency-to-a-graphical-histogram-in-python