Converting word frequency to a graphical histogram in Python

Submitted by 只愿长相守 on 2019-12-22 00:40:31

Question


This is what I have right now, thanks to Pavel Anossov. I am trying to convert the word frequencies in the output into asterisks.

from collections import Counter

def candidateWord():
    with open("sample.txt", 'r') as f:
        text = f.read()
    # Lowercase, split on whitespace, strip punctuation and digits from each token
    words = [w.strip('!,.?1234567890-=@#$%^&*()_+') for w in text.lower().split()]
    counter = Counter(words)

    # One "word count" line per word, most frequent first
    print("\n".join("{} {}".format(*p) for p in counter.most_common()))

candidateWord()

This is my current output:

how 3
i 2
am 2
are 2
you 2
good 1
hbjkdfd 1

The formula I want to use is this: if the most frequent word occurs M times and the current word occurs N times, the number of asterisks printed is:

(50 * N) / M
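
As a quick check of the formula, here is a minimal sketch applied to the counts shown above. The `bar_length` helper and the `counts` dictionary are hypothetical names used only for illustration, and integer division is assumed so that the result can serve as a repeat count.

```python
# Hypothetical helper: number of asterisks for a word that occurs n times
# when the most frequent word occurs m times.
def bar_length(n, m, width=50):
    # Integer division, so the result can be used as '*' * length.
    return (width * n) // m

# Counts copied from the sample output above.
counts = {"how": 3, "i": 2, "am": 2, "are": 2, "you": 2, "good": 1, "hbjkdfd": 1}
m = max(counts.values())  # count of the most frequent word

for word, n in counts.items():
    print(word, bar_length(n, m))
```

With these counts the most frequent word gets 50 asterisks, words that occur twice get 33, and words that occur once get 16.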

Answer 1:


I'll put the asterisks on the left to avoid aligning words:

...
counter = Counter(words)
max_freq = counter.most_common()[0][1]
for word, freq in sorted(counter.most_common(), key=lambda p: (-p[1], p[0])):
    number_of_asterisks = (50 * freq ) // max_freq     # (50 * N) / M
    asterisks = '*' * number_of_asterisks        # the (50*N)/M asterisks
    print('{:>50} {}'.format(asterisks, word))

The :>50 format string means "left-pad with spaces to 50 characters".
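
To see the alignment on its own, here is a small sketch that uses a width of 10 instead of 50 so the padding is easy to count. The variable names are only illustrative.

```python
# '>' right-aligns the value; 10 is the minimum field width (50 in the answer).
padded = '{:>10}'.format('***')
print(repr(padded))   # 7 spaces followed by ***

# '<' is the left-aligned counterpart: the padding goes on the right instead.
left = '{:<10}'.format('***')
print(repr(left))     # *** followed by 7 spaces

# The width is a minimum, so a longer value is never truncated.
print(repr('{:>2}'.format('***')))
```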

  • counter.most_common() returns a list of (word, frequency) pairs, sorted by descending frequency
  • counter.most_common()[0][1] is the second element of the first pair, i.e. the maximum frequency
  • We are looping over counter.most_common() sorted by descending frequency first, then word
  • number_of_asterisks is calculated by your formula. We use integer division // to get an integer result.
  • We repeat an asterisk number_of_asterisks times and store the result in asterisks
  • We print asterisks and word. Asterisks are right-aligned in a 50-character-wide column.
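
The steps above can be put together into a self-contained sketch that works on a string rather than on sample.txt. The function name `histogram_lines`, the `width` parameter, and the sample sentence are assumptions made for illustration. Skipping empty tokens is also an addition that is not in the original code.

```python
from collections import Counter

def histogram_lines(text, width=50):
    """Return one right-aligned asterisk bar per word, most frequent first."""
    # Same tokenisation as the question: lowercase, split on whitespace,
    # strip punctuation and digits from both ends of each token.
    words = [w.strip('!,.?1234567890-=@#$%^&*()_+') for w in text.lower().split()]
    counter = Counter(w for w in words if w)  # assumption: skip empty tokens
    if not counter:
        return []
    max_freq = counter.most_common(1)[0][1]
    lines = []
    # Descending frequency first, then alphabetical order for ties.
    for word, freq in sorted(counter.items(), key=lambda p: (-p[1], p[0])):
        asterisks = '*' * ((width * freq) // max_freq)
        lines.append('{:>{w}} {}'.format(asterisks, word, w=width))
    return lines

# Made-up sentence that reproduces the counts from the question.
sample = "How are you? How am I? How good. I am, are you hbjkdfd"
print("\n".join(histogram_lines(sample)))
```

Returning the lines instead of printing them directly keeps the formatting logic easy to test. Ties are broken alphabetically here, so the order of the words with equal counts can differ from the plain most_common() order.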



Answer 2:


The code:

from collections import Counter

def candidateWord():
    with open("sample.txt", 'r') as f:
        text = f.read()
    words = [w.strip('!,.?1234567890-=@#$%^&*()_+') for w in text.lower().split()]
    counter = Counter(words)

    # I added the code below...
    columns = 80          # total width of the plot in characters
    n_occurrences = 10    # plot only the 10 most common words
    to_plot = counter.most_common(n_occurrences)
    labels, values = zip(*to_plot)
    label_width = max(map(len, labels))
    data_width = columns - label_width - 1   # one column is taken by '|'
    plot_format = '{:%d}|{:%d}' % (label_width, data_width)
    max_value = float(max(values))
    for label, value in zip(labels, values):
        v = int(value / max_value * data_width)   # the longest bar fills data_width
        print(plot_format.format(label, '*' * v))

candidateWord()

outputs:

the |***************************************************************************
and |**********************************************                             
of  |******************************************                                 
to  |***************************                                                
a   |************************                                                   
in  |********************                                                       
that|******************                                                         
i   |****************                                                           
was |*************                                                              
it  |**********                                                                 
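
The same scaling can be sketched as a function that takes an existing Counter, which makes it easy to try without a file. The name `plot_counts`, its parameters, and the demo counts are all made up for illustration. Unlike the code above, this variant uses integer arithmetic for the bar length and does not pad the bars with trailing spaces.

```python
from collections import Counter

def plot_counts(counter, columns=80, top=10):
    """Return left-aligned 'label|bar' rows that fit within `columns` characters."""
    to_plot = counter.most_common(top)
    if not to_plot:
        return []
    label_width = max(len(label) for label, _ in to_plot)
    data_width = columns - label_width - 1   # one column is used by '|'
    max_value = to_plot[0][1]                # most_common() puts the largest first
    rows = []
    for label, value in to_plot:
        # Scale so that the most frequent word fills the whole data column.
        bar = '*' * (value * data_width // max_value)
        rows.append('{:{lw}}|{}'.format(label, bar, lw=label_width))
    return rows

# Made-up counts, purely for illustration.
demo = Counter({"the": 30, "and": 15, "of": 10})
print("\n".join(plot_counts(demo, columns=24)))
```

With `columns=24` the labels take 3 characters and the separator takes 1, which leaves 20 characters for the bars, so the three rows get 20, 10 and 6 asterisks.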


Source: https://stackoverflow.com/questions/15735406/converting-word-frequency-to-a-graphical-histogram-in-python
