As the title says:
So far this is where I'm at: my code does work, however I am having trouble displaying the information in order. Currently it just displays the inf
Displaying in descending order needs to happen outside your search loop; otherwise the counts will be displayed as they are encountered.
Sorting in descending order is quite easy using the built-in sorted (you'll need to set the reverse argument!).
However, Python is batteries included and there is already a Counter. So it could be as simple as:
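As a minimal sketch of that idea, assuming the counts have already been collected in a plain dictionary (the name `counts` and its contents are made up for illustration):

```python
# Hypothetical counts, as they might look after the counting loop has finished.
counts = {"a": 3, "b": 7, "c": 1}

# Sort the (character, count) pairs by count, largest first.
ordered = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

for char, count in ordered:
    print(char, count)
```

The sorting and printing happen after the counting is complete, which is what keeps the output in order.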
from collections import Counter
from operator import itemgetter

def frequencies(filename):
    # Sets are especially optimized for fast lookups so this will be
    # a perfect fit for the invalid characters.
    invalid = set("‘'`,.?!:;-_\n—' '")
    # Using open in a with block makes sure the file is closed afterwards.
    with open(filename, 'r') as infile:
        # The "char for char ...." is a conditional generator expression
        # that feeds all characters to the counter that are not invalid.
        counter = Counter(char for char in infile.read().lower() if char not in invalid)
    # If you want to display the values:
    for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
        print(char, charcount)
The Counter already has a most_common method. Called without an argument it returns every character and count in descending order of count, so it could replace the sorted call here. If you only want to know the x most common counts, pass x as the argument.
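A short sketch of how most_common behaves with and without an argument, using a made-up sample string rather than the file from the question:

```python
from collections import Counter

counter = Counter("banana")

# Without an argument, every element is returned, highest count first.
everything = counter.most_common()
print(everything)

# With an argument, only that many of the most frequent elements are returned.
top_two = counter.most_common(2)
print(top_two)
```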
You can sort your dictionary at the time you print, with the built-in sorted function:
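lettercount = {}
invalid = "‘'`,.?!:;-_\n—' '"
# The with block closes the file once reading is done.
with open('text.file') as infile:
    for c in infile.read().lower():
        if c not in invalid:
            lettercount[c] = lettercount.setdefault(c, 0) + 1
# sorted(lettercount) orders by letter; use
# sorted(lettercount, key=lettercount.get, reverse=True) to order by count.
for letter in sorted(lettercount):
    print("{} appears {} times".format(letter, lettercount[letter]))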
Note: I used the setdefault method to set the default value to 0 when we meet a letter for the first time.
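To make the setdefault step concrete, here is a small sketch on a made-up sample string instead of the file contents:

```python
lettercount = {}

for c in "hello":
    # setdefault returns the stored count, or stores and returns 0
    # the first time the letter is seen.
    lettercount[c] = lettercount.setdefault(c, 0) + 1

print(lettercount)
```

Since the value is assigned right away, `lettercount.get(c, 0) + 1` would give the same result here.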
You don't need to iterate over 'words', and then over the letters in them. When you iterate over a string (like content), you will already get single chars (length 1 strings). Then, you would want to wait until after your counting loop before showing output. After counting, you could manually sort:
for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
    pass  # do stuff
However, it is better to use collections.Counter:
from collections import Counter

content = filter(lambda x: x not in invalid, content)
c = Counter(content)
for letter, count in c.most_common():  # descending order of counts
    print('{:8} appears {} times.'.format(letter, count))
# for letter, count in c.most_common(n):  # limit to n most
#     print('{:8} appears {} times.'.format(letter, count))
Dictionaries are not sorted by value (and before Python 3.7 they did not guarantee any order at all). Also, if you want to count items within a set of data, it is better to use collections.Counter(), which is built for this purpose and more Pythonic.
Then you can just use Counter.most_common(N) in order to print the N most common items within your Counter object.
Also, regarding the opening of files, you can simply use the with statement, which closes the file at the end of the block automatically. And it's better not to print the final result inside your function; instead, you can make your function a generator by yielding the intended lines and then printing them whenever you want.
from collections import Counter

def frequencies(filename, top_n):
    with open(filename) as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    counter = Counter(filter(lambda x: x not in invalid, content))
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)
Then use a for loop in order to iterate over the generator function:
for line in frequencies(filename, 100):
    print(line)
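Because the function yields lines instead of printing them, the caller decides what to do with the output. The following is only a sketch: it rebuilds a simplified version of the generator (with a shortened, assumed set of ignored characters) and writes a throwaway temporary file so that it can run on its own.

```python
import os
import tempfile
from collections import Counter


def frequencies(filename, top_n):
    # Simplified stand-in for the generator above; the set of ignored
    # characters is shortened here purely for illustration.
    invalid = set(" \n.,")
    with open(filename) as infile:
        counter = Counter(c for c in infile.read() if c not in invalid)
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)


# Write a small throwaway file so the sketch does not depend on real data.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("banana")
    path = tmp.name

lines = list(frequencies(path, 2))  # collect the lines instead of printing
report = "\n".join(lines)           # or join them into a single string
os.remove(path)

print(report)
```

The same generator can feed a loop, a list, or a file write without any change to the function itself.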