I'm trying to count all letters in a txt file then display in descending order

长情又很酷 2021-01-18 08:20

As the title says:

So far this is where I'm at. My code does work; however, I am having trouble displaying the information in order. Currently it just displays the inf

4 Answers
  • 2021-01-18 08:26

    Displaying in descending order needs to happen outside your search loop; otherwise the letters will be displayed as they are encountered.

    Sorting in descending order is quite easy using the built-in sorted (you'll need to set the reverse argument to True!)

    However, Python is batteries-included and there is already a Counter in collections. So it could be as simple as:

    from collections import Counter
    from operator import itemgetter
    
    def frequencies(filename):
        # Sets are especially optimized for fast lookups so this will be
        # a perfect fit for the invalid characters.
        invalid = set("‘'`,.?!:;-_\n—' '")
    
        # Using open in a with block makes sure the file is closed afterwards.
        with open(filename, 'r') as infile:  
            # The "char for char ...." is a conditional generator expression
            # that feeds all characters to the counter that are not invalid.
            counter = Counter(char for char in infile.read().lower() if char not in invalid)
    
        # If you want to display the values:
        for char, charcount in sorted(counter.items(), key=itemgetter(1), reverse=True):
            print(char, charcount)
    

    The Counter also has a most_common method. Called without an argument, it returns all characters and counts in descending order of count, so it could replace the sorted call above. If you only want to know the x most common counts, pass x as the argument.
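
For reference, here is a minimal sketch of how most_common behaves with and without an argument. The sample string is made up for the example and stands in for the filtered file contents.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents.
counter = Counter("banana")

# Without an argument, most_common() returns every (char, count) pair,
# ordered from the most common to the least common.
all_counts = counter.most_common()

# With an argument, it returns only that many of the most common pairs.
top_two = counter.most_common(2)

print(all_counts)
print(top_two)
```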

  • 2021-01-18 08:26

    You can sort your dictionary at the time you print, with the built-in sorted function:

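    lettercount = {}
    invalid = "‘'`,.?!:;-_\n—' '"
    # A with block closes the file once it has been read.
    with open('text.file') as infile:
        for c in infile.read().lower():
            if c not in invalid:
                lettercount[c] = lettercount.setdefault(c, 0) + 1
    # Sort the letters by their count, highest first.
    for letter in sorted(lettercount, key=lettercount.get, reverse=True):
        print("{} appears {} times".format(letter, lettercount[letter]))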
    

    Note: I used the setdefault method to set the default value to 0 when we meet a letter for the first time.
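
As a small sketch of what setdefault does in that counting line: it returns the stored value when the key already exists, and otherwise stores the given default and returns it. The letter 'a' is just an example key.

```python
lettercount = {}

# First time we meet 'a': setdefault stores 0 under 'a' and returns it.
first = lettercount.setdefault('a', 0)
lettercount['a'] = first + 1

# Second time: the key exists, so setdefault returns the stored value 1.
second = lettercount.setdefault('a', 0)
lettercount['a'] = second + 1

print(first, second, lettercount)
```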

  • 2021-01-18 08:38

    You don't need to iterate over 'words' and then over the letters in them. When you iterate over a string (like content), you already get single chars (length 1 strings). Then, you would want to wait until after your counting loop before showing output. After counting, you could manually sort:

    for letter, count in sorted(counter.items(), key=lambda x: x[1], reverse=True):
        print(letter, count)  # do stuff
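
The two points above, iterating directly over the string and sorting only after the counting loop, can be sketched together as follows. The values of content and invalid here are made-up stand-ins, not the ones from the question.

```python
# Hypothetical stand-ins for the file contents and the excluded characters.
content = "Hello, world!"
invalid = ",.!? \n"

counts = {}
# Iterating over a string yields one-character strings directly.
for char in content.lower():
    if char not in invalid:
        counts[char] = counts.get(char, 0) + 1

# Sort only after the counting loop has finished.
ordered = sorted(counts.items(), key=lambda x: x[1], reverse=True)
for letter, count in ordered:
    print(letter, count)
```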
    

    However, it is better to use collections.Counter:

    from collections import Counter
    
    # content and invalid are assumed to be defined as in your code
    content = filter(lambda x: x not in invalid, content)
    c = Counter(content)
    for letter, count in c.most_common():  # descending order of counts
        print('{:8} appears {} times.'.format(letter, count))
    # for letter, count in c.most_common(n):  # limit to n most
    #     print('{:8} appears {} times.'.format(letter, count))
    
  • 2021-01-18 08:41

    Dictionaries do not keep their items sorted (since Python 3.7 they only preserve insertion order). Also, if you want to count items within a set of data, you had better use collections.Counter(), which is more optimized and Pythonic for this purpose.

    Then you can use Counter.most_common(N) to get the N most common items in your Counter object.

    Also, regarding the opening of files, you can simply use the with statement, which closes the file automatically at the end of the block. And it's better not to print the final result inside your function. Instead, you can make your function a generator by yielding the intended lines and then printing them whenever you want.

    from collections import Counter
    
    def frequencies(filename, top_n):
        with open(filename) as infile:
            content = infile.read()
        invalid = "‘'`,.?!:;-_\n—' '"
        counter = Counter(filter(lambda x: x not in invalid, content))
        for letter, count in counter.most_common(top_n):
            yield '{:8} appears {} times.'.format(letter, count)
    

    Then use a for loop in order to iterate over the generator function:

    for line in frequencies(filename, 100):
        print(line)
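
To try the generator end to end, here is a self-contained sketch. The function is repeated with the same logic so the block runs on its own; the temporary sample file, its contents, and the name sample_path are made up for illustration.

```python
import os
import tempfile
from collections import Counter

def frequencies(filename, top_n):
    # Same idea as the function above, repeated so the sketch runs on its own.
    with open(filename) as infile:
        content = infile.read()
    invalid = "‘'`,.?!:;-_\n—' '"
    counter = Counter(char for char in content if char not in invalid)
    for letter, count in counter.most_common(top_n):
        yield '{:8} appears {} times.'.format(letter, count)

# Write a small, made-up sample file to count.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("banana band")
    sample_path = tmp.name

# The caller decides what to do with the lines: collect, print, or log them.
lines = list(frequencies(sample_path, 2))
os.remove(sample_path)

for line in lines:
    print(line)
```

Because the function yields instead of printing, the same lines could just as well be written to a log or joined into one string.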
    