Python algorithm for counting occurrences of a specific word in a CSV file

Submitted by 给你一囗甜甜゛ on 2021-02-04 15:34:08

Question


I've just started to learn Python. I'm curious what efficient ways there are to count the occurrences of a specific word in a CSV file, other than simply using a for loop to go through the file line by line.

To be more specific, let's say I have a CSV file containing two columns, "Name" and "Grade", with millions of records.

How would one count the occurrences of "A" in the "Grade" column?

Python code samples would be greatly appreciated!


Answer 1:


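A basic example using csv and collections.Counter (available since Python 2.7) from the standard Python library. The code below uses Python 3 syntax and, like the original, assumes a semicolon-delimited file whose second column holds the grade:

import csv
import collections

grades = collections.Counter()
# newline='' is how the csv docs recommend opening CSV files
with open('file.csv', newline='') as input_file:
    # delimiter=';' matches a semicolon-separated file; omit it for commas
    for row in csv.reader(input_file, delimiter=';'):
        grades[row[1]] += 1  # row[1] is the Grade column

print('Number of A grades: %s' % grades['A'])
print(grades.most_common())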

Output (for a small dataset):

Number of A grades: 2055
[('A', 2055), ('B', 2034), ('D', 1995), ('E', 1977), ('C', 1939)]
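Since the question asks about the "Grade" column specifically, one variation worth considering is csv.DictReader, which looks fields up by header name instead of by position. This is a minimal sketch, assuming the file has a header row containing a column literally named Grade; count_value and tally_column are hypothetical helper names, not library functions:

```python
import csv
from collections import Counter


def count_value(csv_file, column, value, delimiter=","):
    """Count rows whose `column` field equals `value`.

    `csv_file` is anything csv.DictReader accepts: an open file object
    or any iterable of CSV lines. Hypothetical helper for illustration.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    # DictReader consumes the header row itself, so "Grade" is never counted
    return sum(1 for row in reader if row[column] == value)


def tally_column(csv_file, column, delimiter=","):
    """Return a Counter of every distinct value found in `column`."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return Counter(row[column] for row in reader)
```

For example: with open('file.csv', newline='') as f: print(count_value(f, 'Grade', 'A')). DictReader builds a dict for every row, so it does a bit more work per row than indexing the lists returned by csv.reader; in exchange, the code keeps working if the column order changes.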



Answer 2:


Whatever method you use, you have to read all the grades, which in this case means reading the entire file. The csv module makes reading comma-separated value files easy:

import csv

ctr = 0
with open('my_file.csv', newline='') as f:
    my_reader = csv.reader(f)
    for record in my_reader:
        if record[1] == 'A':  # second column holds the grade
            ctr += 1
print(ctr)
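The loop above can also be written as a single sum() over a generator, which still streams the file one row at a time instead of building a list. A sketch assuming, as in the loop, that the grade is the second column; it additionally assumes the first row is a header. count_grade is a hypothetical name:

```python
import csv


def count_grade(lines, grade="A", column=1, has_header=True):
    """Count rows whose field at index `column` equals `grade`.

    `lines` may be an open file object or any iterable of CSV lines.
    count_grade is a hypothetical helper, not a library function.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip a header row such as "Name,Grade"
    # Each comparison is True (1) or False (0), so sum() counts the matches;
    # the len() check skips blank or short rows instead of raising IndexError
    return sum(len(row) > column and row[column] == grade for row in reader)
```

Usage: with open('my_file.csv', newline='') as f: print(count_grade(f)).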

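This is pretty fast; I couldn't do better with the Counter method:

import csv
from collections import Counter

# Re-open the file: the reader above is exhausted after one pass
with open('my_file.csv', newline='') as f:
    grades = [rec[1] for rec in csv.reader(f)]  # a generator expression was actually slower
result = Counter(grades)
print(result)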

Last but not least, lists have a count method:

import csv

with open('my_file.csv', newline='') as f:
    grades = [rec[1] for rec in csv.reader(f)]
result = grades.count('A')  # no Counter import needed here
print(result)
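The speed remarks in this answer are easy to re-check on your own machine. Below is a rough benchmark sketch. The synthetic data (make_csv, 100,000 rows, grades drawn uniformly from A to E) is an arbitrary assumption, and count_loop, count_counter and count_list are hypothetical wrappers around the three approaches above. No timings are shown because they depend on hardware, Python version and data:

```python
import csv
import io
import random
import timeit
from collections import Counter


def make_csv(n_rows, seed=0):
    """Build a synthetic "Name,Grade" CSV in memory (arbitrary test data)."""
    rng = random.Random(seed)
    lines = ["Name,Grade"]
    lines += [f"student{i},{rng.choice('ABCDE')}" for i in range(n_rows)]
    return "\n".join(lines) + "\n"


def count_loop(text):
    # Explicit loop with a counter variable
    ctr = 0
    for record in csv.reader(io.StringIO(text)):
        if record[1] == "A":
            ctr += 1
    return ctr


def count_counter(text):
    # List comprehension fed to Counter, then look up "A"
    return Counter([rec[1] for rec in csv.reader(io.StringIO(text))])["A"]


def count_list(text):
    # List comprehension, then list.count
    return [rec[1] for rec in csv.reader(io.StringIO(text))].count("A")


if __name__ == "__main__":
    text = make_csv(100_000)
    for fn in (count_loop, count_counter, count_list):
        seconds = timeit.timeit(lambda: fn(text), number=3)
        print(f"{fn.__name__}: {fn(text)} A grades, {seconds:.2f}s for 3 runs")
```

Reading from io.StringIO keeps disk I/O out of the measurement; with a real file, reading from disk adds a cost shared by all three approaches.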


Source: https://stackoverflow.com/questions/9247241/python-algorithm-of-counting-occurrence-of-specific-word-in-csv
