Question
I have a CSV file with a single column, such as the following. The symbols and numbers are there only to show that the file does not contain just text. I have two objectives:
- count the number of occurrences of a word;
- determine how many rows a word appears in.
Stuff
I like apples. Sally likes apples.
Jim has 4 berries. !@#
John has 2 apples.
Ideally, the code should return something like: {apples: 3} {# of rows: 2}
I've written some code to try to count occurrences, but it isn't working properly (presumably because of the punctuation). I also do not know how to determine the number of rows a word appears in; this could be as simple as counting the unique occurrences in each row, but I'm unsure how to proceed. Here is the code I have so far, written in Python 3.6.1:
import csv
my_reader = csv.reader(open('file.csv', encoding = 'utf-8'))
ctr = 0
for record in my_reader:
    if record[0] == 'apples':
        ctr += 1
print(ctr)
The code merely returns 0 as the answer. Help?
Answer 1:
You are testing whether the whole row equals 'apples' (record[0] == 'apples'); what you need is to test whether the row contains it ('apples' in record[0]). To count the occurrences you can use str.count(), for example:
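import csv

ctr = 0   # total occurrences of 'apples'
rows = 0  # number of rows containing 'apples'
with open('file.csv', encoding='utf-8') as infile:
    for record in csv.reader(infile):
        # record is a list of columns; blank lines yield [] and are skipped
        if record and 'apples' in record[0]:
            rows += 1
            ctr += record[0].count('apples')
print('apples: {}, rows: {}'.format(ctr, rows))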
This way you check whether the row contains apples; if it does, you increment rows by one and increment ctr by the number of times apples occurs in that row.
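One caveat: str.count() matches substrings, so 'apples' is also counted inside a longer word such as 'pineapples'. The following is a minimal sketch of whole-word counting with a regular expression, assuming the rows have already been read into a list of strings. The function name count_word and the in-memory sample rows are illustrative and not part of the original answer.

```python
import re

def count_word(rows, word):
    """Return (total occurrences, rows containing word), whole words only."""
    # \b marks a word boundary: 'apples.' matches, 'pineapples' does not.
    pattern = re.compile(r"\b{}\b".format(re.escape(word)))
    total = 0
    rows_with_word = 0
    for text in rows:
        hits = len(pattern.findall(text))
        if hits:
            rows_with_word += 1
            total += hits
    return total, rows_with_word

# Sample rows copied from the question.
sample_rows = [
    "I like apples. Sally likes apples.",
    "Jim has 4 berries. !@#",
    "John has 2 apples.",
]
total, rows_with_word = count_word(sample_rows, "apples")
print("apples: {}, rows: {}".format(total, rows_with_word))  # apples: 3, rows: 2
```

The match is case-sensitive; passing re.IGNORECASE to re.compile would also count 'Apples'.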
Answer 2:
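import collections
import csv

# word -> Counter mapping each row index to the word's count on that row
occurrences = collections.defaultdict(collections.Counter)
with open('path/to/file') as infile:
    for r, row in enumerate(csv.reader(infile)):
        r = (r,)  # one-element tuple, so Counter.update adds 1 for this row
        for word in (w for col in row for w in col.split()):
            occurrences[word].update(r)

for word, occs in occurrences.items():
    # sum of the counts = total occurrences; number of keys = rows it appears on
    print("{} appears {} times on {} rows".format(word, sum(occs.values()), len(occs)))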
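Note that str.split() only splits on whitespace, so with the sample data the word is stored with its punctuation attached, as 'apples.'. A possible variant, sketched here under the assumption that words should be lowercased and stripped of punctuation, extracts runs of word characters with re.findall and keeps two counters. The function name word_stats and the in-memory io.StringIO sample are illustrative, not part of the original answer.

```python
import collections
import csv
import io
import re

def word_stats(lines):
    """Return (total, rows) Counters for every word in the CSV text lines."""
    total = collections.Counter()  # word -> occurrences across all rows
    rows = collections.Counter()   # word -> number of rows containing it
    for record in csv.reader(lines):
        # Join the columns, lowercase, and keep only runs of word characters.
        words = re.findall(r"\w+", " ".join(record).lower())
        total.update(words)
        rows.update(set(words))  # the set counts each word once per row
    return total, rows

# Sample data from the question, held in memory instead of a file.
sample = io.StringIO(
    "Stuff\n"
    "I like apples. Sally likes apples.\n"
    "Jim has 4 berries. !@#\n"
    "John has 2 apples.\n"
)
total, rows = word_stats(sample)
print("apples appears {} times on {} rows".format(total["apples"], rows["apples"]))
# apples appears 3 times on 2 rows
```

To read a real file, an open file object can be passed in place of the StringIO, for example word_stats(open('file.csv', encoding='utf-8')).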
Answer 3:
I don't know why you are using the csv reader, since your file is not really a CSV file.
Here is code that does what you need with fewer lines.
my_reader = open('file.csv', encoding = 'utf-8')
rows = 0
apples = 0
for record in my_reader:
    if record.count('apple') > 0:
        rows += 1
        apples += record.count('apple')
print('{apples: %d } {# of rows: %d }' % (apples, rows))
Here is the code running: https://repl.it/JkVn/1
Source: https://stackoverflow.com/questions/45339662/counting-word-occurrences-in-csv-and-determine-row-appearances