Counting word occurrences in csv and determine row appearances

Submitted by 南笙酒味 on 2019-12-14 04:11:24

Question


I have a CSV file with a single column, such as the one below. The symbols and numbers are only there to show that the file does not contain just text. I have two objectives:

  1. count the number of occurrences of a word;
  2. determine how many rows a word appears in.

Stuff
I like apples. Sally likes apples.
Jim has 4 berries.  !@#
John has 2 apples.

Ideally, the code should return something like: {apples: 3} {# of rows: 2}

I've written some code to try to count occurrences, but it isn't working properly (presumably because of the punctuation). I also don't know how to determine the number of rows a word appears in; this could be as simple as counting the unique occurrences in each row, but I'm unsure how to proceed. Here is the code I have so far, written in Python 3.6.1:

import csv
my_reader = csv.reader(open('file.csv', encoding = 'utf-8'))
ctr = 0
for record in my_reader:
    if record[0] == 'apples':
        ctr += 1
print(ctr)

The code merely returns 0 as the answer. Help?


Answer 1:


You are testing whether the whole row equals 'apples' (record[0] == 'apples'); what you need is to test whether 'apples' occurs in the row ('apples' in record[0]). To count the occurrences you can use str.count(), for example:

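import csv

ctr = 0   # total occurrences of 'apples'
rows = 0  # number of rows containing 'apples'
with open('file.csv', encoding='utf-8', newline='') as infile:
    for record in csv.reader(infile):
        # skip blank lines, which csv.reader yields as empty lists
        if record and 'apples' in record[0]:
            rows += 1
            ctr += record[0].count('apples')

print('apples: {}, rows: {}'.format(ctr, rows))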

This checks whether each row contains 'apples'; if it does, rows is incremented by one and ctr by the number of times 'apples' occurs in that row.




Answer 2:


import collections
import csv

# word -> Counter mapping row index to the number of occurrences in that row
occurrences = collections.defaultdict(collections.Counter)
with open('path/to/file', newline='') as infile:
    for r, row in enumerate(csv.reader(infile)):
        # split every column on whitespace; punctuation stays attached to words
        for word in (w for col in row for w in col.split()):
            occurrences[word][r] += 1

for word, occs in occurrences.items():
    print("{} appears {} times on {} rows".format(word, sum(occs.values()), len(occs)))
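
Because this answer splits on whitespace, 'apples.' (with the trailing period) is tallied as its own word, and the sample data never yields a bare 'apples' key. A possible variant, sketched here under the assumption that words are runs of letters, digits, or underscores and that case should be ignored, extracts words with a regex instead. The name tally_words is hypothetical.

```python
import collections
import csv
import io
import re


def tally_words(lines):
    """Map each lowercased word to a Counter of {row index: occurrences in that row}."""
    occurrences = collections.defaultdict(collections.Counter)
    for r, row in enumerate(csv.reader(lines)):
        for col in row:
            # \w+ keeps runs of letters, digits and underscores,
            # so surrounding punctuation is dropped.
            for word in re.findall(r'\w+', col.lower()):
                occurrences[word][r] += 1
    return occurrences


# Sample data from the question, held in memory.
sample = io.StringIO(
    "Stuff\n"
    "I like apples. Sally likes apples.\n"
    "Jim has 4 berries.  !@#\n"
    "John has 2 apples.\n"
)
apples = tally_words(sample)['apples']
print("apples appears {} times on {} rows".format(sum(apples.values()), len(apples)))
# apples appears 3 times on 2 rows
```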



Answer 3:


I don't know why you are using the csv reader, since your file isn't really CSV data; it is plain text with one sentence per line.

Here is code that does what you need with fewer lines.

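rows = 0
apples = 0

with open('file.csv', encoding='utf-8') as my_reader:
    # iterate over the file line by line; each line is treated as one row
    for record in my_reader:
        # 'apple' is matched as a substring, so it also matches 'apples'
        hits = record.count('apple')
        if hits > 0:
            rows += 1
            apples += hits

print('{apples: %d } {# of rows: %d }' % (apples, rows))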

Here is the code running: https://repl.it/JkVn/1
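
Reading line by line gives the same result as csv.reader for the sample in the question, but the two can differ when a quoted field spans several physical lines. The sketch below uses hypothetical data (not from the question) to show the difference in row counts.

```python
import csv
import io

# Hypothetical data: the second record is one quoted field spanning two lines.
text = (
    'Stuff\n'
    '"I like apples.\nSally likes apples."\n'
    'John has 2 apples.\n'
)

# Plain iteration sees physical lines, so the quoted field counts twice.
line_hits = sum(1 for line in io.StringIO(text) if 'apples' in line)

# csv.reader sees logical records and joins the quoted multi-line field.
row_hits = sum(
    1 for rec in csv.reader(io.StringIO(text)) if rec and 'apples' in rec[0]
)

print(line_hits, row_hits)  # 3 2
```

If the file is known to hold exactly one record per line with no quoting, the simpler line-based loop is sufficient.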



Source: https://stackoverflow.com/questions/45339662/counting-word-occurrences-in-csv-and-determine-row-appearances
