Python CSV - Need to Group and Calculate values based on one key

前端未结

I have a simple three-column CSV file. Using Python, I need to group the rows by one key, then average the values of another key for each group and return the results. The file is a standard CSV.

2 Answers

南方客

2021-01-06 11:16

I've documented some steps to help clarify things:

import csv
from collections import defaultdict

# a dictionary whose values default to an empty list
data = defaultdict(list)
# open the csv file and iterate over its rows; newline='' is what the
# csv module expects for files in Python 3. the enumerate() function
# gives us an incrementing row number
with open('data.csv', newline='') as f:
    for i, row in enumerate(csv.reader(f)):
        # skip the header line and any empty rows
        # the first row is indexed at 0, which evaluates as false,
        # as does an empty row
        if not i or not row:
            continue
        # unpack the columns into local variables
        _, zipcode, level = row
        # for each zipcode, add the level to the list
        data[zipcode].append(float(level))

# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.items():
    # 12 significant digits keeps float rounding noise out of the printout
    print(f"{zipcode} {sum(levels) / len(levels):.12g}")

Output (the order of the lines may differ):

19102 21.4
19003 29.415
19083 29.65
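
The same grouping idea can also be wrapped in a small reusable function. This is only a sketch, assuming the three-column layout used above (an ID, a zipcode, then a numeric level) and a header row. The name average_by_key, its parameters, and the inline sample rows are made up for illustration, and statistics.mean stands in for the manual sum/len division.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_by_key(lines, key_col=1, value_col=2, has_header=True):
    """Group CSV rows by one column and average another (hypothetical helper)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # drop the header row if there is one
    groups = defaultdict(list)
    for row in reader:
        if not row:  # skip blank lines
            continue
        groups[row[key_col]].append(float(row[value_col]))
    # one average per distinct key
    return {key: mean(values) for key, values in groups.items()}


# Sample rows invented for illustration, not the asker's file.
sample = io.StringIO(
    "id,zipcode,level\n"
    "1,19003,10.0\n"
    "2,19003,20.0\n"
    "3,19083,5.0\n"
)
averages = average_by_key(sample)
print(averages)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the StringIO sample.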


伪装坚强ぢ

2021-01-06 11:35

When I need more complicated processing, I usually use csv to load the rows into a table in a relational database (sqlite is the fastest way), then use standard SQL to extract the data and calculate the average values:

import csv
from io import StringIO
import sqlite3

data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""

# wrap the sample data in a file-like object for csv.reader
f = StringIO(data)
reader = csv.reader(f)

# an in-memory database is enough for a one-off calculation
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
conn.commit()

for e in reader:
    # convert the rate to a number before inserting
    e[2] = float(e[2])
    c.execute("""insert into data
          values (?,?,?)""", e)

conn.commit()

# let SQL do the grouping and averaging
c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
for row in c:
    print(row)
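
If the averages are needed as a Python object rather than printed, the query result can be collected into a dictionary. This is a rough sketch, assuming the same three-column layout with no header row. The function name averages_via_sqlite and the sample rows are hypothetical, and executemany is used here in place of the explicit insert loop.

```python
import csv
import io
import sqlite3


def averages_via_sqlite(lines):
    """Load CSV rows into an in-memory SQLite table and average RATE per ZIPCODE."""
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('create table data (ID text, ZIPCODE text, RATE real)')
        # convert the third column to float, skipping blank rows
        rows = ((r[0], r[1], float(r[2])) for r in csv.reader(lines) if r)
        conn.executemany('insert into data values (?,?,?)', rows)
        cursor = conn.execute(
            'select ZIPCODE, avg(RATE) from data group by ZIPCODE order by ZIPCODE'
        )
        # each result row is a (zipcode, average) pair
        return dict(cursor.fetchall())
    finally:
        conn.close()


# Sample rows invented for illustration.
sample = io.StringIO("1,19003,10.0\n2,19003,20.0\n3,19083,5.0\n")
result = averages_via_sqlite(sample)
print(result)
```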
