I have a simple three-column CSV file, and I need to use Python to group the rows by one key, average the values of another column for each group, and return the results. The file is a standard CSV.
I've documented some steps to help clarify things:
import csv
from collections import defaultdict
# a dictionary whose value defaults to a list.
data = defaultdict(list)
# open the csv file and iterate over its rows. the enumerate()
# function gives us an incrementing row number
for i, row in enumerate(csv.reader(open('data.csv', 'rb'))):
    # skip the header line and any empty rows
    # we take advantage of the first row being indexed at 0
    # i=0 which evaluates as false, as does an empty row
    if not i or not row:
        continue
    # unpack the columns into local variables
    _, zipcode, level = row
    # for each zipcode, add the level to the list
    data[zipcode].append(float(level))
# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.iteritems():
    print zipcode, sum(levels) / float(len(levels))
Output:
19102 21.4
19003 29.415
19083 29.65
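For anyone on Python 3, the same grouping idea might look roughly like the sketch below. This is not the code above, just an illustration under assumptions: the helper name `average_by_key`, its parameters, and the header names in the sample are made up for the example, and the sample rows are the ones posted elsewhere in this thread. It also uses `statistics.mean` instead of dividing the sum by hand.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_by_key(lines, key_col=1, value_col=2, has_header=True):
    """Group CSV rows by one column and average another (hypothetical helper)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header line, if there is one
    groups = defaultdict(list)
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        groups[row[key_col]].append(float(row[value_col]))
    return {key: mean(values) for key, values in groups.items()}


# Sample rows assumed for illustration; for a real file you would pass
# open('data.csv', newline='') instead of an io.StringIO object.
sample = io.StringIO(
    "id,zipcode,level\n"
    "1,19003,27.50\n"
    "2,19003,31.33\n"
    "3,19083,41.4\n"
    "4,19083,17.9\n"
    "5,19102,21.40\n"
)
averages = average_by_key(sample)
for zipcode, avg in averages.items():
    print(zipcode, round(avg, 3))
```

Returning a dict rather than printing inside the loop makes it easier to reuse the averages later, which seems closer to the "return them" part of the question.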
Usually, when I need to do more complicated processing, I use csv to load the rows into a table in a relational database (SQLite is the quickest way), then use standard SQL to extract the data and calculate the average values:
import csv
from StringIO import StringIO
import sqlite3
data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""
f = StringIO(data)
reader = csv.reader(f)
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
conn.commit()
for e in reader:
    e[2] = float(e[2])
    c.execute("""insert into data
                 values (?,?,?)""", e)
conn.commit()
c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
for row in c:
    print row
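A rough Python 3 take on the same SQLite approach is sketched below, in case it helps. The changes are my own assumptions rather than part of the code above: `io.StringIO` replaces the Python 2 `StringIO` module, the rows are inserted in one go with `executemany`, and an `order by ZIPCODE` is added so the result order is predictable.

```python
import csv
import io
import sqlite3

# Same sample rows as in the answer above (no header line).
data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table data (ID text, ZIPCODE text, RATE real)")

# Convert the rate column to float before inserting.
rows = [
    (row_id, zipcode, float(rate))
    for row_id, zipcode, rate in csv.reader(io.StringIO(data))
]
conn.executemany("insert into data values (?, ?, ?)", rows)
conn.commit()

# Let SQL do the grouping and averaging.
results = conn.execute(
    "select ZIPCODE, avg(RATE) from data group by ZIPCODE order by ZIPCODE"
).fetchall()
for zipcode, avg_rate in results:
    print(zipcode, round(avg_rate, 3))

conn.close()
```

For a real file you would pass `open('data.csv', newline='')` to `csv.reader` instead of the in-memory string, and skip the header row first if the file has one.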