Python CSV - Need to Group and Calculate values based on one key

清歌不尽 2021-01-06 10:44

I have a simple 3-column CSV file. Using Python, I need to group the rows by one key, then average the values of another key for each group and return the results. The file is a standard CSV.

2 Answers
  • 2021-01-06 11:16

    I've documented some steps to help clarify things:

    import csv
    from collections import defaultdict
    
    # a dictionary whose values default to an empty list
    data = defaultdict(list)
    # open the csv file and iterate over its rows. the enumerate()
    # function gives us an incrementing row number
    with open('data.csv', newline='') as f:
        for i, row in enumerate(csv.reader(f)):
            # skip the header line and any empty rows
            # the first row is indexed at 0, which evaluates as false,
            # as does an empty row
            if not i or not row:
                continue
            # unpack the columns into local variables
            _, zipcode, level = row
            # for each zipcode, add the level to the list
            data[zipcode].append(float(level))
    
    # loop over each zipcode and its list of levels and calculate the average
    for zipcode, levels in data.items():
        print(zipcode, sum(levels) / len(levels))
    

    Output (the order of the zipcodes may vary):

    19102 21.4
    19003 29.415
    19083 29.65
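
The same grouping logic can be wrapped in a reusable function. This is only a minimal sketch: the helper name `average_by_key`, its parameters, the header names, and the sample rows are all assumptions made for illustration (the rows were picked to be consistent with the output above), not part of the original file.

```python
import csv
from collections import defaultdict
from io import StringIO


def average_by_key(lines, key_col=1, value_col=2, has_header=True):
    """Group rows by one column and average another (hypothetical helper)."""
    groups = defaultdict(list)
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # discard the header row, if there is one
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        groups[row[key_col]].append(float(row[value_col]))
    # one average per distinct key
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Assumed sample data: header names and rows are illustrative only.
sample = StringIO(
    "id,zipcode,level\n"
    "1,19003,27.50\n"
    "2,19003,31.33\n"
    "3,19083,41.4\n"
    "4,19083,17.9\n"
    "5,19102,21.40\n"
)
averages = average_by_key(sample)
for zipcode, avg in averages.items():
    print(zipcode, avg)
```

For a real file, something like `with open('data.csv', newline='') as f: average_by_key(f)` should work the same way, since `csv.reader` accepts any iterable of lines.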
    
  • 2021-01-06 11:35

    When I need more complicated processing, I usually use csv to load the rows into a table in a relational database (sqlite is the quickest to set up), then use standard SQL to extract the data and calculate the average values:

    import csv
    from io import StringIO
    import sqlite3
    
    data = """1,19003,27.50
    2,19003,31.33
    3,19083,41.4
    4,19083,17.9
    5,19102,21.40
    """
    
    # wrap the sample text in a file-like object for csv.reader
    f = StringIO(data)
    reader = csv.reader(f)
    
    # an in-memory database is enough for a one-off calculation
    conn = sqlite3.connect(':memory:')
    c = conn.cursor()
    c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
    conn.commit()
    
    for e in reader:
        # store the rate as a number so avg() works on real values
        e[2] = float(e[2])
        c.execute("""insert into data
              values (?,?,?)""", e)
    
    conn.commit()
    
    # let SQL do the grouping and averaging
    c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
    for row in c:
        print(row)
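
Once the rows are in a table, further per-group statistics only require changing the query. The following is a sketch under the same assumptions as the code above (same sample rows, same table layout); the extra aggregates and the `order by` clause are additions for illustration and were not part of the original answer.

```python
import csv
import sqlite3
from io import StringIO

# Same sample rows as in the answer (id, zipcode, rate), no header line.
data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table data (ID text, ZIPCODE text, RATE real)")

# Convert the rate column to float and insert all rows in one call.
rows = [(r[0], r[1], float(r[2])) for r in csv.reader(StringIO(data)) if r]
conn.executemany("insert into data values (?,?,?)", rows)
conn.commit()

# count/min/max/avg are standard SQLite aggregate functions.
summary = conn.execute(
    """select ZIPCODE, count(*), min(RATE), max(RATE), avg(RATE)
       from data
       group by ZIPCODE
       order by ZIPCODE"""
).fetchall()
conn.close()

for row in summary:
    print(row)
```

Each result row is a tuple of (zipcode, row count, minimum, maximum, average), sorted by zipcode.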
    