I have a list of (string, integer, integer) entries, and I would like to sum the two integer fields for entries that share the same string:
myList = [[["26-07-2017",2,0], ["26-07-2017",3,0], ["27-07-2017",
I would use a dictionary keyed on the first entry (the date) to accumulate the sums, like so:
my_dict = {}
# myList wraps the rows in one extra outer list, so iterate over myList[0]
for entry in myList[0]:
    if entry[0] not in my_dict:
        # First time this date is seen: store its two integers as a list
        my_dict[entry[0]] = entry[1:]
    else:
        # The date is already in my_dict, so add the new integers to the running sums
        my_dict[entry[0]][0] += entry[1]
        my_dict[entry[0]][1] += entry[2]
# Now my_dict maps each date to [sum of first integers, sum of second integers]
# If you really need the result in the list format you asked for:
sumList = []
for date in my_dict:
    # append takes a single argument, so build the row as one list
    sumList.append([date, my_dict[date][0], my_dict[date][1]])
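The same dictionary idea can be sketched a little more compactly with collections.defaultdict, which removes the explicit "is this date already present" check. This is only an illustrative variant, not the answer's original code; the function name sum_by_date is made up for the example, and the sample data is the list from the question.

```python
from collections import defaultdict

# Sample data from the question: one outer list wrapping the rows.
myList = [[["26-07-2017", 2, 0], ["26-07-2017", 3, 0],
           ["27-07-2017", 1, 0], ["27-07-2017", 0, 1]]]


def sum_by_date(rows):
    """Sum the two integer fields of [date, int, int] rows per date.

    sum_by_date is a hypothetical helper name used only for this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # unseen dates start at [0, 0]
    for date, first, second in rows:
        totals[date][0] += first
        totals[date][1] += second
    # Dicts keep insertion order on Python 3.7+, so dates come out in
    # order of first appearance.
    return [[date, a, b] for date, (a, b) in totals.items()]


print(sum_by_date(myList[0]))
```

Unlike the groupby approaches, this does not need the rows to be ordered by date.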
You can use a dict to store the unique dates and the running sums of their values.
Code:
myList = [[["26-07-2017",2,0], ["26-07-2017",3,0], ["27-07-2017",1,0], ["27-07-2017",0,1]]]
dic = {}
for x in myList[0]:
    try:
        # Date already seen: add to its running sums
        dic[x[0]][0] = dic[x[0]][0] + x[1]
        dic[x[0]][1] = dic[x[0]][1] + x[2]
    except KeyError:
        # First occurrence of this date: start its sums
        dic[x[0]] = [x[1], x[2]]
[[k, v[0], v[1]] for k, v in dic.items()]
Output:
[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]
You can use itertools.groupby to group the items on the date, then use reduce to sum numbers in each group:
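from functools import reduce
from itertools import groupby
# Start each group from [0, 0] so that groups of any size (including 1) sum correctly
lst = [[k] + reduce(lambda acc, row: [acc[0] + row[1], acc[1] + row[2]], g, [0, 0])
       for k, g in groupby(myList[0], lambda x: x[0])]
print([lst])
# [[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]]
On Python 3, reduce is no longer a builtin, hence the functools import (which also works on Python 2.6+).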
You could avoid the relatively less intuitive reduce
(also in deference to GvR) by taking the sums in a for loop:
from itertools import groupby
lst = []
for k, g in groupby(myList[0], lambda x: x[0]):
    # Drop the date from each row, transpose the remaining columns, and sum each one
    sums = [sum(col) for col in zip(*(row[1:] for row in g))]
    lst.append([k] + sums)
print([lst])
# [[['26-07-2017', 5, 0], ['27-07-2017', 1, 1]]]
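One caveat with the groupby approaches: itertools.groupby only merges consecutive items with equal keys, so they rely on the rows already being ordered by date, as they are in the question. A minimal sketch of handling unordered input by sorting on the date first; the unsorted rows list and the name group_sums are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, deliberately unordered variant of the question's rows.
rows = [["27-07-2017", 1, 0], ["26-07-2017", 2, 0],
        ["27-07-2017", 0, 1], ["26-07-2017", 3, 0]]


def group_sums(rows):
    """Sort by date so equal dates are adjacent, then sum per group."""
    by_date = itemgetter(0)
    out = []
    for date, group in groupby(sorted(rows, key=by_date), key=by_date):
        first, second = 0, 0
        for _, a, b in group:
            first += a
            second += b
        out.append([date, first, second])
    return out


print(group_sums(rows))
```

Sorting dd-mm-yyyy strings is not chronological in general, but that does not matter here: the sort only needs to place equal dates next to each other.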
You can also do this with pandas:
import pandas as pd
df = pd.DataFrame(myList[0])
answer = df.groupby([0]).sum()
gives me
            1  2
0
26-07-2017  5  0
27-07-2017  1  1
EDIT: I used your list as-is above, but with a few modifications the code makes a bit more sense:
# name the columns
df = pd.DataFrame(myList[0], columns=['date', 'int1', 'int2'])
# group on the date column
df.groupby(['date']).sum()
returns
            int1  int2
date
26-07-2017     5     0
27-07-2017     1     1
and the dataframe looks like:
         date  int1  int2
0  26-07-2017     2     0
1  26-07-2017     3     0
2  27-07-2017     1     0
3  27-07-2017     0     1
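If the result is needed back in the nested-list format from the question rather than as a DataFrame, one possible sketch is to keep the date as a regular column with as_index=False and then convert the rows to lists. The column names are the ones chosen in the edit above; the variable names summed and result are just for illustration.

```python
import pandas as pd

# Sample data from the question: one outer list wrapping the rows.
myList = [[["26-07-2017", 2, 0], ["26-07-2017", 3, 0],
           ["27-07-2017", 1, 0], ["27-07-2017", 0, 1]]]

df = pd.DataFrame(myList[0], columns=["date", "int1", "int2"])

# as_index=False keeps 'date' as an ordinary column instead of the index,
# so it is included when the rows are converted back to lists.
summed = df.groupby("date", as_index=False).sum()
result = summed.values.tolist()

print(result)
```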