I have a Python list as follows:

my_list = [[25, 1, 0.65],
           [25, 3, 0.63],
           [25, 2, 0.62],
           [50, 3, 0.65],
           [50, 2, 0.63],
           [50, 1, 0.62]]

For each value in the second column, I want the third-column values collected into a list, ordered by the first column, i.e. [[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]].
Use the following:
my_list = [[25, 1, 0.65], [25, 3, 0.63], [25, 2, 0.62], [50, 3, 0.65], [50, 2, 0.63], [50, 1, 0.62]]
list_25 = sorted([item for item in my_list if item[0] == 25], key=lambda item: item[1])
list_50 = sorted([item for item in my_list if item[0] == 50], key=lambda item: item[1])
res = [[i[2], j[2]] for i, j in zip(list_25, list_50)]
Output:
>>> res
[[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]
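The hard-coded 25/50 filters above can be generalized with itertools.groupby, which groups a pre-sorted list by a key; a sketch using the same sort-by-second-then-first-column idea:

```python
from itertools import groupby
from operator import itemgetter

my_list = [[25, 1, 0.65], [25, 3, 0.63], [25, 2, 0.62],
           [50, 3, 0.65], [50, 2, 0.63], [50, 1, 0.62]]

# Sort by the second column, then the first, so rows sharing a
# second-column value are adjacent and ordered by the first column.
ordered = sorted(my_list, key=itemgetter(1, 0))

# Group on the second column and keep only the third-column values.
res = [[row[2] for row in rows]
       for _, rows in groupby(ordered, key=itemgetter(1))]
```

This works for any number of distinct first-column values, not just 25 and 50.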
The numpy_indexed package (disclaimer: I am its author) has a one-liner for this kind of problem:

import numpy as np
import numpy_indexed as npi

my_list = np.asarray(my_list)
keys, table = npi.Table(my_list[:, 1], my_list[:, 0]).mean(my_list[:, 2])
Note that if duplicate values are present in the list, their mean is reported in the table.
EDIT: I have added some improvements to the master branch of numpy_indexed that allow more control over how you convert to a table. For instance, there is Table.unique, which asserts that each item in the table occurs exactly once in the list, and Table.sum; eventually, all other reductions supported by numpy_indexed that make sense will follow. Hopefully I can do a new release for that tonight.
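For comparison, the same pivot can be sketched in plain NumPy, without the numpy_indexed dependency (this assumes each pair of first- and second-column keys occurs exactly once, so no mean is needed):

```python
import numpy as np

arr = np.asarray([[25, 1, 0.65], [25, 3, 0.63], [25, 2, 0.62],
                  [50, 3, 0.65], [50, 2, 0.63], [50, 1, 0.62]])

rows = np.unique(arr[:, 0])  # distinct first-column keys, sorted
cols = np.unique(arr[:, 1])  # distinct second-column keys, sorted

# Scatter the third column into a (rows x cols) table.
table = np.zeros((len(rows), len(cols)))
table[np.searchsorted(rows, arr[:, 0]),
      np.searchsorted(cols, arr[:, 1])] = arr[:, 2]

# Each column of the table holds the values for one second-column key,
# ordered by the first column.
res = table.T.tolist()
```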
You can sort your list with native python, but I find it easiest to get your required list using numpy. Since you were going to use pandas anyway, I consider this to be an acceptable solution:
from operator import itemgetter
import numpy as np
# or just use pandas.np if you have that already imported
my_list = [[25, 1, 0.65],
[25, 3, 0.63],
[25, 2, 0.62],
[50, 3, 0.65],
[50, 2, 0.63],
[50, 1, 0.62]]
sorted_list = sorted(my_list, key=itemgetter(1, 0))  # sort by second, then first column
sliced_array = np.array(sorted_list)[:, -1].reshape(-1, 2)  # third column, split into pairs
final_list = sliced_array.tolist()  # back to a list of lists
The main point is to use itemgetter to sort your list on two columns, one after the other. The resulting sorted list contains the required elements in its third column, which I extract with numpy. It could be done in native Python, but if you're already using numpy/pandas, this should feel natural.
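As mentioned, the slicing step can also be done in native Python; a sketch, assuming (as here) exactly two distinct first-column values, so every group has two rows:

```python
from operator import itemgetter

my_list = [[25, 1, 0.65], [25, 3, 0.63], [25, 2, 0.62],
           [50, 3, 0.65], [50, 2, 0.63], [50, 1, 0.62]]

sorted_list = sorted(my_list, key=itemgetter(1, 0))

# Take the third column and chunk it into pairs, replacing the
# numpy reshape with a plain slice-based comprehension.
third = [row[2] for row in sorted_list]
final_list = [third[i:i + 2] for i in range(0, len(third), 2)]
```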
A way to do this with pandas is to group the DataFrame, pull out 'c' from each group, convert it to a list, and append it to the result list (this assumes the data is in a DataFrame df with columns 'a', 'b', 'c', e.g. df = pd.DataFrame(my_list, columns=['a', 'b', 'c'])):

>>> z = []
>>> for g in df.groupby('b'):
...     z.append(g[1]['c'].tolist())
...
>>> z
[[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]
You could do this as a list comprehension:
>>> res = [g[1]['c'].tolist() for g in df.groupby('b')]
>>> res
[[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]
Another way would be to apply list directly to df.groupby('b')['c']; this gives you the object you need. Then call the .tolist() method on it to return a list of lists:
>>> df.groupby('b')['c'].apply(list).tolist()
[[0.65000000000000002, 0.62], [0.62, 0.63], [0.63, 0.65000000000000002]]
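Putting the pandas approach together, a self-contained sketch (the column names 'a', 'b', 'c' are assumptions, matching the snippets above):

```python
import pandas as pd

my_list = [[25, 1, 0.65], [25, 3, 0.63], [25, 2, 0.62],
           [50, 3, 0.65], [50, 2, 0.63], [50, 1, 0.62]]

df = pd.DataFrame(my_list, columns=['a', 'b', 'c'])

# Sort by 'a' first so values within each 'b' group come out
# in first-column order, then collect 'c' per group.
res = df.sort_values('a').groupby('b')['c'].apply(list).tolist()
```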