I have 30 CSV data files from 30 replicate runs of an experiment. I am using pandas' read_csv()
function to read the data into a list of DataFrames. I wo
Check it out:
In [14]: glued = pd.concat([x, y], axis=1, keys=['x', 'y'])
In [15]: glued
Out[15]:
          x                             y
          A         B         C         A         B         C
0 -0.264438 -1.026059 -0.619500  1.923135  0.135355 -0.285491
1  0.927272  0.302904 -0.032399 -0.208940  0.642432 -0.764902
2 -0.264273 -0.386314 -0.217601  1.477419 -1.659804 -0.431375
3 -0.871858 -0.348382  1.100491 -1.191664  0.152576  0.935773
In [16]: glued.swaplevel(0, 1, axis=1).sort_index(axis=1)
Out[16]:
          A                   B                   C
          x         y         x         y         x         y
0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773
In [17]: glued = glued.swaplevel(0, 1, axis=1).sort_index(axis=1)
In [18]: glued
Out[18]:
          A                   B                   C
          x         y         x         y         x         y
0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773
For the record, swapping the levels and reordering were not necessary; they were just for visual purposes.
Then you can do stuff like:
In [19]: glued.groupby(level=0, axis=1).mean()
Out[19]:
          A         B         C
0  0.829349 -0.445352 -0.452496
1  0.359166  0.472668 -0.398650
2  0.606573 -1.023059 -0.324488
3 -1.031761 -0.097903  1.018132
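As far as I know, recent pandas releases deprecate the axis=1 form of groupby, so here is a minimal sketch of the same per-column average that avoids it by grouping the transposed frame. The random data and the name mean_by_column are made up for illustration; x and y stand in for two frames with identical row and column labels.

```python
import numpy as np
import pandas as pd

# Two small stand-in frames with the same shape and labels (made-up data).
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((4, 3)), columns=list("ABC"))
y = pd.DataFrame(rng.standard_normal((4, 3)), columns=list("ABC"))

# Glue side by side; the outer column level records which frame a column came from.
glued = pd.concat([x, y], axis=1, keys=["x", "y"])

# Move the A/B/C labels to the outer level and sort the column labels.
glued = glued.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Transpose so the column levels become row levels, average the rows that
# share an outer label (A, B or C), then transpose back.
mean_by_column = glued.T.groupby(level=0).mean().T
print(mean_by_column)
```

The result has one column per original label (A, B, C), each holding the element-wise mean of the x and y values.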
Have a look at the pandas.concat() function. When you read in your files, you can use concat to join the resulting DataFrames into one, then use the usual pandas averaging techniques to average them.
To use it, just pass it a list of the DataFrames you want joined together:
>>> x
          A         B         C
0 -0.264438 -1.026059 -0.619500
1  0.927272  0.302904 -0.032399
2 -0.264273 -0.386314 -0.217601
3 -0.871858 -0.348382  1.100491
>>> y
          A         B         C
0  1.923135  0.135355 -0.285491
1 -0.208940  0.642432 -0.764902
2  1.477419 -1.659804 -0.431375
3 -1.191664  0.152576  0.935773
>>> pandas.concat([x, y])
          A         B         C
0 -0.264438 -1.026059 -0.619500
1  0.927272  0.302904 -0.032399
2 -0.264273 -0.386314 -0.217601
3 -0.871858 -0.348382  1.100491
0  1.923135  0.135355 -0.285491
1 -0.208940  0.642432 -0.764902
2  1.477419 -1.659804 -0.431375
3 -1.191664  0.152576  0.935773
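One way to do the averaging step on the stacked result is to group by the row label, since each label then appears once per input frame. This is only a sketch: the helper name average_frames and the tiny x and y frames are made up for illustration, and it assumes every frame has the same row labels and only numeric columns.

```python
import pandas as pd


def average_frames(frames):
    """Stack the frames vertically, then average the rows that share a label."""
    stacked = pd.concat(frames)
    return stacked.groupby(level=0).mean()


# Tiny made-up frames so the arithmetic is easy to check by hand.
x = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
y = pd.DataFrame({"A": [3.0, 6.0], "B": [5.0, 8.0]})

average = average_frames([x, y])
print(average)

# With real files the list could be built with read_csv, for example:
# frames = [pd.read_csv(path) for path in paths]  # paths: your 30 file names
```

Here row 0 of the result is A=2.0, B=4.0 and row 1 is A=4.0, B=6.0, which is the element-wise mean of the two inputs.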
I figured out one way to do it.
pandas DataFrames can be added together with the DataFrame.add() function: http://pandas.sourceforge.net/generated/pandas.DataFrame.add.html
So I can add the DataFrames together then divide by the number of DataFrames, e.g.:
avgDataFrame = DataFrameList[0]
for i in range(1, len(DataFrameList)):
    avgDataFrame = avgDataFrame.add(DataFrameList[i])
avgDataFrame = avgDataFrame / len(DataFrameList)
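The same add-then-divide idea can be written a bit more compactly with functools.reduce. This is a sketch under the assumption that all frames share the same index and columns; DataFrame.add aligns on labels, so a label missing from one frame would give NaN in the sum. The name mean_of_frames and the sample data are hypothetical.

```python
from functools import reduce

import pandas as pd


def mean_of_frames(frames):
    """Element-wise mean of equally labelled DataFrames."""
    # Fold the list into a running total using DataFrame.add.
    total = reduce(lambda left, right: left.add(right), frames)
    return total / len(frames)


# Three made-up single-column frames.
frames = [
    pd.DataFrame({"A": [1.0, 2.0]}),
    pd.DataFrame({"A": [2.0, 4.0]}),
    pd.DataFrame({"A": [3.0, 6.0]}),
]
result = mean_of_frames(frames)
print(result)
```

For these inputs the A column of the result is 2.0 and 4.0.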