I am trying to calculate the mean and confidence interval(95%) of a column \"Force\" in a large dataset. I need the result by using the groupby function by grouping differen
As mentioned in the comments, I could not duplicate your error, but you can try to check that your numbers are stored as numbers and not as strings. use df.info()
and make sure that the relevant columns are float or int:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Class 6 non-null object # <--- non-number column
Force 6 non-null int64 # <--- number (int) column
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
import pandas as pd
import numpy as np
import math
df=pd.DataFrame({'Class': ['A1','A1','A1','A2','A3','A3'],
'Force': [50,150,100,120,140,160] },
columns=['Class', 'Force'])
print(df)
print('-'*30)
stats = df.groupby(['Class'])['Force'].agg(['mean', 'count', 'std'])
print(stats)
print('-'*30)
ci95_hi = []
ci95_lo = []
for i in stats.index:
m, c, s = stats.loc[i]
ci95_hi.append(m + 1.96*s/math.sqrt(c))
ci95_lo.append(m - 1.96*s/math.sqrt(c))
stats['ci95_hi'] = ci95_hi
stats['ci95_lo'] = ci95_lo
print(stats)
The output is
Class Force
0 A1 50
1 A1 150
2 A1 100
3 A2 120
4 A3 140
5 A3 160
------------------------------
mean count std
Class
A1 100 3 50.000000
A2 120 1 NaN
A3 150 2 14.142136
------------------------------
mean count std ci95_hi ci95_lo
Class
A1 100 3 50.000000 156.580326 43.419674
A2 120 1 NaN NaN NaN
A3 150 2 14.142136 169.600000 130.400000