Question
I am trying to build a linear regression model from pre-project data and then use it to calculate modeled values, so that I can compare pre- and post-project data. Can anyone tell me what the best practice is, or whether I may be off in the weeds somewhere?
For starters:
import statsmodels.api as sm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ng = pd.read_csv('C:/Users/ngDataBaseline.csv', thousands=',', index_col='Date', parse_dates=True)
ng.head()
This will output:
            HDD  Therm
Date
2011-10-01  386    498
2011-11-01  663   1810
2011-12-01  972   4263
2012-01-01 1131   5981
2012-02-01  977   6951
To fit the model, I am using the statsmodels formula API:
import statsmodels.formula.api as smf
formula = 'Therm ~ HDD'
model = smf.ols(formula, data=ng)
results = model.fit()
results.summary()
inter = results.params['Intercept']
slope = results.params['HDD']
inter, slope
This prints:
(-532.6244255918659, 6.331883644532255)
So now I think I can import the post-project data and calculate modeled data from the fitted line, where m is the slope, b is the intercept, and X is HDD:
Y = mX + b
ng_postproject = pd.read_csv('C:/Users/ng_postproject.csv', thousands=',', index_col='Date', parse_dates=True)
ng_postproject.head()
And this will output:
            HDD  Therm
Date
2014-10-01  291    663
2014-11-01  545   1413
2014-12-01 1069   6754
2015-01-01 1134   7782
2015-02-01 1415  10285
This is what I am using to calculate the modeled Therm usage:
ng_postproject['Therm_modeled'] = ng_postproject['HDD'].apply(lambda x: x * slope + inter)
ng_postproject['Therm_modeled']
Date
2014-10-01    1309.953715
2014-11-01    2918.252161
2014-12-01    6236.159190
2015-01-01    6647.731627
2015-02-01    8426.990931
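For reference, the same calculation can be sketched without `apply`, since arithmetic on a pandas Series works element-wise. This is only a minimal sketch: it hard-codes the intercept and slope printed above and rebuilds just the five post-project rows shown by `ng_postproject.head()` instead of reading the CSV, so the inline DataFrame is an assumption standing in for the real file.

```python
import pandas as pd

# Coefficients copied from the fit output printed above.
inter, slope = -532.6244255918659, 6.331883644532255

# Assumption: only the five rows shown by ng_postproject.head(), rebuilt inline
# so the example runs without the CSV file.
ng_postproject = pd.DataFrame(
    {
        "HDD": [291, 545, 1069, 1134, 1415],
        "Therm": [663, 1413, 6754, 7782, 10285],
    },
    index=pd.to_datetime(
        ["2014-10-01", "2014-11-01", "2014-12-01", "2015-01-01", "2015-02-01"]
    ),
)
ng_postproject.index.name = "Date"

# Series arithmetic is element-wise, so Y = mX + b needs no lambda.
ng_postproject["Therm_modeled"] = ng_postproject["HDD"] * slope + inter
print(ng_postproject)
```

With the fitted `results` object in scope, `results.predict(ng_postproject)` should give the same column, because the formula interface looks up `HDD` in the DataFrame it is given.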
Now, if I am not too far off in the weeds, I should be able to add a column and compare pre- and post-project data. It would also be really nice to include a confidence interval. Thanks for any response.
Source: https://stackoverflow.com/questions/52635962/python-statsmodels-linear-regression
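One possible way to get both the comparison column and an interval is statsmodels' `get_prediction`, whose `summary_frame` reports the predicted mean along with confidence and prediction bounds. The sketch below is hedged: it fits on only the five baseline rows shown by `ng.head()` (an assumption made so it runs without the CSV files, which means its coefficients will differ from the ones printed above), and the column name `Difference` is made up for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: only the rows shown by .head() for each file, rebuilt inline.
pre = pd.DataFrame(
    {"HDD": [386, 663, 972, 1131, 977], "Therm": [498, 1810, 4263, 5981, 6951]},
    index=pd.to_datetime(
        ["2011-10-01", "2011-11-01", "2011-12-01", "2012-01-01", "2012-02-01"]
    ),
)
post = pd.DataFrame(
    {"HDD": [291, 545, 1069, 1134, 1415], "Therm": [663, 1413, 6754, 7782, 10285]},
    index=pd.to_datetime(
        ["2014-10-01", "2014-11-01", "2014-12-01", "2015-01-01", "2015-02-01"]
    ),
)

# Fit the baseline model on pre-project data.
results = smf.ols("Therm ~ HDD", data=pre).fit()

# Predict post-project usage from post-project HDD; alpha=0.05 gives 95% bounds.
frame = results.get_prediction(post).summary_frame(alpha=0.05)
frame.index = post.index  # align row labels regardless of statsmodels version

# obs_ci_* are prediction intervals for a single new observation;
# mean_ci_* (also in the frame) bound the regression line itself.
comparison = post.join(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])
comparison = comparison.rename(columns={"mean": "Therm_modeled"})

# Hypothetical column name: actual minus modeled usage for each month.
comparison["Difference"] = comparison["Therm"] - comparison["Therm_modeled"]
print(comparison.round(1))
```

For judging whether an individual month's actual usage is unusual, the `obs_ci_*` columns are usually the more relevant ones, since they include the residual scatter as well as the uncertainty in the fitted line.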