statsmodels

Unexpected standard errors with weighted least squares in Python Pandas

假如想象 submitted on 2019-12-21 19:47:18
Question: In the code for the main OLS class in Python Pandas, I am looking for help to clarify what conventions are used for the standard error and t-stats reported when weighted OLS is performed. Here's my example data set, with some imports to use Pandas and to use scikits.statsmodels WLS directly:

    import pandas as pd
    import numpy as np
    from statsmodels.regression.linear_model import WLS

    # Make some random data.
    np.random.seed(42)
    df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'weights'])
    #

Python - StatsModels, OLS Confidence interval

我与影子孤独终老i submitted on 2019-12-21 12:12:41
Question: In Statsmodels I can fit my model using

    import numpy as np
    import statsmodels.api as sm

    X = np.array([22000, 13400, 47600, 7400, 12000, 32000, 28000, 31000, 69000, 48600])
    y = np.array([0.62, 0.24, 0.89, 0.11, 0.18, 0.75, 0.54, 0.61, 0.92, 0.88])
    X2 = sm.add_constant(X)
    est = sm.OLS(y, X2)
    est2 = est.fit()

then print a nice summary using

    print(est2.summary())

and then extract things like the p-values using

    est2.pvalues

which can be found on this page http://www.statsmodels.org/dev/generated/statsmodels

Ignoring missing values in multiple OLS regression with statsmodels

你说的曾经没有我的故事 submitted on 2019-12-21 07:30:14
Question: I'm trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the error message:

    ValueError: array must not contain infs or NaNs

I saw this SO question, which is similar but doesn't exactly answer my question: statsmodel.api.Logit: valueerror array must not contain infs or nans

What I would like to do is run the regression and ignore all rows where there are missing variables for the

Pandas/Statsmodel OLS predicting future values

不打扰是莪最后的温柔 submitted on 2019-12-21 05:47:17
Question: I've been trying to get a prediction for future values in a model I've created. I have tried both OLS in pandas and statsmodels. Here is what I have in statsmodels:

    import statsmodels.api as sm

    endog = pd.DataFrame(dframe['monthly_data_smoothed8'])
    smresults = sm.OLS(dframe['monthly_data_smoothed8'], dframe['date_delta']).fit()
    sm_pred = smresults.predict(endog)
    sm_pred

The length of the array returned is equal to the number of records in my original dataframe but the values are not the same.

ARIMA seasonal prediction with Python: x12a and x13as not found on path

核能气质少年 submitted on 2019-12-21 05:46:58
Question: I am using Statsmodels to implement seasonal ARIMA prediction for time series. Here is my code:

    import statsmodels.api as sm
    from statsmodels.tsa.x13 import x13_arima_select_order, _find_x12
    import pandas
    import scipy
    import numpy
    import imp

    data_source = imp.load_source('data_source', '/mypath/')

    def main():
        data = data_source.getdata()
        res = x13_arima_select_order(data)
        print(res.order, res.sorder)

    main()

When running the code, I am getting this exception: X13NotFoundError("x12a and x13as
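
This exception usually means the external X-13ARIMA-SEATS (or X-12-ARIMA) executable cannot be located; statsmodels only wraps that program and does not ship it. As far as I recall from the statsmodels documentation, the binary is looked up on the `PATH` or via the `X13PATH` / `X12PATH` environment variables, and `x13_arima_select_order` also accepts an `x12path` argument. The helper below is a hypothetical diagnostic written with the standard library only (it is not part of statsmodels) that mimics that lookup so you can see what would be found.

```python
import os
import shutil

# Names the X-13 / X-12 executables are commonly distributed under.
BINARY_NAMES = ("x13as", "x13as.exe", "x12a", "x12a.exe")


def find_x13_binary(env=None):
    """Return the path of the first X-13/X-12 binary found, or None.

    Hypothetical helper: checks the X13PATH and X12PATH directories
    first, then falls back to the regular PATH.
    """
    env = os.environ if env is None else env

    # 1. Directories named by the dedicated environment variables.
    for var in ("X13PATH", "X12PATH"):
        directory = env.get(var)
        if not directory:
            continue
        for name in BINARY_NAMES:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate

    # 2. The ordinary executable search path.
    for name in BINARY_NAMES:
        found = shutil.which(name, path=env.get("PATH"))
        if found:
            return found
    return None


print(find_x13_binary() or "no X-13/X-12 binary found")
```

If this prints that nothing was found, installing the binary and pointing `X13PATH` at its directory (or passing that directory as `x12path`) is the likely fix.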

Statsmodels mosaic plot ValueError: cannot convert float NaN to integer

被刻印的时光 ゝ submitted on 2019-12-21 05:12:04
Question: I have a simple pandas DataFrame, for which I would like to create a mosaic plot. Here is my code:

    import pandas as pd
    from statsmodels.graphics.mosaicplot import mosaic

    mydata = pd.DataFrame({'id2': {64: 'Angelica', 65: 'DXW_UID', 66: 'casuid01', 67: 'casuid01', 68: 'EC93_uid', 69: 'EC93_uid', 70: 'EC93_uid', 60: 'DXW_UID', 61: 'AtmosFox', 62: 'DXW_UID', 63: 'DXW_UID'}, 'id1': {64: 'TGP', 65: 'Retention01', 66: 'default', 67: 'default', 68: 'Musa_EC_9_3', 69: 'Musa_EC_9_3', 70: 'Musa_EC_9_3'

How to plot multiple seasonal_decompose plots in one figure?

╄→гoц情女王★ submitted on 2019-12-21 04:51:31
Question: I am decomposing multiple time series using the seasonal decomposition offered by statsmodels. Here is the code and the corresponding output:

    def seasonal_decompose(item_index):
        tmp = df2.loc[df2.item_id_copy == item_ids[item_index], "sales_quantity"]
        res = sm.tsa.seasonal_decompose(tmp)
        res.plot()
        plt.show()

    seasonal_decompose(100)

Can someone please tell me how I could plot multiple such plots in a row x column format to see how multiple time series are behaving?

Answer 1: sm.tsa.seasonal

statespace.SARIMAX model: why does the model use all the data for training, and then predict a range of the training data?

限于喜欢 submitted on 2019-12-21 04:15:35
Question: I followed the tutorial to study the SARIMAX model: https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3. The date range of the data is 1958-2001.

    mod = sm.tsa.statespace.SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12), enforce_stationarity=False, enforce_invertibility=False)
    results = mod.fit()

When fitting the ARIMA time series model, I found that the author used the full date range of the data to fit the model parameters. But when validating forecasts,

How to calculate the likelihood of curve-fitting in scipy?

那年仲夏 submitted on 2019-12-21 04:14:04
Question: I have a nonlinear model fit that looks like this: The dark solid line is the model fit, and the grey part is the raw data. Short version of the question: how do I get the likelihood of this model fit, so I can perform a log-likelihood ratio test? Assume that the residuals are normally distributed. I am relatively new to statistics, and my current thoughts are: Get the residuals from the curve fit, and calculate the variance of the residuals; Use this equation And plug in the variance of the residuals into
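
The steps outlined above can be sketched as follows, under the stated assumption of independent, zero-mean normal residuals. With the maximum-likelihood variance estimate (sum of squared residuals divided by n), the Gaussian log-likelihood simplifies to -n/2 * (ln(2*pi*sigma^2) + 1). The two exponential models, the synthetic data, and the function names are hypothetical; the likelihood ratio test shown applies only when one model is nested in the other.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2


def gaussian_log_likelihood(residuals):
    """Log-likelihood of zero-mean normal residuals at the MLE variance."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    sigma2 = np.mean(r ** 2)  # MLE of the residual variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)


# Hypothetical nested models: the restricted one fixes the offset c at 0.
def model_full(x, a, b, c):
    return a * np.exp(-b * x) + c


def model_restricted(x, a, b):
    return a * np.exp(-b * x)


# Hypothetical data generated from the full model plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = model_full(x, 2.5, 1.3, 0.5) + rng.normal(scale=0.1, size=x.size)

p_full, _ = curve_fit(model_full, x, y, p0=(1.0, 1.0, 0.0), maxfev=10000)
p_restr, _ = curve_fit(model_restricted, x, y, p0=(1.0, 1.0), maxfev=10000)

ll_full = gaussian_log_likelihood(y - model_full(x, *p_full))
ll_restr = gaussian_log_likelihood(y - model_restricted(x, *p_restr))

# Likelihood ratio statistic; degrees of freedom = number of extra parameters.
lr_stat = 2.0 * (ll_full - ll_restr)
p_value = chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)
```

A small p-value suggests the extra parameter improves the fit by more than chance would explain; the chi-squared reference distribution is an asymptotic approximation.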