statsmodels | 易学教程

Unexpected standard errors with weighted least squares in Python Pandas

Question: In the code for the main OLS class in Python Pandas, I am looking for help to clarify what conventions are used for the standard errors and t-stats reported when weighted OLS is performed. Here's my example data set, with some imports to use Pandas and to use scikits.statsmodels WLS directly:

    import pandas as pd
    import numpy as np
    from statsmodels.regression.linear_model import WLS

    # Make some random data.
    np.random.seed(42)
    df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'weights'])
    #

Python - StatsModels, OLS Confidence interval

Question: In Statsmodels I can fit my model using

    import numpy as np
    import statsmodels.api as sm

    X = np.array([22000, 13400, 47600, 7400, 12000, 32000, 28000, 31000, 69000, 48600])
    y = np.array([0.62, 0.24, 0.89, 0.11, 0.18, 0.75, 0.54, 0.61, 0.92, 0.88])
    X2 = sm.add_constant(X)
    est = sm.OLS(y, X2)
    est2 = est.fit()

then print a nice summary using print(est2.summary()) and then extract things like the p-values using est2.pvalues, which can be found on this page http://www.statsmodels.org/dev/generated/statsmodels

Ignoring missing values in multiple OLS regression with statsmodels

Question: I'm trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the error message: ValueError: array must not contain infs or NaNs. I saw this SO question, which is similar but doesn't exactly answer my question: statsmodel.api.Logit: valueerror array must not contain infs or nans. What I would like to do is run the regression and ignore all rows where there are missing variables for the

Pandas/Statsmodel OLS predicting future values

Question: I've been trying to get a prediction for future values in a model I've created. I have tried both OLS in pandas and statsmodels. Here is what I have in statsmodels:

    import pandas as pd
    import statsmodels.api as sm

    endog = pd.DataFrame(dframe['monthly_data_smoothed8'])
    smresults = sm.OLS(dframe['monthly_data_smoothed8'], dframe['date_delta']).fit()
    sm_pred = smresults.predict(endog)
    sm_pred

The length of the array returned is equal to the number of records in my original dataframe but the values are not the same.

ARIMA seasonal prediction with Python: x12a and x13as not found on path

Question: I am using Statsmodels to implement seasonal ARIMA prediction for time series. Here is my code:

    import statsmodels.api as sm
    from statsmodels.tsa.x13 import x13_arima_select_order, _find_x12
    import pandas
    import scipy
    import numpy
    import imp

    data_source = imp.load_source('data_source', '/mypath/')

    def main():
        data = data_source.getdata()
        res = x13_arima_select_order(data)
        print(res.order, res.sorder)

    main()

When running the code, I am getting this exception: X13NotFoundError("x12a and x13as

Statsmodels mosaic plot ValueError: cannot convert float NaN to integer

Question: I have a simple pandas DataFrame, for which I would like to create a mosaic plot. Here is my code:

    import pandas as pd
    from statsmodels.graphics.mosaicplot import mosaic

    mydata = pd.DataFrame({'id2': {64: 'Angelica', 65: 'DXW_UID', 66: 'casuid01', 67: 'casuid01', 68: 'EC93_uid', 69: 'EC93_uid', 70: 'EC93_uid', 60: 'DXW_UID', 61: 'AtmosFox', 62: 'DXW_UID', 63: 'DXW_UID'}, 'id1': {64: 'TGP', 65: 'Retention01', 66: 'default', 67: 'default', 68: 'Musa_EC_9_3', 69: 'Musa_EC_9_3', 70: 'Musa_EC_9_3'

How to plot multiple seasonal_decompose plots in one figure?

Question: I am decomposing multiple time series using the seasonal decomposition offered by statsmodels. Here is the code and the corresponding output:

    def seasonal_decompose(item_index):
        tmp = df2.loc[df2.item_id_copy == item_ids[item_index], "sales_quantity"]
        res = sm.tsa.seasonal_decompose(tmp)
        res.plot()
        plt.show()

    seasonal_decompose(100)

Can someone please tell me how I could plot multiple such plots in a row x column format to see how multiple time series are behaving?

Answer 1: sm.tsa.seasonal

statespace.SARIMAX model: why the model use all the data to train mode, and predict the a range of train model

Question: I followed this tutorial to study the SARIMAX model: https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3. The date range of the data is 1958-2001.

    mod = sm.tsa.statespace.SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12), enforce_stationarity=False, enforce_invertibility=False)
    results = mod.fit()

When fitting the ARIMA time series model, I found that the author uses the entire date range to fit the model parameters. But when validating Forecasts,

How to calculate the likelihood of curve-fitting in scipy?

Question: I have a nonlinear model fit that looks like this: The dark solid line is the model fit, and the grey part is the raw data. Short version of the question: how do I get the likelihood of this model fit, so I can perform a log-likelihood ratio test? Assume that the residuals are normally distributed. I am relatively new to statistics, and my current thoughts are: get the residuals from the curve fit and calculate their variance; use this equation and plug in the variance of the residuals into
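The equation the asker refers to did not survive in the excerpt. Under the stated assumption of normally distributed residuals, and the further assumption made here that they are independent with zero mean and a common variance, the maximized Gaussian log-likelihood is ln L = -(n/2) (ln(2π σ²) + 1), with σ² = RSS / n. A minimal sketch follows; the function name `gaussian_log_likelihood` is hypothetical.

```python
import numpy as np


def gaussian_log_likelihood(residuals):
    """Maximized log-likelihood of i.i.d. zero-mean normal residuals."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    # Maximum-likelihood variance estimate: mean squared residual (RSS / n).
    sigma2 = np.sum(residuals ** 2) / n
    # Plugging sigma2 back into the normal log-density makes the
    # RSS / (2 * sigma2) term collapse to n / 2.
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


# Hypothetical residuals, e.g. ydata - model(xdata, *fitted_params).
example_residuals = np.array([1.0, -1.0, 2.0, -2.0])
log_lik = gaussian_log_likelihood(example_residuals)
print(log_lik)
```

For two nested models, twice the difference of their log-likelihoods is the usual likelihood-ratio statistic, compared against a chi-squared distribution with degrees of freedom equal to the difference in parameter counts.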