statsmodels | 易学教程

ImportError: cannot import name ExponentialSmoothing

Question: I installed statsmodels for Python and confirmed with pip freeze that the package appears in the list. When I run: from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt I get this error: Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: cannot import name ExponentialSmoothing I have also tried the following link: link
Answer 1: I ran into the same situation, and the install process recommended in Nish's url didn
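One common cause of this error is an installed statsmodels release that predates the Holt-Winters classes. The sketch below checks the installed version against a minimum; it assumes that `ExponentialSmoothing`, `SimpleExpSmoothing`, and `Holt` are exported from `statsmodels.tsa.api` starting with release 0.9.0, and the helper names are made up for illustration.

```python
from importlib import metadata

# Assumption: the Holt-Winters classes are available from statsmodels 0.9.0 on.
MIN_VERSION = (0, 9, 0)


def parse_version(version):
    """Turn '0.14.1' or '0.9.0rc1' into a tuple of three leading integers."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum=MIN_VERSION):
    """Return True when the version string is at least the minimum tuple."""
    return parse_version(version) >= minimum


def installed_statsmodels_version():
    """Return the version seen by THIS interpreter, or None if not installed."""
    try:
        return metadata.version("statsmodels")
    except metadata.PackageNotFoundError:
        return None


found = installed_statsmodels_version()
if found is None:
    print("statsmodels is not installed for this interpreter")
elif meets_minimum(found):
    print(f"statsmodels {found} should provide ExponentialSmoothing")
else:
    print(f"statsmodels {found} is older than 0.9.0; consider upgrading")
```

Running the check with the same interpreter that raises the ImportError matters, because pip freeze may report packages for a different Python installation.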

Python ARIMA model, predicted values are shifted

Question: I am new to the Python ARIMA implementation. I have data at a 15-minute frequency covering a few months. While following the Box-Jenkins method to fit a time series model, I ran into an issue towards the end. The ACF and PACF plots for the time series (ts) and the differenced series (ts_diff) are given. I used ARIMA(5,1,2) and then plotted the fitted values (green) against the original values (blue). As you can see from the figure, there is a clear shift (by one) in the values. What am I doing wrong? Is the prediction
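One possible reading of such a shift (not necessarily the explanation for this dataset) is that one-step-ahead predictions of a series that behaves like a random walk stay close to the previous observation, so the fitted curve looks like the original delayed by one step. A minimal pure-Python sketch with simulated data, where the naive forecast stands in for the fitted model:

```python
import random

random.seed(0)

# Simulate a random walk: each value is the previous one plus noise.
series = [0.0]
for _ in range(199):
    series.append(series[-1] + random.gauss(0.0, 1.0))

# Naive one-step-ahead forecast: predict the previous observation.
# The first point has no forecast because nothing precedes it.
forecast = [None] + series[:-1]


def mean_abs_error(pred, actual):
    """Mean absolute error over the positions where a forecast exists."""
    pairs = [(p, a) for p, a in zip(pred, actual) if p is not None]
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Forecast for time t compared with the actual value at time t.
error_aligned = mean_abs_error(forecast, series)

# Forecast for time t compared with the actual value at time t - 1.
error_shifted = mean_abs_error(forecast[1:], series[:-1])

print("error against same time step:    ", error_aligned)
print("error against previous time step:", error_shifted)
```

The forecast matches the previous time step exactly, which is what a plot would show as a one-step shift.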

One sided t-test for linear regression?

Question: I am having trouble with the following. I am fitting a linear regression and want to test the slope. The t-test checks whether the slope is far from 0, and the slope can be negative or positive. I am only interested in negative slopes. In this example the slope is positive, which I am not interested in, so the p-value should be large. But it is small, because right now the test checks whether the slope is far from 0 in either direction. (I am forcing an intercept of zero, which is what I want.) Can someone help me with
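The conversion the question describes can be sketched as follows. Because the t distribution is symmetric, a two-sided p-value can be halved when the estimate lies in the direction of the alternative, and replaced by one minus that half otherwise. The function name is hypothetical; in statsmodels the inputs would typically come from the fitted results' `tvalues` and `pvalues` attributes.

```python
def one_sided_p_value(t_stat, p_two_sided, alternative="less"):
    """Convert a two-sided t-test p-value into a one-sided one.

    alternative="less" tests H1: slope < 0,
    alternative="greater" tests H1: slope > 0.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = p_two_sided / 2.0
    if alternative == "less":
        # Evidence for a negative slope only when t is negative.
        return half if t_stat < 0 else 1.0 - half
    return half if t_stat > 0 else 1.0 - half


# Hypothetical numbers: a clearly positive slope with a small two-sided p-value.
p_negative = one_sided_p_value(t_stat=3.2, p_two_sided=0.01, alternative="less")
print(p_negative)
```

With a positive t statistic, the one-sided p-value for a negative slope comes out close to 1, which matches the behaviour the question asks for.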

Projecting time series predictions on trend line and including seasonality (Python)

Question: For the past few days I have been going crazy with time series using statsmodels (Python). I am a novice in the TS area, although I do have a better understanding of various regression models. Here is my issue: I have a time series that I made stationary (either with seasonal_decompose or by differencing). I also worked out the parameters p, d, and q for the ARIMA model using ACF and PACF plots. I fit the model on the stationary series or on the residual (which I got from seasonal_decompose). Gladly, I also got
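For the differencing case only, forecasts made on the differenced series can be mapped back to the original scale by cumulatively summing them onto the last observed level. The sketch below uses made-up numbers and a hypothetical helper name; it does not cover the seasonal_decompose route, where the trend and seasonal components would have to be added back instead.

```python
from itertools import accumulate


def undo_first_difference(last_observed, diff_forecasts):
    """Map forecasts of first differences back to the original level."""
    return [last_observed + total for total in accumulate(diff_forecasts)]


observed = [100.0, 102.0, 105.0, 109.0]

# First differences, i.e. the series a model with d=1 works on internally.
diffs = [b - a for a, b in zip(observed, observed[1:])]

# Hypothetical forecasts of the next two differences.
diff_forecasts = [4.0, 5.0]

levels = undo_first_difference(observed[-1], diff_forecasts)
print(levels)
```

Applying the same function to the historical differences, starting from the first observation, reproduces the rest of the original series, which is a quick sanity check for the inversion.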

How do I get the columns that a statsmodels / patsy formula depends on?

Question: Suppose I have a pandas dataframe: df = pd.DataFrame({'x1': [0, 1, 2, 3, 4], 'x2': [10, 9, 8, 7, 6], 'x3': [.1, .1, .2, 4, 8], 'y': [17, 18, 19, 20, 21]}) Now I fit a statsmodels model using a formula (which uses patsy under the hood): import statsmodels.formula.api as smf fit = smf.ols(formula='y ~ x1:x2', data=df).fit() What I want is a list of the columns of df that fit depends on, so that I can use fit.predict() on another dataset. If I try list(fit.params.index), for example, I get: [

How to change maxlag for ARMAX.predict?

Question: I am still working through the ARIMA source code in order to forecast some data. (I use two time series, indexed_df and external_df, with 365 data points each.) I want to compare the forecast accuracy of ARMA and ARMAX. Forecasting with ARMA seems to work fine, but forecasting with one additional external variable does not work: Getting p and q values for ARMAX: arma_mod1 = sm.tsa.ARMA(indexed_df, (2,0), external_df).fit() y = arma_mod1.params print 'P- and Q-Values

ValueWarning: No frequency information was provided, so inferred frequency MS will be used

Question: I am trying to fit an autoregression with sm.tsa.statespace.SARIMAX, but I get a warning, so I want to set the frequency information for this model. Has anyone run into this and can help? fit1 = sm.tsa.statespace.SARIMAX(train.Demand, order=(1, 0, 0), enforce_stationarity=False, enforce_invertibility=False).fit() y_hat['AR'] = fit1.predict(start="1975-01-01", end="1975-12-01", dynamic=True) plt.figure(figsize=(16,8)) plt.plot( train['Demand'], label='Train') plt.plot(test['Demand'], label='Test') plt
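The warning arises when the series' DatetimeIndex carries no explicit frequency, so statsmodels has to infer one. A minimal sketch of attaching the frequency with pandas before fitting, using made-up monthly data as a stand-in for train.Demand:

```python
import pandas as pd

# Hypothetical month-start data; an index built from plain strings has no freq.
raw_index = pd.DatetimeIndex([f"1974-{month:02d}-01" for month in range(1, 13)])
demand = pd.Series([float(v) for v in range(12)], index=raw_index, name="Demand")

print(demand.index.freq)                 # None: nothing is attached yet
inferred = pd.infer_freq(demand.index)   # what pandas would guess from the dates
print(inferred)

# Attach the month-start frequency explicitly. asfreq reindexes to a regular
# grid, so any missing months would show up as NaN rows.
demand_ms = demand.asfreq("MS")
print(demand_ms.index.freqstr)
```

Passing a series prepared this way to the model should give it an index with known frequency; whether that removes the warning in a particular statsmodels version is an assumption to verify.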

Why `sklearn` and `statsmodels` implementation of OLS regression give different R^2?

Question: I accidentally noticed that the OLS models implemented by sklearn and statsmodels yield different values of R^2 when not fitting an intercept. Otherwise they seem to work fine. The following code yields: import numpy as np import sklearn import statsmodels import sklearn.linear_model as sl import statsmodels.api as sm np.random.seed(42) N=1000 X = np.random.normal(loc=1, size=(N, 1)) Y = 2 * X.flatten() + 4 + np.random.normal(size=N) sklernIntercept=sl.LinearRegression(fit_intercept=True).fit
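The difference comes down to which total sum of squares is used. statsmodels documents `rsquared` as 1 - ssr/centered_tss when the model includes a constant and 1 - ssr/uncentered_tss when it does not, while sklearn's score always uses the centered form. A pure-Python sketch with small made-up data shows how far apart the two definitions can be for the same no-intercept fit:

```python
# Made-up data with a true intercept of 4, fitted WITHOUT an intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi + 4.0 for xi in x]

# Least-squares slope for the model y = b * x (no constant term).
slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Residual sum of squares of that fit.
ssr = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))

y_mean = sum(y) / len(y)
tss_centered = sum((yi - y_mean) ** 2 for yi in y)   # deviations from the mean
tss_uncentered = sum(yi ** 2 for yi in y)            # deviations from zero

r2_centered = 1.0 - ssr / tss_centered       # the definition sklearn uses
r2_uncentered = 1.0 - ssr / tss_uncentered   # statsmodels without a constant

print(round(r2_centered, 4), round(r2_uncentered, 4))
```

Neither number is wrong; they answer different questions, so comparing R^2 across libraries only makes sense when both use the same baseline.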

Creating dummy variable using pandas or statsmodel for interaction of two columns

Question: I have a data frame like this:

Index  ID   Industry  years_spend  asset
6646   892  4         4            144.977037
2347   315  10        8            137.749138
7342   985  1         5            104.310217
137    18   5         5            156.593396
2840   381  11        2            229.538828
6579   883  11        1            171.380125
1776   235  4         7            217.734377
2691   361  1         2            148.865341
815    110  15        4            233.309491
2932   393  17        5            187.281724

I want to create dummy variables for Industry x years_spend, which creates len(df.Industry.value_counts()) * len(df.years_spend.value_counts()) variables, for example d_11_4 = 1 for all rows
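One way to get such interaction dummies with pandas is to build a combined key from the two columns and pass it to `pd.get_dummies`. The sketch below uses the Industry and years_spend values from the table above; the `d_` prefix follows the naming in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Industry": [4, 10, 1, 5, 11, 11, 4, 1, 15, 17],
    "years_spend": [4, 8, 5, 5, 2, 1, 7, 2, 4, 5],
})

# Combined key such as "11_2" for Industry 11 with years_spend 2.
combo = df["Industry"].astype(str) + "_" + df["years_spend"].astype(str)

# One indicator column per observed combination, named d_<Industry>_<years>.
dummies = pd.get_dummies(combo, prefix="d")

print(dummies.columns.tolist())
```

This creates columns only for combinations that occur in the data, not for the full product of both value sets, so a column like d_11_4 appears only if some row has that pair.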

Predicting out future values using OLS regression (Python, StatsModels, Pandas)

Question: I'm currently trying to implement an MLR (multiple linear regression) in Python and am not sure how to apply the coefficients I've found to future values. import pandas as pd import statsmodels.formula.api as sm import statsmodels.api as sm2 TV = [230.1, 44.5, 17.2, 151.5, 180.8] Radio = [37.8,39.3,45.9,41.3,10.8] Newspaper = [69.2,45.1,69.3,58.5,58.4] Sales = [22.1, 10.4, 9.3, 18.5,12.9] df = pd.DataFrame({'TV': TV, 'Radio': Radio, 'Newspaper': Newspaper, 'Sales': Sales}) Y = df.Sales X = df[['TV','Radio',