statsmodels

Summary not working for OLS estimation

烈酒焚心 submitted on 2020-01-14 08:17:32
Question: I am having an issue with my statsmodels OLS estimation. The model runs without any issues, but when I try to call summary() so that I can see the actual results, I get a TypeError saying the axis needs to be specified when the shapes of a and weights differ. My code looks like this:

    from __future__ import print_function, division
    import xlrd as xl
    import numpy as np
    import scipy as sp
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.api as sm

    file_loc = "/Users
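
That error message comes from numpy.average, which suggests statsmodels was handed endog and exog arrays whose shapes disagree, so checking the shapes before fitting is a reasonable first step. A sketch with invented data (the question's spreadsheet is not available) of a shape-consistent fit whose summary() call succeeds:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                   # exog: 100 rows, 2 predictors
    y = 1.0 + X @ np.array([0.5, -0.3]) \
        + rng.normal(scale=0.1, size=100)           # endog: 1-D, length 100

    X = sm.add_constant(X)                          # prepend an intercept column
    results = sm.OLS(y, X).fit()                    # y is (100,), X is (100, 3)
    print(results.summary())                        # works when the shapes line up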

How to get the regression intercept using Statsmodels.api

只愿长相守 submitted on 2020-01-12 04:50:11
Question: I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library:

    import statsmodels.api as sm

It prints all of the regression analysis except the intercept. But when I use:

    from pandas.stats.api import ols

My code for pandas:

    Regression = ols(y=Sorted_Data3['net_realization_rate'], x=Sorted_Data3[['Cohort_2','Cohort_3']])
    print Regression

I get the intercept, with a warning that this library will be deprecated in the future
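
statsmodels leaves the intercept out because sm.OLS only estimates coefficients for the columns it is given; the standard fix is sm.add_constant (the formula API adds a constant automatically). A sketch with made-up data reusing the question's column names:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Stand-in for the question's Sorted_Data3.
    rng = np.random.default_rng(1)
    df = pd.DataFrame({'Cohort_2': rng.normal(size=50),
                       'Cohort_3': rng.normal(size=50)})
    df['net_realization_rate'] = (2.0 + 0.4 * df['Cohort_2']
                                  - 0.1 * df['Cohort_3']
                                  + rng.normal(scale=0.05, size=50))

    X = sm.add_constant(df[['Cohort_2', 'Cohort_3']])    # adds a 'const' column
    results = sm.OLS(df['net_realization_rate'], X).fit()
    print(results.params['const'])                       # the intercept
    print(results.summary())                             # now includes 'const'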

How to visualize a nonlinear relationship in a scatter plot

醉酒当歌 submitted on 2020-01-11 02:50:07
Question: I want to visually explore the relationship between two variables. The functional form of the relationship is not visible in dense scatter plots like this: How can I add a lowess smooth to the scatter plot in Python? Or do you have any other suggestions for visually exploring non-linear relationships? I tried the following, but it didn't work properly (drawing on an example from Michiel de Hoon):

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess
    x = np.arange(0,10,0
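
A minimal sketch (with invented data) of overlaying a statsmodels lowess curve on a matplotlib scatter; a common pitfall is the argument order, since lowess takes the y values first and by default returns (x, fitted) pairs sorted by x:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, 2000)
    y = np.sin(x) + rng.normal(scale=0.8, size=x.size)   # noisy nonlinear data

    smoothed = lowess(y, x, frac=0.3)                    # endog first, then exog

    plt.scatter(x, y, s=5, alpha=0.2)                    # transparency helps dense clouds
    plt.plot(smoothed[:, 0], smoothed[:, 1], 'r-')       # lowess curve on top
    plt.show()

seaborn's regplot with lowess=True wraps the same smoother in a single call.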

Different model performance evaluations by statsmodels and scikit-learn

做~自己de王妃 submitted on 2020-01-07 09:22:09
Question: I am trying to fit a multivariable linear regression on a dataset to find out how well the model explains the data. My predictors have 120 dimensions and I have 177 samples: X.shape = (177, 120), y.shape = (177,). Using statsmodels, I get a very good R-squared of 0.76 with a Prob(F-statistic) of 0.06, which trends towards significance and indicates a good model for the data. When I use scikit-learn's linear regression and try to compute the 5-fold cross-validation r2 score, I get an average r2 score of
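
The gap between the two numbers is what cross-validation is designed to expose: with 120 predictors and only 177 samples, OLS has enough free parameters to fit noise, so the in-sample R-squared is flattering while the out-of-sample score collapses. A sketch on pure-noise data of the same shape:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(177, 120))   # same shape as the question, but pure noise
    y = rng.normal(size=177)

    print(LinearRegression().fit(X, y).score(X, y))
    # in-sample R^2 is large even though X carries no signal

    print(cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2').mean())
    # cross-validated R^2 is near zero or negative, exposing the overfit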

Different Linear Regression Coefficients with statsmodels and sklearn

戏子无情 submitted on 2020-01-07 05:58:05
Question: I was planning to use sklearn.linear_model to plot a graph of the linear regression result, and statsmodels.api to get a detailed summary of the learning result. However, the two packages produce very different results on the same input. For example, the constant term from sklearn is 7.8e-14, but the constant term from statsmodels is 48.6. (I added a column of 1's in x for the constant term when using both methods.) My code for both methods is succinct:

    # Use statsmodels linear regression to get a
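
The usual explanation for exactly this symptom: sklearn's LinearRegression fits its own intercept by default, so a hand-added column of 1's is redundant, its coefficient is driven toward zero, and the real constant ends up in intercept_. A sketch with invented data:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.normal(size=(100, 1))
    y = 48.6 + 3.0 * x[:, 0] + rng.normal(size=100)
    X1 = sm.add_constant(x)                        # explicit column of 1's

    clf = LinearRegression().fit(X1, y)
    print(clf.coef_[0], clf.intercept_)            # ~0 for the 1's column; ~48.6 in intercept_

    clf2 = LinearRegression(fit_intercept=False).fit(X1, y)
    print(clf2.coef_[0])                           # ~48.6 once sklearn's own intercept is off

    print(sm.OLS(y, X1).fit().params[0])           # ~48.6; statsmodels uses only your columns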

Subset data points outside confidence interval

前提是你 submitted on 2020-01-05 05:53:04
Question: Using the same example as in this previous question (code pasted below), we can get the 95% CI with the summary_table function from statsmodels outliers_influence. But now, how would it be possible to subset only the data points (x and y) that fall outside the confidence interval?

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import summary_table

    # measurements genre
    n = 100
    x = np.linspace(0, 10, n)
    e = np.random.normal(size=n)
    y = 1 + 0.5*x + 2
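
A sketch of one way to do it: summary_table returns a data matrix whose columns include the interval bounds. The column indices below follow the commonly cited layout (printing ss2, the returned list of column names, confirms the positions), and the truncated line is completed as y = 1 + 0.5*x + 2*e, which is an assumption:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import summary_table

    n = 100
    x = np.linspace(0, 10, n)
    e = np.random.normal(size=n)
    y = 1 + 0.5 * x + 2 * e                 # assumed completion of the question's code

    res = sm.OLS(y, sm.add_constant(x)).fit()
    st, data, ss2 = summary_table(res, alpha=0.05)

    predict_ci_low = data[:, 6]             # 'Predict ci 95% low' (verify via ss2)
    predict_ci_upp = data[:, 7]             # 'Predict ci 95% upp' (verify via ss2)

    outside = (y < predict_ci_low) | (y > predict_ci_upp)
    x_out, y_out = x[outside], y[outside]   # the points outside the interval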

How to drop insignificant categorical interaction terms Python StatsModel

别说谁变了你拦得住时间么 submitted on 2020-01-04 12:56:32
Question: In statsmodels it's easy to add an interaction term. However, not all of the interactions are significant. My question is how to drop those that are insignificant, for example airport at Kootenay.

    # -*- coding: utf-8 -*-
    import pandas as pd
    import statsmodels.formula.api as sm

    if __name__ == "__main__":
        # Read data
        census_subdivision_without_lower_mainland_and_van_island = pd.read_csv('../data/augmented/census_subdivision_without_lower_mainland_and_van_island.csv')
        # Fit all data
        fit = sm.ols
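
statsmodels has no switch for automatically dropping insignificant interaction levels; one common workaround (a sketch with invented data and column names, since the question's CSV is not available) is to inspect fit.pvalues and refit with a reduced formula:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    df = pd.DataFrame({
        'region': rng.choice(['Kootenay', 'Cariboo', 'Peace'], size=200),
        'airport': rng.choice([0, 1], size=200),
    })
    df['y'] = 2.0 + 1.5 * df['airport'] + rng.normal(size=200)

    fit = smf.ols('y ~ airport * C(region)', data=df).fit()

    # Each interaction level gets its own p-value; list the weak ones.
    print(fit.pvalues[fit.pvalues > 0.05])

    # Dropping them means refitting with a hand-edited formula; compare
    # the full and reduced fits, e.g. by AIC.
    reduced = smf.ols('y ~ airport + C(region)', data=df).fit()
    print(fit.aic, reduced.aic)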

Using sklearn RFE with an estimator from another package

狂风中的少年 submitted on 2020-01-04 03:55:47
Question: Is it possible to use sklearn Recursive Feature Elimination (RFE) with an estimator from another package? Specifically, I want to use GLM from the statsmodels package and wrap it in sklearn RFE. If yes, could you please give some examples?

Answer 1: Yes, it is possible. You just need to create a class that inherits from sklearn.base.BaseEstimator, make sure it has fit and predict methods, and make sure its fit method exposes feature importance through either a coef_ or feature_importances_ attribute. Here is a
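
A sketch of such a wrapper (the class name and details are illustrative, not taken from the truncated answer): BaseEstimator supplies get_params/set_params so RFE can clone the wrapper, and fit sets coef_ so RFE can rank features by coefficient magnitude:

    import numpy as np
    import statsmodels.api as sm
    from sklearn.base import BaseEstimator
    from sklearn.feature_selection import RFE

    class SMWrapper(BaseEstimator):
        # A statsmodels GLM dressed up as an sklearn estimator (illustrative).
        def __init__(self, family=None):
            self.family = family

        def fit(self, X, y):
            family = self.family if self.family is not None else sm.families.Gaussian()
            self.results_ = sm.GLM(y, sm.add_constant(X), family=family).fit()
            self.coef_ = self.results_.params[1:]   # skip the intercept; RFE ranks by |coef_|
            return self

        def predict(self, X):
            return self.results_.predict(sm.add_constant(X))

    # Toy usage: keep the 3 strongest of 8 features.
    rng = np.random.default_rng(6)
    X = rng.normal(size=(200, 8))
    y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)
    selector = RFE(SMWrapper(), n_features_to_select=3).fit(X, y)
    print(selector.support_)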

Multiple inputs into Statsmodels ARIMA in Python

蓝咒 submitted on 2020-01-03 06:01:10
Question: I am trying to fit an ARIMA model with multiple inputs. As long as the input was a single array it worked fine. Here, I was advised to put the input arrays into a multidimensional array-like structure. So I did:

    import numpy as np
    from statsmodels.tsa.arima_model import ARIMA

    a = [1, 2, 3]
    b = [4, 5, 6]
    data = np.dstack([a, b])

    for p in range(6):
        for d in range(2):
            for q in range(4):
                order = (p, d, q)
                try:
                    model = ARIMA(data, order=(p, d, q))
                    print("this works:{}, {}, {} ".format(p, d, q))
                except:
                    pass
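
Two things worth noting, offered as the likely fix rather than the accepted answer: np.dstack([a, b]) produces a 3-D array of shape (1, 3, 2), which ARIMA cannot interpret, and ARIMA is univariate anyway, so the second series belongs in the exog argument as a regressor rather than stacked into the endogenous data. A sketch using the current API (statsmodels.tsa.arima_model was removed in statsmodels 0.13 in favor of statsmodels.tsa.arima.model):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(7)
    endog = np.cumsum(rng.normal(size=50))   # the series being modeled
    exog = rng.normal(size=(50, 1))          # the extra input, as a regressor

    model = ARIMA(endog, exog=exog, order=(1, 0, 0))
    results = model.fit()
    print(results.summary())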