statsmodels

Multiple inputs into Statsmodels ARIMA in Python

Submitted by 纵然是瞬间 on 2020-01-03 06:01:05
Question: I am trying to fit an ARIMA model with multiple inputs. As long as the input was a single array it worked fine. Here, I was advised to put the input arrays into a multidimensional array-like structure. So I did:

```python
import numpy as np
from statsmodels.tsa.arima_model import ARIMA

a = [1, 2, 3]
b = [4, 5, 6]
data = np.dstack([a, b])

for p in range(6):
    for d in range(2):
        for q in range(4):
            order = (p, d, q)
            try:
                model = ARIMA(data, order=order)
                print("this works: {}, {}, {}".format(p, d, q))
            except Exception:
                pass
```
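One reason every order fails here is the shape that `np.dstack` produces: it stacks along a third axis, while ARIMA expects a 1-D endogenous series (extra input series conventionally go in a 2-D `exog` array). A minimal numpy-only check of the two layouts:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# np.dstack stacks along a third axis, producing a 3-D array of
# shape (1, 3, 2) -- not a valid ARIMA input.
stacked = np.dstack([a, b])

# A 2-D design of shape (3, 2), one column per series, is the usual
# layout for exogenous regressors (ARIMA's `exog` argument), used
# alongside a separate 1-D endogenous series.
side_by_side = np.column_stack([a, b])

print(stacked.shape)       # (1, 3, 2)
print(side_by_side.shape)  # (3, 2)
```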

Statsmodels logistic regression convergence problems

Submitted by 大憨熊 on 2020-01-02 13:37:12
Question: I'm trying to run a logistic regression in statsmodels on a large design matrix (~200 columns). The features include a number of interactions, categorical features, and semi-sparse (70%) integer features. Although my design matrix is not actually ill-conditioned, it seems to be somewhat close (according to numpy.linalg.matrix_rank, it is full-rank with tol=1e-3 but not with tol=1e-2). As a result, I'm struggling to get the logistic regression to converge with any of the methods in statsmodels.
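The tolerance-dependent rank described above is the classic signature of near-collinear columns; a common workaround in statsmodels is `Logit(...).fit_regularized()`. A small numpy sketch reproducing the symptom (the data here is synthetic, chosen so two columns are nearly identical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two nearly collinear columns: x2 is x1 plus tiny noise.
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# The reported rank depends entirely on the tolerance, which is the
# symptom described above: full rank at a tight tol, rank-deficient
# at a looser one.
print(np.linalg.matrix_rank(X, tol=1e-6))  # 3
print(np.linalg.matrix_rank(X, tol=1e-1))  # 2
```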

ImportError: cannot import name 'STL' from 'statsmodels.tsa.seasonal'

Submitted by 梦想与她 on 2020-01-02 10:19:55
Question: I have this issue now: I cannot import STL from statsmodels. I tried to uninstall statsmodels, as was recommended somewhere for a similar issue, but it is not possible, at least the way I do it: !pip uninstall statsmodels is NOT working. Answer 1: It seems that the STL function from statsmodels is not included in the latest stable version of the library (0.10.2) but is in the dev version (0.11.0dev0). You can build and install this specific version with this command: pip install git+https:/
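Until the installed version catches up, a defensive import avoids the hard failure: STL was added to statsmodels in 0.11, so older versions simply raise ImportError. The helper name `has_stl` below is just for illustration:

```python
def has_stl():
    """Return True if the installed statsmodels provides STL
    (added in statsmodels 0.11), False otherwise."""
    try:
        from statsmodels.tsa.seasonal import STL  # noqa: F401
        return True
    except ImportError:
        return False

print(has_stl())
```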

numpy and statsmodels give different values when calculating correlations. How to interpret this?

Submitted by 我的未来我决定 on 2020-01-02 04:55:30
Question: I can't find a reason why calculating the correlation between two series A and B using numpy.correlate gives me different results than the ones I obtain using statsmodels.tsa.stattools.ccf. Here's an example of the difference I mention:

```python
import numpy as np
from matplotlib import pyplot as plt
from statsmodels.tsa.stattools import ccf

# Calculate correlation using np.correlate
def corr(x, y):
    result = np.correlate(x, y, mode='full')
    return result[result.size // 2:]

# These are the data series I
```
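The two disagree because `np.correlate` is a raw sliding dot product, while `ccf` first demeans both series and scales by their standard deviations. A numpy-only sketch of the normalized version (up to statsmodels' lag conventions), with made-up sample data:

```python
import numpy as np

def normalized_ccf(x, y):
    """Cross-correlation normalized the way a correlation is:
    both series demeaned, scaled by n * std(x) * std(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x = x - x.mean()
    y = y - y.mean()
    # index n-1 of the 'full' output is lag 0; keep lags 0, 1, ...
    raw = np.correlate(x, y, mode="full")[n - 1:]
    return raw / (n * x.std() * y.std())

x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y = np.array([2.0, 1.0, 3.0, 5.0, 4.0])

# At lag 0 this reduces to the ordinary Pearson correlation.
print(np.isclose(normalized_ccf(x, y)[0], np.corrcoef(x, y)[0, 1]))  # True
```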

Python 2.7 - statsmodels - formatting and writing summary output

Submitted by 混江龙づ霸主 on 2019-12-31 22:04:33
Question: I'm doing logistic regression using pandas 0.11.0 (data handling) and statsmodels 0.4.3 to do the actual regression, on Mac OS X Lion. I'm going to be running ~2,900 different logistic regression models and need the results output to a csv file, formatted in a particular way. Currently, I'm only aware of doing print result.summary(), which prints the results (as follows) to the shell:

Logit Regression Results
==============================================================================
Dep.
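For thousands of models, a more workable route than parsing `summary()` text is to pull the numeric attributes directly and write rows yourself. A stdlib-only sketch where plain dicts stand in for the model results (with statsmodels you would pass something like `dict(result.params)` and `dict(result.pvalues)`):

```python
import csv
import io

def results_to_csv_rows(model_name, coefs, pvalues):
    """Format one model's coefficients and p-values as CSV rows.
    `coefs` and `pvalues` are plain dicts here; with a fitted
    statsmodels result, convert its params/pvalues Series to dicts."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for name, coef in coefs.items():
        writer.writerow([model_name, name, coef, pvalues[name]])
    return buf.getvalue()

rows = results_to_csv_rows(
    "model_1",
    {"const": 0.5, "x1": 1.25},
    {"const": 0.04, "x1": 0.001},
)
print(rows)
```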

What are the pitfalls of using Dill to serialise scikit-learn/statsmodels models?

Submitted by 北慕城南 on 2019-12-31 08:57:27
Question: I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact, and this artefact can be used to initialise the model and make predictions. Using the pickle module is not an option because it will only take care of the data dependency (the code will not be packaged). So I have been conducting experiments with Dill. To make my question more precise, the following is an example where I build a model and persist it: from sklearn
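The core limitation that motivates Dill can be seen with the stdlib alone: pickle stores functions by reference (module plus qualified name), so anything not importable at load time, such as a lambda, cannot round-trip. Dill serializes the code object itself, which is why it can, at the cost of the pitfalls the question asks about (version sensitivity, silently captured globals):

```python
import pickle

# pickle serializes functions by *reference*, so a lambda or a
# locally defined function fails to serialize at all; dill ships
# the code object instead.
err = None
try:
    pickle.dumps(lambda x: x + 1)
except Exception as exc:  # PicklingError or AttributeError, by version
    err = exc

print(type(err).__name__)
```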

Detecting multicollinearity, or columns that are linear combinations, while modelling in Python: LinAlgError

Submitted by 拜拜、爱过 on 2019-12-30 04:35:13
Question: I am modelling data for a logit model with 34 dependent variables, and it keeps throwing the singular matrix error, as below:

```python
Traceback (most recent call last):
  File "<pyshell#1116>", line 1, in <module>
    test_scores = smf.Logit(m['event'], train_cols, missing='drop').fit()
  File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 1186, in fit
    disp=disp, callback=callback, **kwargs)
  File "/usr/local/lib/python2.7/site
```
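A singular matrix error from Logit usually means some columns of the design matrix are exact (or near-exact) linear combinations of others. One numpy-only way to find the offenders before fitting is a greedy rank check; the function name `find_redundant_columns` and the synthetic data are just for illustration:

```python
import numpy as np

def find_redundant_columns(X, tol=1e-8):
    """Greedily flag columns that are (near) linear combinations of
    the columns kept so far, using incremental rank checks."""
    keep, drop = [], []
    for j in range(X.shape[1]):
        trial = X[:, keep + [j]]
        if np.linalg.matrix_rank(trial, tol=tol) == len(keep) + 1:
            keep.append(j)
        else:
            drop.append(j)
    return keep, drop

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
# Column 3 is an exact linear combination: col0 + 2 * col1.
X = np.column_stack([A, A[:, 0] + 2 * A[:, 1]])

keep, drop = find_redundant_columns(X)
print(keep, drop)  # [0, 1, 2] [3]
```

Dropping the flagged columns (here, column 3) before calling Logit avoids the LinAlgError without changing the fitted model's span.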

How to calculate the 99% confidence interval for the slope in a linear regression model in Python?

Submitted by 不羁岁月 on 2019-12-30 04:32:09
Question: We have the following linear regression: y ~ b0 + b1 * x1 + b2 * x2. I know that the regress function in Matlab calculates it, but numpy's linalg.lstsq doesn't (https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html). Answer 1: StatsModels' RegressionResults has a conf_int() method. Here is an example using it (a minimally modified version of their Ordinary Least Squares example):

```python
import numpy as np
import statsmodels.api as sm

nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x,
```
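For a 99% interval, statsmodels' route is `results.conf_int(alpha=0.01)`. The same computation can be sketched with numpy alone; the data below is synthetic, and 2.576 (the standard normal 99% quantile) is used as a close stand-in for the exact t quantile that statsmodels would use at n - k = 97 degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS estimates and residual variance.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])

# Standard errors from the diagonal of sigma^2 * (X'X)^-1.
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# 99% interval; 2.576 approximates the t quantile at 97 d.o.f.
z = 2.576
lower, upper = beta - z * se, beta + z * se
print(lower[1], beta[1], upper[1])  # interval for the slope b1
```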

Getting statsmodels to use heteroskedasticity corrected standard errors in coefficient t-tests

Submitted by 删除回忆录丶 on 2019-12-30 03:10:05
Question: I've been digging into the API of statsmodels.regression.linear_model.RegressionResults and have found how to retrieve different flavors of heteroskedasticity-corrected standard errors (via properties like HC0_se, etc.). However, I can't quite figure out how to get the t-tests on the coefficients to use these corrected standard errors. Is there a way to do this in the API, or do I have to do it manually? If the latter, can you suggest any guidance on how to do this with statsmodels results?
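In current statsmodels the API route is `model.fit(cov_type='HC0')` (or `results.get_robustcov_results(...)`), whose reported t statistics use the robust errors, though that may postdate the version in the question. The manual computation is also only a few lines of numpy: the HC0 sandwich covariance with synthetic heteroskedastic data for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
# Heteroskedastic noise: error variance grows with |x|.
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# HC0 sandwich: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv
se_hc0 = np.sqrt(np.diag(cov_hc0))

# t statistics using the robust standard errors.
t_stats = beta / se_hc0
print(se_hc0, t_stats)
```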