statsmodels | 易学教程

mixed-models with two random effects - statsmodels

Question:

    import pandas as pd
    import statsmodels.formula.api as smf
    df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
    df = df.drop(38)

In R I would do:

    lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)

which in R gives me:

    Random effects:
     Groups   Name        Variance Std.Dev.
     scenario (Intercept)  219     14.80
     subject  (Intercept) 4015     63.36
     Residual              646     25.42
    Fixed effects:
                Estimate Std. Error t value
    (Intercept)  202.588     26.754   7.572
    attitudepol  -19.695      5.585  -3.527

I tried
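One way to express two crossed random intercepts in statsmodels is to put every row in a single group and declare `subject` and `scenario` as variance components through `vc_formula`. The sketch below is an assumption about how this could be set up, not the asker's code. It runs on synthetic data with the same column names (the `simulate_politeness` helper and the constant `group` column are hypothetical), so its estimates will not match the R output above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_politeness(seed=0):
    """Hypothetical stand-in data: 6 subjects x 7 scenarios x 2 attitudes."""
    rng = np.random.default_rng(seed)
    subj_eff = rng.normal(0, 60, size=6)   # random intercept per subject
    scen_eff = rng.normal(0, 15, size=7)   # random intercept per scenario
    rows = []
    for s in range(6):
        for c in range(7):
            for att, shift in (("inf", 0.0), ("pol", -20.0)):
                freq = 200 + shift + subj_eff[s] + scen_eff[c] + rng.normal(0, 25)
                rows.append({"subject": f"S{s}", "scenario": c + 1,
                             "attitude": att, "frequency": freq})
    return pd.DataFrame(rows)


df = simulate_politeness()
df["group"] = 1  # one group for all rows, so subject and scenario are crossed

# Each variance component gets one indicator column per level, no intercept
vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}
model = smf.mixedlm("frequency ~ attitude", data=df, groups="group", vc_formula=vcf)
result = model.fit()
print(result.summary())
```

In the summary, the rows labelled with the `vcf` keys correspond to the `(1|scenario)` and `(1|subject)` variances that `lmer` reports.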

Zero-inflated Poisson model in Python

Question: I want to use Python 3 to build a zero-inflated Poisson model. I found the class statsmodels.discrete.count_model.ZeroInflatedPoisson in the statsmodels library, and I wonder how to use it. It seems I should do ZIFP(Y_train,X_train).fit(). But when I tried to predict using X_test, it told me the length of X_test doesn't match X_train. Or is there another package that can fit this model? Here is the code I used:

    X1 = [random.randint(0,1) for i in range(200)]
    X2 = [random.randint(1,2) for i in

ImportError: No module named statsmodels

Question: Hi, I downloaded the StatsModels source from http://pypi.python.org/pypi/statsmodels#downloads. I then untarred it to /usr/local/lib/python2.7/dist-packages and, per the documentation at http://statsmodels.sourceforge.net/devel/install.html, ran:

    sudo python setup.py install

It installed, but when I try to import it with

    import statsmodels.api as sm

I get the following error:

    Traceback (most recent call last):
      File "/home/Astrophysics/Histogram_Fast.py", line 6, in <module>
        import statsmodels.api as sm

Calculating scale/dispersion of Gamma GLM using statsmodels

Question: I'm having trouble obtaining the dispersion parameter of simulated data using statsmodels' GLM function.

    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    import scipy.stats as stats
    import numpy as np

    np.random.seed(1)
    # Generate data
    x = np.random.uniform(0, 100, 50000)
    x2 = sm.add_constant(x)
    a = 0.5
    b = 0.2
    y_true = 1/(a+(b*x))
    # Add error
    scale = 2  # the scale parameter I'm trying to obtain
    shape = y_true/scale  # given that, for Gamma, mu = scale*shape
    y = np.random.gamma(shape

Simple logistic regression with Statsmodels: Adding an intercept and visualizing the logistic regression equation

Question: Using Statsmodels, I am trying to build a simple logistic regression model to predict whether a person smokes (Smoke) based on their height (Hgt). I suspect the model needs an intercept, but I am not sure how to add one using the add_constant() function. I am also unsure why the error below is generated. This is the dataset, Pulse.CSV: https://drive.google.com/file/d/1FdUK9p4Dub4NXsc-zHrYI-AGEEBkX98V/view?usp=sharing The full

Using categorical variables in statsmodels OLS class

Question: I want to use the statsmodels OLS class to create a multiple regression model. Consider the following dataset:

    import statsmodels.api as sm
    import pandas as pd
    import numpy as np

    dict = {'industry': ['mining', 'transportation', 'hospitality', 'finance', 'entertainment'],
            'debt_ratio': np.random.randn(5),
            'cash_flow': np.random.randn(5) + 90}
    df = pd.DataFrame.from_dict(dict)

    x = df[['debt_ratio', 'industry']]
    y = df['cash_flow']

    def reg_sm(x, y):
        x = np.array(x).T
        x = sm.add_constant(x)
        results
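One possible approach, sketched here under assumptions rather than taken from the question, is the formula interface, where `C(industry)` asks for dummy coding of the string column. The example enlarges the dataset to 8 synthetic rows per industry, because five rows cannot support an intercept, a slope, and four industry dummies.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
industries = ["mining", "transportation", "hospitality", "finance", "entertainment"]
df = pd.DataFrame({
    "industry": np.repeat(industries, 8),     # 8 synthetic firms per industry
    "debt_ratio": rng.normal(size=40),
})
df["cash_flow"] = 90 + 2.0 * df["debt_ratio"] + rng.normal(size=40)

# C(industry) expands into 4 dummy columns; the first level alphabetically
# ("entertainment") is absorbed into the intercept as the baseline
res = smf.ols("cash_flow ~ debt_ratio + C(industry)", data=df).fit()
print(res.params)
```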

Wu Yuxiong, Data Mining and Analysis Case Studies in Practice (6): Linear Regression Prediction Models

    # Scatter plot of years of experience against income
    # Import third-party modules
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    # Load the dataset
    income = pd.read_csv(r'F:\\python_Data_analysis_and_mining\\07\\Salary_Data.csv')
    print(income.shape)
    print(income.head())
    # Draw the scatter plot
    sns.lmplot(x = 'YearsExperience', y = 'Salary', data = income, ci = None)
    # Show the figure
    plt.show()

    # Solve for the parameters of the simple linear regression model
    # Sample size
    n = income.shape[0]
    # Sums of the independent variable, the dependent variable,
    # the squared independent variable, and their product
    sum_x = income.YearsExperience.sum()
    sum_y = income.Salary.sum()
    sum_x2 = income.YearsExperience.pow(2).sum()
    xy = income.YearsExperience * income.Salary
    sum_xy = xy.sum()
    # Compute the regression parameters from the formula
    b = (sum
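The excerpt breaks off before the formula. As an illustration only, the standard least-squares expressions for simple linear regression can be written with the same sums. The data below are synthetic stand-ins for Salary_Data.csv (the simulated values are assumptions), and `np.polyfit` serves as a cross-check.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
years = rng.uniform(1, 10, size=30)
income = pd.DataFrame({
    "YearsExperience": years,
    "Salary": 25000 + 9500 * years + rng.normal(0, 5000, size=30),  # synthetic
})

n = income.shape[0]
sum_x = income.YearsExperience.sum()
sum_y = income.Salary.sum()
sum_x2 = income.YearsExperience.pow(2).sum()
sum_xy = (income.YearsExperience * income.Salary).sum()

# Slope: b = (sum(xy) - sum(x) * sum(y) / n) / (sum(x^2) - sum(x)^2 / n)
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# Intercept: a = mean(y) - b * mean(x)
a = income.Salary.mean() - b * income.YearsExperience.mean()

# Cross-check against a degree-1 polynomial fit
slope, intercept = np.polyfit(income.YearsExperience, income.Salary, 1)
print(a, b)
```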

Mathematical Statistics (1): Analysis of Variance with Python

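Mathematical Statistics (1): Analysis of Variance with Python. iwehdio's cnblogs page: https://www.cnblogs.com/iwehdio/

Analysis of variance (ANOVA) is used to infer whether one or more factors, through their levels or their interactions, have a significant effect on an experimental response as those levels change. It is mainly divided into one-way ANOVA, multi-factor ANOVA without replication, and multi-factor ANOVA with replication.

While working through the end-of-chapter exercises for a mathematical statistics course, I found the ANOVA calculations tedious and wanted to do them with existing Python packages. Most tutorials do not explain the parameters clearly, so I am recording the details here.

The main libraries used are pandas and statsmodels. The workflow in brief: first build the input data with the pandas DataFrame data structure, then use the ols function from statsmodels to obtain a least-squares linear regression model, and finally use the anova_lm function from statsmodels to run the analysis of variance.

    import pandas as pd
    import numpy as np
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

First, the input data format. With a pandas DataFrame, each row holds the factor levels and the result of one trial. Take the problem in the figure below as an example. Factor A, factor B, and the result R can then be represented as the following DataFrame:

    data = pd.DataFrame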
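The workflow described above (DataFrame, then `ols`, then `anova_lm`) might look like the following for a two-factor design without replication. The exercise data from the original figure is not available, so the numbers in `data` are hypothetical.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical data: 3 levels of A x 4 levels of B, one trial per cell
data = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "B": [1, 2, 3, 4] * 3,
    "R": [58.2, 56.2, 65.3, 52.6, 49.1, 54.1, 51.6, 42.8, 60.1, 70.9, 39.2, 58.3],
})

# C(...) marks A and B as categorical factors; with one trial per cell
# there are no degrees of freedom left for an interaction term
model = ols("R ~ C(A) + C(B)", data=data).fit()
table = anova_lm(model)
print(table)
```

Each row of the resulting table gives the degrees of freedom, sum of squares, mean square, F value, and p-value for one factor, with the last row holding the residual.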

Learning Analysis with Python

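One-Way Analysis of Variance: determine whether a control variable has a significant effect on the observed variable.

Analysis steps
1. State the hypotheses
   - H0: the means do not differ across factor levels
   - H1: the means differ significantly across factor levels
   - [Note] A difference may mean that all factor levels differ from one another, or that only two of the levels have different means
2. Compute the test statistic F
   - F = MSA / MSE
   - MSA = SSA / (k - 1). MSA: between-group mean square, an estimate of the population variance (unbiased only when H0 is true)
   - MSE = SSE / (n - k). MSE: within-group mean square; whether or not H0 is true, MSE is an unbiased estimate of the population variance
   - SST = SSA + SSE. SST: total sum of squares, reflecting the dispersion of all observations
   - SSA: between-group sum of squares, also called the factor-level sum of squares, reflecting how much the sample means of the factor levels (populations) differ
   - SSE: within-group sum of squares
3. Determine the p-value
4. Build the ANOVA table
5. Make a decision at the given significance level
   - Test using the F value: at the chosen significance level, reject the null hypothesis when F exceeds the critical value
   - Test using the p-value
6. Further analysis
   - Test for homogeneity of variance
   - Multiple comparison tests
     - Determine how strongly the different levels of the control variable affect the observed variable
     - Which level's effect clearly differs from the others
     - Which level's effect is not significant
     - And so on

[Python analysis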
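The quantities in step 2 can be computed directly and checked against `scipy.stats.f_oneway`. This is a small sketch with made-up group data (the three arrays are hypothetical), intended only to make the formulas concrete.

```python
import numpy as np
from scipy import stats

# Hypothetical observations for k = 3 factor levels
groups = [
    np.array([23.0, 25.0, 21.0, 27.0, 24.0]),
    np.array([30.0, 28.0, 33.0, 29.0, 31.0]),
    np.array([22.0, 26.0, 25.0, 23.0, 27.0]),
]

all_obs = np.concatenate(groups)
n, k = all_obs.size, len(groups)
grand_mean = all_obs.mean()

# SSA: between-group sum of squares, weighted by group size
ssa = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# SSE: within-group sum of squares
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
# SST: total sum of squares, equal to SSA + SSE
sst = ((all_obs - grand_mean) ** 2).sum()

msa = ssa / (k - 1)
mse = sse / (n - k)
F = msa / mse
p_value = stats.f.sf(F, k - 1, n - k)    # upper-tail probability of F(k-1, n-k)

F_ref, p_ref = stats.f_oneway(*groups)   # library result for comparison
print(F, p_value)
```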

Python data analysis and implementation: one-sample t-test, independent-samples t-test, correlation analysis, contingency table analysis!

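1. Hypothesis testing
Make a hypothesis, then test it. A confidence level has to be set, for example 95%.
Two types of error: both are stated as probabilities.
The null hypothesis is usually an equality.
Effect of sample size.
Steps: hypothesis, confidence level, collect data, compute the p-value and decide.
t-test: rejection region and acceptance region.
One-sample t-test: there is no data, since this course did not provide any, which is a pity. I will find some data later and redo it!
Two variables: do male and female students differ in average monthly spending? Are the variances equal? Use an F-test, then compute the t statistic!
Data description; the goal is to screen variables.
Analysis of variance: does education level make a difference to credit card spending?
Explanation of total variation, within-group variation, and between-group variation. My own understanding:
- Total variation: the sum of squares of (each sample value - the overall mean)
- Within-group variation: the sum of squares of (each sample value - the mean of its own group) in one group, plus the same sum for each other group
- Between-group variation: the sum of squares of (each group mean - the overall mean), weighted by group size
This makes sense to me; I don't know whether it will to others, but a worked example makes it easy to follow!
The F statistic has requirements that the data must meet, with the data arranged column by column. So build the data in that form and then use the f_oneway() function; the second value returned is the p-value.
The same ANOVA result can also be obtained with statsmodels.
Multi-factor ANOVA: R-squared comes from fitting a linear regression; add an interaction term.
Two continuous variables: correlation analysis. Scatter plot: check whether the relationship is linear and whether the variables are correlated. Take a rough look first!
Introduction to correlation coefficients; Pearson is the most commonly used. Computing the correlation coefficient, the relationship between the coefficient and correlation, and testing the correlation coefficient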
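The male/female spending comparison in these notes could be sketched as below. Two things are assumptions: the data are synthetic, and Levene's test from SciPy is swapped in for the variance F-test the notes mention, since it is a readily available check of equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
male = rng.normal(1500, 300, size=60)      # synthetic monthly spending
female = rng.normal(1600, 300, size=55)

# Step 1: check whether the two variances can be treated as equal
lev_stat, lev_p = stats.levene(male, female)
equal_var = lev_p > 0.05

# Step 2: independent-samples t-test (pooled if variances look equal,
# Welch's version otherwise)
t_stat, t_p = stats.ttest_ind(male, female, equal_var=equal_var)
print(equal_var, t_stat, t_p)
```

A p-value from the t-test below the chosen significance level would lead to rejecting the null hypothesis of equal means.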
