Question
I'm working on a program to investigate the correlation between magnitude and redshift for some quasars, and I'm using statsmodels and scipy.stats.linregress to compute the statistics of the data: statsmodels to compute r-squared (among other parameters), and stats.linregress to compute r (among others).
Some example output is:
W1 r-squared: 0.855715
W1 r-value : 0.414026
W2 r-squared: 0.861169
W2 r-value : 0.517381
W3 r-squared: 0.874051
W3 r-value : 0.418523
W4 r-squared: 0.856747
W4 r-value : 0.294094
Visual minus WISE r-squared: 0.87366
Visual minus WISE r-value : -0.521463
My question is, why do the r and r-squared values not match (i.e. for the W1 band, 0.414026**2 != 0.855715)?
The code for my computation function is as follows:
def computeStats(x, y, yName):
    import numpy as np
    from scipy import stats
    import statsmodels.api as sm

    # Compute model parameters (note: no constant term is added to x here)
    model = sm.OLS(y, x, missing='drop')
    results = model.fit()

    # Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)

    # Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0]*x + params[1]
    fitEquation = '$(%s)=(%.4g \\pm %.4g) \\times redshift+%.4g$' % (yName,
                    params[0],  # slope
                    params[4],  # stderr in slope
                    params[1])  # y-intercept

    print('%s r-squared: %g' % (yName, results.rsquared))
    print('%s r-value  : %g' % (yName, params[2]))

    return results, params, fit, fitEquation
Am I interpreting the statistics incorrectly? Or do the two modules compute the regressions using different methods?
Answer 1:
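By default, OLS in statsmodels does not include the constant term (i.e. the intercept) in the linear equation. (The constant term corresponds to a column of ones in the design matrix.) When the constant is omitted, statsmodels reports the uncentered R-squared, with the total sum of squares taken about zero rather than about the mean of y, so it is not the square of the r returned by linregress, which always fits an intercept.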
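To see why the two numbers can differ so much, here is a minimal NumPy-only sketch that computes both definitions of R-squared by hand. The data are synthetic and purely an assumption for illustration (a weak trend on top of a large offset, loosely like magnitudes against redshift); none of the variable names come from the question's code.

```python
import numpy as np

# Synthetic data (assumed for illustration only): a weak linear trend
# sitting on a large constant offset.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 3.0, size=200)
y = 15.0 + 0.4 * x + rng.normal(scale=0.5, size=200)

# Fit WITHOUT an intercept: y ~ b * x (least squares through the origin).
b = np.sum(x * y) / np.sum(x * x)
ssr_no_const = np.sum((y - b * x) ** 2)
# Uncentered R^2: total sum of squares is taken about zero.
r2_uncentered = 1.0 - ssr_no_const / np.sum(y ** 2)

# Fit WITH an intercept: y ~ intercept + slope * x.
slope, intercept = np.polyfit(x, y, 1)
ssr_const = np.sum((y - (intercept + slope * x)) ** 2)
# Centered R^2: total sum of squares is taken about the mean of y.
r2_centered = 1.0 - ssr_const / np.sum((y - y.mean()) ** 2)

# Pearson correlation coefficient, the quantity linregress reports as r.
r = np.corrcoef(x, y)[0, 1]

print(f"uncentered R^2 (no intercept): {r2_uncentered:.4f}")
print(f"centered R^2 (with intercept): {r2_centered:.4f}")
print(f"r**2:                          {r ** 2:.4f}")
```

Only the centered value agrees with r**2. The uncentered value is inflated because a large mean in y makes the sum of squares about zero much bigger than the sum of squares about the mean.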
To match linregress, create model like this:

    model = sm.OLS(y, sm.add_constant(x), missing='drop')
Source: https://stackoverflow.com/questions/51738734/r-in-stats-linregress-compared-to-r-squared-in-statsmodels