Perform 2 sample t-test

误落风尘 2020-12-23 17:48

I have the mean, std dev and n of sample 1 and sample 2. The samples are taken from the same population, but measured by different labs.

n is different for sample 1 and sample 2.

2 Answers
  • 2020-12-23 18:21

    If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False:

    t, p = ttest_ind(a, b, equal_var=False)
    

    If you have only the summary statistics of the two data sets, you can calculate the t value using scipy.stats.ttest_ind_from_stats (added to scipy in version 0.16) or from the formula (http://en.wikipedia.org/wiki/Welch%27s_t_test).
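
    For reference, the formula route can be written out as below. This is a sketch of Welch's t statistic, its approximate degrees of freedom, and the two-sided p-value; the symbols (sample means \bar{a}, \bar{b}, sample variances s_a^2, s_b^2 computed with ddof=1, sizes n_a, n_b, and F_\nu for the Student t CDF) are notation chosen here, not taken from the linked page.

```latex
t = \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^{2}}
                 {\dfrac{s_a^4}{n_a^2\,(n_a - 1)} + \dfrac{s_b^4}{n_b^2\,(n_b - 1)}},
\qquad
p = 2\,F_\nu\!\left(-\lvert t \rvert\right)
```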

    The following script shows the possibilities.

    from __future__ import print_function
    
    import numpy as np
    from scipy.stats import ttest_ind, ttest_ind_from_stats
    from scipy.special import stdtr
    
    np.random.seed(1)
    
    # Create sample data.
    a = np.random.randn(40)
    b = 4*np.random.randn(50)
    
    # Use scipy.stats.ttest_ind.
    t, p = ttest_ind(a, b, equal_var=False)
    print("ttest_ind:            t = %g  p = %g" % (t, p))
    
    # Compute the descriptive statistics of a and b.
    abar = a.mean()
    avar = a.var(ddof=1)
    na = a.size
    adof = na - 1
    
    bbar = b.mean()
    bvar = b.var(ddof=1)
    nb = b.size
    bdof = nb - 1
    
    # Use scipy.stats.ttest_ind_from_stats.
    t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                                  bbar, np.sqrt(bvar), nb,
                                  equal_var=False)
    print("ttest_ind_from_stats: t = %g  p = %g" % (t2, p2))
    
    # Use the formulas directly.
    tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
    dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
    pf = 2*stdtr(dof, -np.abs(tf))
    
    print("formula:              t = %g  p = %g" % (tf, pf))
    

    The output:

    ttest_ind:            t = -1.5827  p = 0.118873
    ttest_ind_from_stats: t = -1.5827  p = 0.118873
    formula:              t = -1.5827  p = 0.118873
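
    When only the mean, standard deviation and n of each sample are available, as in the question, the summary-statistics route can be used without any raw data. The sketch below assumes the reported standard deviations are sample standard deviations (ddof=1); the numbers are made-up placeholders, not values from the question.

```python
from scipy.stats import ttest_ind_from_stats

# Hypothetical summary statistics for the two labs (illustrative values only).
mean1, std1, n1 = 10.2, 1.5, 40
mean2, std2, n2 = 10.9, 2.8, 55

# Welch's t-test from summary statistics. std1 and std2 are assumed to be
# sample standard deviations (ddof=1); equal_var=False selects Welch's test.
t_stat, p_value = ttest_ind_from_stats(mean1, std1, n1,
                                       mean2, std2, n2,
                                       equal_var=False)

print("t = %g  p = %g" % (t_stat, p_value))
```

    The p-value returned here is two-sided, and the differing sample sizes are accounted for through the n1 and n2 arguments.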
    
  • 2020-12-23 18:21

    With a recent version of SciPy (0.12.0, for example), this functionality is built in, and it does in fact operate on samples of different sizes. In scipy.stats, the ttest_ind function performs Welch’s t-test when the flag equal_var is set to False.

    For example:

    >>> import numpy as np
    >>> import scipy.stats as stats
    >>> sample1 = np.random.randn(10, 1)
    >>> sample2 = 1 + np.random.randn(15, 1)
    >>> t_stat, p_val = stats.ttest_ind(sample1, sample2, equal_var=False)
    >>> t_stat
    array([-3.94339083])
    >>> p_val
    array([ 0.00070813])
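
    A script version of the same idea, as a rough sketch: it uses 1-D arrays so the results are plain scalars rather than length-1 arrays, and a seeded generator so the run is repeatable. The seed and the pooled-variance comparison are additions for illustration, not part of the session above.

```python
import numpy as np
from scipy import stats

# Seeded generator so repeated runs give the same samples (seed is arbitrary).
rng = np.random.default_rng(0)
sample1 = rng.normal(loc=0.0, scale=1.0, size=10)
sample2 = rng.normal(loc=1.0, scale=1.0, size=15)

# Welch's t-test: does not assume equal population variances.
welch = stats.ttest_ind(sample1, sample2, equal_var=False)
# Pooled-variance (Student's) t-test, shown only for comparison.
pooled = stats.ttest_ind(sample1, sample2, equal_var=True)

t_welch, p_welch = welch.statistic, welch.pvalue
t_pooled, p_pooled = pooled.statistic, pooled.pvalue

print("Welch:  t = %g  p = %g" % (t_welch, p_welch))
print("Pooled: t = %g  p = %g" % (t_pooled, p_pooled))
```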
    