numpy covariance matrix

半阙折子戏 2021-01-01 13:10

Suppose I have two vectors of length 25, and I want to compute their covariance matrix. I tried doing this with numpy.cov, but I always end up with a 2x2 matrix.
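A minimal reproduction of the behaviour described. The question does not show its data, so `x` and `y` here are assumed to be two random 1-D arrays of length 25: `np.cov` treats each 1-D input as one variable with 25 observations, so two vectors give a 2x2 matrix.

```python
import numpy as np

# Hypothetical data: the question does not show its actual vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=25)
y = rng.normal(size=25)

# Each 1-D input is treated as a single variable with 25 observations.
c = np.cov(x, y)
print(c.shape)  # (2, 2)
```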



        
10 Answers
  • 2021-01-01 13:31

    You have two vectors, not 25. The computer I'm on doesn't have Python, so I can't test this, but try:

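    import numpy as np
    z = list(zip(x, y))  # 25 (x_i, y_i) pairs, i.e. shape (25, 2)
    np.cov(z)            # each row is a variable -> 25x25 matrix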
    

    Of course, what you really want is probably more like:

    import numpy as np
    
    n = 100  # number of points in each vector
    num_vects = 25
    vals = []
    for _ in range(num_vects):
        vals.append(np.random.normal(size=n))
    np.cov(vals)  # rows are variables -> num_vects x num_vects (25x25)
    

    This computes the covariance matrix of num_vects vectors, each of length n, so the result is num_vects x num_vects (25x25 here).

  • 2021-01-01 13:32

    You should change

    np.cov(x,y, rowvar=0)
    

    to

    np.cov((x,y), rowvar=0)
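A quick way to see the difference between the two calls, assuming length-25 `x` and `y` like the question's (random data here, purely for illustration): passing `x` and `y` separately still yields 2 variables, while stacking them into a 2x25 array with `rowvar=False` (equivalent to `rowvar=0`) makes each of the 25 columns a variable.

```python
import numpy as np

# Hypothetical data standing in for the question's two length-25 vectors.
rng = np.random.default_rng(1)
x = rng.normal(size=25)
y = rng.normal(size=25)

separate = np.cov(x, y, rowvar=False)   # 1-D inputs: still 2 variables
stacked = np.cov((x, y), rowvar=False)  # 2x25 array, columns are variables

print(separate.shape)  # (2, 2)
print(stacked.shape)   # (25, 25)
```

Note that each of those 25 "variables" then has only 2 observations, so the 25x25 matrix has rank at most 1.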
    
  • 2021-01-01 13:32

     Covariance matrix from sample vectors

    To clear up some confusion about what the covariance matrix of two N-dimensional vectors means: there are two possible interpretations.

    The question you have to ask yourself is whether you consider:

    • each vector as N realizations/samples of one single variable (for example two 3-dimensional vectors [X1,X2,X3] and [Y1,Y2,Y3], where you have 3 realizations for the variables X and Y respectively)
    • each vector as 1 realization for N variables (for example two 3-dimensional vectors [X1,Y1,Z1] and [X2,Y2,Z2], where you have 1 realization for the variables X,Y and Z per vector)

    Since each entry of a covariance matrix is the covariance between one pair of variables (with each variable's variance on the diagonal):

    • in the first case, you have 2 variables with N sample values each, so you end up with a 2x2 matrix whose covariances are computed from the N samples per variable
    • in the second case, you have N variables with 2 samples each, so you end up with an NxN matrix
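To make the two readings concrete, here is a sketch that builds the covariance matrix directly from the usual sample-covariance definition, cov(a, b) = sum((a_i - mean(a)) * (b_i - mean(b))) / (N - 1), which matches `np.cov`'s default normalisation. The helper `cov_from_rows` is hypothetical; rows of its input are variables and columns are samples, mirroring `rowvar=True`.

```python
import numpy as np

def cov_from_rows(data):
    """Sample covariance matrix; each row of `data` is one variable (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n_vars, n_samples = data.shape
    centered = data - data.mean(axis=1, keepdims=True)  # subtract each variable's mean
    out = np.empty((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(n_vars):
            # Covariance of variable i with variable j, normalised by N - 1.
            out[i, j] = np.sum(centered[i] * centered[j]) / (n_samples - 1)
    return out

X = [1, 2, 3]
Y = [2, 1, 8]

# First reading: 2 variables (X and Y), 3 samples each -> 2x2
first = cov_from_rows([X, Y])
# Second reading: 3 variables, 2 samples each -> transpose so rows are variables
second = cov_from_rows(np.transpose([X, Y]))

print(first.shape, second.shape)           # (2, 2) (3, 3)
print(np.allclose(first, np.cov([X, Y])))  # True
```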

    About the actual question, using NumPy

    If you consider that you have 25 variables per vector (I use 3 instead of 25 to keep the example code simple), i.e. one realization of several variables in each vector, use rowvar=0:

    import numpy
    
    # [X1,Y1,Z1]
    X_realization1 = [1,2,3]
    
    # [X2,Y2,Z2]
    X_realization2 = [2,1,8]
    
    numpy.cov([X_realization1, X_realization2], rowvar=0) # rowvar false, each column is a variable
    

    The code returns a 3x3 matrix, one row/column per variable:

    array([[ 0.5, -0.5,  2.5],
           [-0.5,  0.5, -2.5],
           [ 2.5, -2.5, 12.5]])
    

    Otherwise, if you consider that each vector holds 25 samples of one variable, use rowvar=1 (NumPy's default):

    import numpy
    
    # [X1,X2,X3]
    X = [1,2,3]
    
    # [Y1,Y2,Y3]
    Y = [2,1,8]
    
    numpy.cov([X,Y],rowvar=1) # rowvar true (default), each row is a variable
    

    The code returns a 2x2 matrix, one row/column per variable:

    array([[ 1.        ,  3.        ],
           [ 3.        , 14.33333333]])
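The two settings are related by a transpose: `rowvar=0` on an array gives the same result as the default `rowvar=1` on its transpose. A small check using the same numbers (the names `A`, `by_columns` and `by_rows` are just for illustration):

```python
import numpy

A = numpy.array([[1, 2, 3],
                 [2, 1, 8]])

by_columns = numpy.cov(A, rowvar=0)  # columns of A are variables -> 3x3
by_rows = numpy.cov(A.T)             # rows of A.T are the same variables -> 3x3

print(numpy.allclose(by_columns, by_rows))  # True
print(numpy.cov(A).shape)                   # (2, 2): default rowvar=1, rows are variables
```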
    
  • 2021-01-01 13:34

    Reading the documentation, either with

    >>> print(np.cov.__doc__)
    

    or on the NumPy documentation page for numpy.cov, shows that NumPy treats each row of the array as a separate variable, so you have two variables and hence get a 2 x 2 covariance matrix.
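To see that each row is one variable, a sketch with arbitrary data (not from the question): passing `x` and `y` separately gives the same result as stacking them as the two rows of one array, and the diagonal holds each vector's sample variance.

```python
import numpy as np

# Arbitrary example data (not from the question).
x = np.arange(25, dtype=float)
y = x ** 2

c_pair = np.cov(x, y)
c_stack = np.cov(np.vstack([x, y]))  # 2x25: row 0 is x, row 1 is y

print(np.allclose(c_pair, c_stack))  # True
print(np.allclose(np.diag(c_pair), [x.var(ddof=1), y.var(ddof=1)]))  # True
```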

    I think the previous post has the right solution; this is the explanation :-)

  • 2021-01-01 13:35

    I suppose what you're looking for is actually a covariance function of the time lag (an autocovariance). I compute the autocovariance like this:

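    import numpy as np
    
    def autocovariance(Xi, N, k):
        """Autocovariance of Xi at lag k, divided by N (biased estimate)."""
        Xs = np.average(Xi)
        aCov = 0.0
        for i in np.arange(0, N - k):
            aCov = (Xi[i + k] - Xs) * (Xi[i] - Xs) + aCov
        return (1. / N) * aCov
    
    # example input (hypothetical): a random series of length N
    My_vector = np.random.normal(size=100)
    N = len(My_vector)
    autocov = [autocovariance(My_vector, N, h) for h in range(N)]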
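The loop above can also be written with NumPy slicing. This is a sketch, not the poster's code: it assumes the same biased estimator (dividing by N rather than N - k) and compares it with a plain-loop version on random data; the names `autocov_loop` and `autocov_vec` are hypothetical.

```python
import numpy as np

def autocov_loop(x, k):
    """Biased lag-k autocovariance via an explicit loop (divides by N)."""
    n = len(x)
    m = np.mean(x)
    total = 0.0
    for i in range(n - k):
        total += (x[i + k] - m) * (x[i] - m)
    return total / n

def autocov_vec(x, k):
    """Same estimator using slices: pairs x[i] with x[i + k]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    return np.dot(d[k:], d[:n - k]) / n

rng = np.random.default_rng(2)
series = rng.normal(size=200)

lags = range(10)
loop_vals = [autocov_loop(series, k) for k in lags]
vec_vals = [autocov_vec(series, k) for k in lags]
print(np.allclose(loop_vals, vec_vals))  # True
```

At lag 0 this estimator reduces to `np.var(series)` (the population variance, `ddof=0`).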
    
  • 2021-01-01 13:41

    I don't think you understand the definition of a covariance matrix. If you need a 25 x 25 covariance matrix, you need 25 vectors, each with n data points.
