Covariance Matrix Python - Omit -9999 Value

Question

I'm trying to calculate the covariance matrix of two completely overlapping images using Python. The code is:

stacked = np.vstack((image1.ravel(),image2.ravel()))
np.cov(stacked)

The problem with this method is that the images sometimes contain a NoData value such as -9999, signifying that the pixel value is missing. np.cov still includes those values, which shifts the image means drastically and gives the wrong covariance.
If I remove the NoData values instead, the two flattened images no longer have the same length, so the covariance matrix cannot be computed.
Computing it manually would be very time consuming.

Is there a way to handle the NoData values and calculate the covariance matrix correctly?

Answer 1:

Your best option is NumPy's masked arrays. The np.ma module includes a function for computing the covariance matrix when masked items are present:

>>> import numpy as np
>>> mask_value = -9999
>>> a = np.array([1, 2, mask_value, 4])
>>> b = np.array([1, mask_value, 3, 4])
>>> c = np.vstack((a,b))
>>> 
>>> # Note: testing for equality is a bad idea with floats. These are integers, so it's okay here.
>>> masked_a, masked_b, masked_c = [np.ma.array(x, mask=x==mask_value) for x in (a,b,c)]
>>> 
>>> result = np.ma.cov(masked_c)
>>> result
masked_array(data =
 [[2.333333333333333 4.444444444444445]
 [4.444444444444445 2.333333333333333]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.cov([1,2,4]) # the variance with just one element masked is the same as result[0,0] above
array(2.333333333333333)
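
The equality test used to build the mask is only safe for integer data. For float rasters, a tolerance-based mask is one alternative. The following is a minimal sketch, assuming the NoData value picked up a tiny rounding error somewhere in processing (the value -9999.0000001 is made up for illustration); np.ma.masked_values compares with a tolerance instead of ==.

```python
import numpy as np

NODATA = -9999.0  # assumed NoData sentinel

# Hypothetical float data where the NoData pixel is slightly off
a = np.array([1.0, 2.0, -9999.0000001, 4.0])

# Exact comparison misses the NoData pixel entirely
exact = (a == NODATA)
print(exact.any())  # False

# masked_values masks everything approximately equal to NODATA
masked_a = np.ma.masked_values(a, NODATA)
print(masked_a.mask)     # only the third element is masked
print(masked_a.count())  # number of valid (unmasked) values: 3
```

The default tolerance is relative to the magnitude of the NoData value, so real data lying very close to -9999 would be masked as well; pass rtol and atol explicitly if that matters.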

The results are different depending on how you call np.ma.cov:

>>> np.ma.cov(masked_a, masked_b)
masked_array(data =
 [[4.5 4.5]
 [4.5 4.5]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.cov([1,4])  # the variance when 2 of the 4 values are masked
array(4.5)

The reason is that the two-argument call combines the masks of the 2 variables, so a position masked in either one is masked in both, like this:

>>> mask2 = masked_c.mask.any(axis=0)
>>> all_masked_c = np.ma.array(c, mask=np.vstack((mask2, mask2)))
>>> all_masked_c
masked_array(data =
 [[1 -- -- 4]
 [1 -- -- 4]],
             mask =
 [[False  True  True False]
 [False  True  True False]],
       fill_value = 999999)

>>> np.ma.cov(all_masked_c) # same function call as the first approach, but with a different mask!
masked_array(data =
 [[4.5 4.5]
 [4.5 4.5]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)
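
For comparison, both sets of numbers can be reproduced with plain NumPy. This is a sketch for illustration only, reusing a and b from above. The names valid, common and off_diag are mine, and the hand calculation reflects my reading of what np.ma.cov does with a stacked array, not a documented guarantee.

```python
import numpy as np

mask_value = -9999
a = np.array([1, 2, mask_value, 4])
b = np.array([1, mask_value, 3, 4])

# Combined-mask approach: keep only positions where BOTH variables are
# valid, then call the ordinary np.cov on the surviving pairs.
valid = (a != mask_value) & (b != mask_value)
common = np.cov(a[valid], b[valid])
print(common)  # 2x2 matrix, every entry 4.5

# Stacked approach, off-diagonal term by hand: each row is centred on the
# mean of its own valid values, but only jointly valid positions are summed.
mean_a = a[a != mask_value].mean()  # mean of [1, 2, 4]
mean_b = b[b != mask_value].mean()  # mean of [1, 3, 4]
n_pairs = valid.sum()               # 2 jointly valid positions
off_diag = ((a[valid] - mean_a) * (b[valid] - mean_b)).sum() / (n_pairs - 1)
print(off_diag)  # about 4.444
```

Note that in the stacked result the covariance (4.444) is larger than both variances (2.333), which would imply a correlation above 1. That is one reason to prefer the combined mask when the matrix has to be internally consistent.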

So use np.ma.cov, but decide how you want the data to be interpreted when the masked values of the two variables do not fall on the same positions.
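
Applied to the two images from the question, this could look roughly like the sketch below. The function name image_covariance and the tiny 2x2 test images are hypothetical; it assumes both images have the same shape and share one NoData value, and it uses the two-argument call so that a pixel missing in either image is ignored in both.

```python
import numpy as np

NODATA = -9999  # assumed NoData sentinel


def image_covariance(image1, image2, nodata=NODATA):
    """Covariance matrix of two same-shaped images, skipping NoData pixels."""
    x = np.asarray(image1, dtype=float).ravel()
    y = np.asarray(image2, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("images must have the same number of pixels")
    # Tolerance-based masking, so float rasters are handled too
    mx = np.ma.masked_values(x, nodata)
    my = np.ma.masked_values(y, nodata)
    # The two-argument form combines the masks of both variables
    return np.ma.cov(mx, my)


# Hypothetical 2x2 images with one NoData pixel each, at different positions
image1 = np.array([[1, 2], [NODATA, 4]])
image2 = np.array([[1, NODATA], [3, 4]])

cov = image_covariance(image1, image2)
print(cov)
```

Because the masked arrays keep their original length, the dimensionality problem described in the question does not arise: nothing is removed, the NoData pixels are simply skipped.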

Source: https://stackoverflow.com/questions/29456962/covariance-matrix-python-omit-9999-value

Tags: python, image, numpy, matrix, no-data