Covariance Matrix Python - Omit -9999 Value

Submitted by  ̄綄美尐妖づ on 2020-01-25 11:02:25

Question


I'm trying to calculate the covariance matrix of two completely overlapping images using Python. The code I use is:

import numpy as np
# image1 and image2 are arrays of the same shape; flatten each into one row
stacked = np.vstack((image1.ravel(), image2.ravel()))
np.cov(stacked)
  • This approach breaks down when the images contain a NoData value such as -9999, which marks pixels with no measurement. np.cov treats that value as real data, so the image means shift drastically and the resulting covariance is wrong.

  • If I remove the NoData pixels from each image separately, the two arrays no longer have the same length, so the covariance matrix cannot be computed.

  • Computing it manually would be very time-consuming.

Is there a way to handle the NoData value and calculate the covariance matrix correctly?


Answer 1:


Your best option is to use NumPy's masked arrays. Their functions include np.ma.cov, which computes the covariance matrix when masked items are present:

>>> import numpy as np
>>> mask_value = -9999
>>> a = np.array([1, 2, mask_value, 4])
>>> b = np.array([1, mask_value, 3, 4])
>>> c = np.vstack((a,b))
>>> 
>>> # Note: testing for equality is a bad idea if you're working with floats. These are integers, so it's okay.
>>> masked_a, masked_b, masked_c = [np.ma.array(x, mask=(x == mask_value)) for x in (a, b, c)]
>>> 
>>> result = np.ma.cov(masked_c)
>>> result
masked_array(data =
 [[2.333333333333333 4.444444444444445]
 [4.444444444444445 2.333333333333333]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.cov([1,2,4]) # autocovariance when just one element is masked is the same as the previous result[0,0]
array(2.333333333333333)
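
The off-diagonal value above (4.44) is larger than either variance. Judging from the numbers, the stacked call treats each row's mask separately: each row is centred on the mean of its own unmasked values, while the cross term uses only the positions that are unmasked in both rows. The following hand-rolled sketch reproduces those numbers under that reading; the names `valid`, `centred` and `pair_counts` are illustrative and not part of NumPy.

```python
import numpy as np

nodata = -9999
c = np.array([[1, 2, nodata, 4],
              [1, nodata, 3, 4]])
valid = c != nodata  # True where a pixel holds real data (integers, so == is safe)

# Centre each row on the mean of its own valid values; zero out the rest.
row_means = np.array([row[ok].mean() for row, ok in zip(c, valid)])
centred = np.where(valid, c - row_means[:, None], 0.0)

# Entry (i, j) sums products over positions valid in both rows i and j,
# then divides by (number of such positions - 1).
pair_counts = valid.astype(int) @ valid.astype(int).T
manual = (centred @ centred.T) / (pair_counts - 1)

print(manual)
# Compare with np.ma.cov on the same data, masked at the NoData value.
print(np.allclose(manual, np.ma.cov(np.ma.masked_equal(c, nodata)).data))
```

Because the diagonal and off-diagonal entries are computed from different subsets of pixels, the resulting matrix need not behave like an ordinary covariance matrix.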

The results are different depending on how you call np.ma.cov:

>>> np.ma.cov(masked_a, masked_b)
masked_array(data =
 [[4.5 4.5]
 [4.5 4.5]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.cov([1,4])  # result of the autocovariance when 2 of the 4 values are masked
array(4.5)

The reason is that the latter approach combines the masks of the two variables, so a position masked in either variable is masked in both, like this:

>>> mask2 = masked_c.mask.any(axis=0)
>>> all_masked_c = np.ma.array(c, mask=np.vstack((mask2, mask2)))
>>> all_masked_c
masked_array(data =
 [[1 -- -- 4]
 [1 -- -- 4]],
             mask =
 [[False  True  True False]
 [False  True  True False]],
       fill_value = 999999)

>>> np.ma.cov(all_masked_c) # same function call as the first approach, but with a different mask!
masked_array(data =
 [[4.5 4.5]
 [4.5 4.5]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)
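
As a possible shortcut for building the combined mask by hand, `np.ma.mask_cols` masks every column of a 2-D masked array that contains at least one masked entry. A minimal sketch, assuming the same integer data as above (the name `joint` is just for illustration):

```python
import numpy as np

nodata = -9999
c = np.array([[1, 2, nodata, 4],
              [1, nodata, 3, 4]])
masked_c = np.ma.masked_equal(c, nodata)

# Mask whole columns that have a masked value in any row.
# A copy is passed so that masked_c itself is left untouched.
joint = np.ma.mask_cols(masked_c.copy())
joint_cov = np.ma.cov(joint)

print(joint.mask)
print(joint_cov)
```

With this mask only the first and last columns survive, so every entry of the covariance matrix is computed from the pairs (1, 1) and (4, 4).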

So use np.ma.cov, but decide how you want the data to be interpreted when the masked values in the two variables do not overlap.
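
For the two-image case in the question, one way to get the "valid in both" interpretation is to drop every pixel that is NoData in either image and pass the rest to plain np.cov. Both images are filtered with the same boolean mask, so the two arrays keep the same length. This is only a sketch: `nodata_cov` is a hypothetical helper, not a NumPy function, and it assumes both images have the same shape and share one NoData value.

```python
import numpy as np

def nodata_cov(image1, image2, nodata=-9999):
    """Covariance of two same-shaped images over pixels valid in both.

    Hypothetical helper for illustration; not part of NumPy.
    """
    x = np.asarray(image1, dtype=float).ravel()
    y = np.asarray(image2, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("images must have the same number of pixels")
    # np.isclose avoids an exact equality test on floating-point data.
    # With the default tolerances, values within about 0.1 of -9999
    # are treated as NoData.
    valid = ~(np.isclose(x, nodata) | np.isclose(y, nodata))
    if valid.sum() < 2:
        raise ValueError("need at least two pixels valid in both images")
    return np.cov(np.vstack((x[valid], y[valid])))

# The same values as a and b above, arranged as 2x2 images.
image1 = np.array([[1, 2], [-9999, 4]])
image2 = np.array([[1, -9999], [3, 4]])
print(nodata_cov(image1, image2))
```

For this data the result should match the 4.5 matrix from the two-argument np.ma.cov call, since only the pixel pairs (1, 1) and (4, 4) are valid in both images.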



Source: https://stackoverflow.com/questions/29456962/covariance-matrix-python-omit-9999-value
