Is there any python package that allows the efficient computation of the PDF (probability density function) of a multivariate normal distribution?
It doesn\'t seem to be
The following code helped me to solve,when given a vector what is the likelihood that vector is in a multivariate normal distribution.
import numpy as np
from scipy.stats import multivariate_normal
d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])
mean = sum(d,axis=0)/len(d)
OR
mean=np.average(d , axis=0)
mean.shape
cov = 0
for e in d:
cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape
dist = multivariate_normal(mean,cov)
print(dist.pdf([1,2,3]))
3.050863384798471e-05
The above value gives the likelihood.
I just made one for my purposes so I though I'd share. It's built using "the powers" of numpy, on the formula of the non degenerate case from http://en.wikipedia.org/wiki/Multivariate_normal_distribution and it aso validates the input.
Here is the code along with a sample run
from numpy import *
import math
# covariance matrix
sigma = matrix([[2.3, 0, 0, 0],
[0, 1.5, 0, 0],
[0, 0, 1.7, 0],
[0, 0, 0, 2]
])
# mean vector
mu = array([2,3,8,10])
# input
x = array([2.1,3.5,8, 9.5])
def norm_pdf_multivariate(x, mu, sigma):
size = len(x)
if size == len(mu) and (size, size) == sigma.shape:
det = linalg.det(sigma)
if det == 0:
raise NameError("The covariance matrix can't be singular")
norm_const = 1.0/ ( math.pow((2*pi),float(size)/2) * math.pow(det,1.0/2) )
x_mu = matrix(x - mu)
inv = sigma.I
result = math.pow(math.e, -0.5 * (x_mu * inv * x_mu.T))
return norm_const * result
else:
raise NameError("The dimensions of the input don't match")
print norm_pdf_multivariate(x, mu, sigma)
I know of several python packages that use it internally, with different generality and for different uses, but I don't know if any of them are intended for users.
statsmodels, for example, has the following hidden function and class, but it's not used by statsmodels:
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777
Essentially, if you need fast evaluation, rewrite it for your use case.
If still needed, my implementation would be
import numpy as np
def pdf_multivariate_gauss(x, mu, cov):
'''
Caculate the multivariate normal density (pdf)
Keyword arguments:
x = numpy array of a "d x 1" sample vector
mu = numpy array of a "d x 1" mean vector
cov = "numpy array of a d x d" covariance matrix
'''
assert(mu.shape[0] > mu.shape[1]), 'mu must be a row vector'
assert(x.shape[0] > x.shape[1]), 'x must be a row vector'
assert(cov.shape[0] == cov.shape[1]), 'covariance matrix must be square'
assert(mu.shape[0] == cov.shape[0]), 'cov_mat and mu_vec must have the same dimensions'
assert(mu.shape[0] == x.shape[0]), 'mu and x must have the same dimensions'
part1 = 1 / ( ((2* np.pi)**(len(mu)/2)) * (np.linalg.det(cov)**(1/2)) )
part2 = (-1/2) * ((x-mu).T.dot(np.linalg.inv(cov))).dot((x-mu))
return float(part1 * np.exp(part2))
def test_gauss_pdf():
x = np.array([[0],[0]])
mu = np.array([[0],[0]])
cov = np.eye(2)
print(pdf_multivariate_gauss(x, mu, cov))
# prints 0.15915494309189535
if __name__ == '__main__':
test_gauss_pdf()
In case I make future changes, the code is here on GitHub