Is there any python package that allows the efficient computation of the PDF (probability density function) of a multivariate normal distribution?
It doesn't seem to be included in NumPy or SciPy.
The multivariate normal is now available in SciPy as of 0.14.0.dev-16fc0af:
from scipy.stats import multivariate_normal
var = multivariate_normal(mean=[0,0], cov=[[1,0],[0,1]])
var.pdf([1,0])
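For the standard bivariate normal used here, var.pdf([1, 0]) works out to exp(-0.5)/(2*pi), roughly 0.0965. pdf() also accepts an array of points if you want to evaluate the density at many locations at once.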
Here I elaborate a bit more on how exactly to use multivariate_normal() from the scipy package:
# Import packages
import numpy as np
from scipy.stats import multivariate_normal
# Prepare your data
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x,y)
# Get the multivariate normal distribution
mu_x = np.mean(x)
sigma_x = np.std(x)
mu_y = np.mean(y)
sigma_y = np.std(y)
rv = multivariate_normal([mu_x, mu_y], [[sigma_x**2, 0], [0, sigma_y**2]])  # the covariance diagonal holds variances, not standard deviations
# Get the probability density
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
pd = rv.pdf(pos)
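If you want to visualize the resulting density surface, here is a minimal sketch (assuming matplotlib is installed):
import matplotlib.pyplot as plt
plt.contourf(X, Y, pd)
plt.colorbar()
plt.show()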
I use the following code, which calculates the logpdf value; this is preferable for larger dimensions. It also works for scipy.sparse covariance matrices.
import numpy as np
import math
import scipy.sparse as sp
import scipy.sparse.linalg as spln
def lognormpdf(x, mu, S):
    """ Calculate the log of the Gaussian probability density of x, when x ~ N(mu, S) """
    nx = S.shape[0]  # dimension; len() is ambiguous for sparse matrices
    # np.linalg.slogdet needs a dense array, so densify the covariance only for the determinant term
    S_dense = S.toarray() if sp.issparse(S) else S
    norm_coeff = nx*math.log(2*math.pi) + np.linalg.slogdet(S_dense)[1]
    err = x - mu
    # solve S*z = err instead of forming the inverse of S explicitly
    if sp.issparse(S):
        numerator = spln.spsolve(S, err).T.dot(err)
    else:
        numerator = np.linalg.solve(S, err).T.dot(err)
    return -0.5*(norm_coeff + numerator)
The code is from pyParticleEst; if you want the pdf value instead of the logpdf, just apply math.exp() to the returned value.
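As a quick sanity check, here is a minimal usage sketch (the test point and covariance are made up, and the comparison assumes a SciPy version that already provides scipy.stats.multivariate_normal):
from scipy.stats import multivariate_normal
x = np.array([1.0, 0.5])
mu = np.array([0.0, 0.0])
S = np.array([[2.0, 0.3], [0.3, 1.0]])
# lognormpdf should agree with SciPy's reference implementation
print(lognormpdf(x, mu, S))
print(multivariate_normal(mean=mu, cov=S).logpdf(x))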
You can easily compute this using NumPy. I implemented it as below for a machine learning course and would like to share it; I hope it helps someone.
import numpy as np
X = np.array([[13.04681517, 14.74115241],[13.40852019, 13.7632696 ],[14.19591481, 15.85318113],[14.91470077, 16.17425987]])
def est_gaus_par(X):
    # estimate the per-feature mean and standard deviation
    mu = np.mean(X, axis=0)
    sig = np.std(X, axis=0)
    return mu, sig

mu, sigma = est_gaus_par(X)

def est_mult_gaus(X, mu, sigma):
    m = len(mu)
    sigma2 = np.diag(sigma**2)  # covariance matrix: variances (not standard deviations) on the diagonal
    X = X - mu.T
    p = 1/((2*np.pi)**(m/2)*np.linalg.det(sigma2)**(0.5))*np.exp(-0.5*np.sum(X.dot(np.linalg.pinv(sigma2))*X, axis=1))
    return p
p = est_mult_gaus(X, mu, sigma)
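If SciPy is installed, a quick way to sanity-check the result is against scipy.stats.multivariate_normal built with the same diagonal covariance:
from scipy.stats import multivariate_normal
p_scipy = multivariate_normal(mean=mu, cov=np.diag(sigma**2)).pdf(X)
print(np.allclose(p, p_scipy))  # True if the two implementations agree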
In the common case of a diagonal covariance matrix, the multivariate PDF can be obtained by simply multiplying the univariate PDF values returned by a scipy.stats.norm
instance. If you need the general case, you will probably have to code this yourself (which shouldn't be hard).
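For example, here is a minimal sketch of the diagonal-covariance case (the means and standard deviations are made up for illustration):
import numpy as np
from scipy.stats import norm, multivariate_normal
x = np.array([1.0, -0.5, 2.0])
means = np.array([0.0, 0.0, 1.0])
stds = np.array([1.0, 2.0, 0.5])
# product of the univariate densities ...
p_diag = np.prod(norm.pdf(x, loc=means, scale=stds))
# ... equals the multivariate density with a diagonal covariance matrix
p_full = multivariate_normal(mean=means, cov=np.diag(stds**2)).pdf(x)
print(np.isclose(p_diag, p_full))  # True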
The density can be computed in a pretty straightforward way using numpy functions and the formula on this page: http://en.wikipedia.org/wiki/Multivariate_normal_distribution. You may also want to use the log-likelihood (log probability), which is less likely to underflow for large dimensions and is a little more straightforward to compute. Both just involve being able to compute the determinant and inverse of a matrix.
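For instance, a minimal sketch of that formula in plain NumPy (a direct transcription of the Wikipedia expression, using the determinant and inverse as mentioned):
import numpy as np
def mvn_pdf(x, mu, cov):
    # density of N(mu, cov) evaluated at the point x, straight from the formula
    k = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2*np.pi)**k * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * diff.dot(np.linalg.inv(cov)).dot(diff))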
The CDF, on the other hand, is an entirely different animal...