How to apply Henze-Zirkler's Multivariate Normality Test in Jupyter notebook with rpy2

白昼怎懂夜的黑 submitted on 2019-12-02 10:37:21

This is another answer, since I discovered this method later. If you do not want to import the R library into Python, you can instead run the R code from Python and pull its output back, i.e. you can call R functions through Python as follows:

import rpy2.robjects as robjects
from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri
from rpy2.robjects.packages import importr
import numpy as np
import pandas as pd

Suppose that resi is a pandas DataFrame in Python, say:

# Create data
resi = pd.DataFrame(np.random.random((108, 2)), columns=['Number1','Number2'])

Then the code is as follows:

# Converting the DataFrame from Python to R

# First take the values of the DataFrame as a NumPy array
resi1=np.array(resi, dtype=float)

# Taking the variable from Python to R
r_resi = numpy2ri(resi1)

# Creating this variable in R (from python)
r.assign("resi", r_resi)

# Calling libraries in R 
r('library("MVN")')

# Calling a function in R (from Python); FALSE is spelled out because F can be reassigned in R
r("res <- hzTest(resi, qqplot = FALSE)")

# Retrieving information from R to Python
r_result = r("res")

# Printing the output in python
print(r_result)

This will generate the output:

  Henze-Zirkler's Multivariate Normality Test 
--------------------------------------------- 
  data : resi 

  HZ      : 2.841424 
  p-value : 1.032563e-06 

  Result  : Data are not multivariate normal. 
---------------------------------------------
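As a cross-check that does not need R at all, the statistic can also be sketched directly with NumPy and SciPy. This is not the MVN code: it is a minimal sketch that follows the usual Henze-Zirkler formulas (smoothing parameter, squared Mahalanobis distances, lognormal approximation for the p-value). The function name henze_zirkler is made up for this illustration, and the sketch assumes a full-rank covariance matrix and no missing values.

```python
import numpy as np
from scipy.stats import lognorm


def henze_zirkler(X):
    """Return (HZ statistic, approximate p-value) for an (n, p) sample.

    Illustrative sketch only; assumes full-rank covariance and no NaNs.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Maximum-likelihood covariance (divides by n) and its inverse
    S = np.cov(X, rowvar=False, bias=True).reshape(p, p)
    S_inv = np.linalg.inv(S)

    # Squared Mahalanobis distances: to the mean (Dj) and between pairs (Djk)
    centered = X - X.mean(axis=0)
    G = centered @ S_inv @ centered.T
    Dj = np.diag(G)
    Djk = Dj[:, None] + Dj[None, :] - 2.0 * G

    # Squared smoothing parameter beta^2
    b2 = 0.5 * ((2 * p + 1) * n / 4) ** (2 / (p + 4))

    hz = n * (
        np.exp(-b2 / 2 * Djk).sum() / n**2
        - 2 * (1 + b2) ** (-p / 2) * np.exp(-b2 / (2 * (1 + b2)) * Dj).mean()
        + (1 + 2 * b2) ** (-p / 2)
    )

    # Mean and variance of HZ under the null hypothesis
    a = 1 + 2 * b2
    w = (1 + b2) * (1 + 3 * b2)
    mu = 1 - a ** (-p / 2) * (1 + p * b2 / a + p * (p + 2) * b2**2 / (2 * a**2))
    var = (
        2 * (1 + 4 * b2) ** (-p / 2)
        + 2 * a ** (-p) * (1 + 2 * p * b2**2 / a**2
                           + 3 * p * (p + 2) * b2**4 / (4 * a**4))
        - 4 * w ** (-p / 2) * (1 + 3 * p * b2**2 / (2 * w)
                               + p * (p + 2) * b2**4 / (2 * w**2))
    )

    # Lognormal approximation: parameters on the log scale, then upper tail
    log_mean = np.log(np.sqrt(mu**4 / (var + mu**2)))
    log_sd = np.sqrt(np.log1p(var / mu**2))
    p_value = lognorm.sf(hz, log_sd, scale=np.exp(log_mean))
    return hz, p_value


# Uniform data with the same shape as the example above (seed chosen arbitrarily)
rng = np.random.default_rng(0)
resi = rng.random((108, 2))
hz, p_value = henze_zirkler(resi)
print(hz, p_value)
```

Because the data are random, the numbers will not match the R output above; the sketch is only meant to show what the test computes.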

There is an R package that already implements this test; it is called MVN.

The first thing you have to do is import MVN into Python, as described here.

Then go to your Jupyter notebook and fit the VAR(1) model to your data like so:

# Fit VAR(1) Model

results = Model.fit(1)
results.summary()

Store the residuals as resi:

resi=results.resid

Then load MVN through rpy2:

# Call function from R
import os
# Raw string, so that "\r" in "\rpy2" is not read as a carriage return
os.environ['R_USER'] = r'...\Lib\site-packages\rpy2'
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

from rpy2.robjects.packages import importr

MVN = importr("MVN", lib_loc = "C:/.../R/win-library/3.3")
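One caution about Windows paths like the R_USER value above: in a regular Python string literal, the two characters \r are read as a single carriage return, so a path ending in \rpy2 is silently altered. A raw string (or forward slashes) avoids this. The short path fragment below is only for illustration:

```python
# A regular literal: "\r" is read as a single carriage-return character
plain = 'site-packages\rpy2'

# A raw literal: the backslash and the "r" are kept as two characters
raw = r'site-packages\rpy2'

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))  # the raw version is one character longer
```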

After importing MVN you can run the normality test like so:

MVNresult = MVN.hzTest(resi, qqplot=False)

If you run

type(MVNresult)

you will find that it is an

rpy2.robjects.methods.RS4

Therefore, in this case you will find this link very helpful in explaining the details.

Then run:

tuple(MVNresult.slotnames())

This will show you the slot names:

('HZ', 'p.value', 'dname', 'dataframe')

Then you may get the values like so:

np.array(MVNresult.slots[tuple(MVNresult.slotnames())[i]])[0]

where i = 0, 1, 2, 3 selects 'HZ', 'p.value', 'dname', 'dataframe' respectively.
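Indexing by position is easy to get wrong, so it can help to collect the slots into a dict keyed by name. The sketch below relies only on the two members used above, slotnames() and slots. The helper name s4_slots_to_dict and the FakeS4 class are made up for this illustration; FakeS4 is a stand-in so the snippet runs without R, and with rpy2 you would pass MVNresult instead.

```python
import numpy as np


def s4_slots_to_dict(s4_obj):
    """Collect every slot of an S4-like object into a plain dict."""
    out = {}
    for name in tuple(s4_obj.slotnames()):
        values = np.atleast_1d(np.array(s4_obj.slots[name]))
        # R scalars arrive as length-1 vectors; unwrap only those
        out[name] = values.reshape(-1)[0] if values.size == 1 else values
    return out


class FakeS4:
    """Hypothetical stand-in for an rpy2 RS4 object (not part of rpy2)."""

    def __init__(self, slots):
        self.slots = slots

    def slotnames(self):
        return iter(self.slots)


# Illustrative values copied from the printed R output earlier on this page
demo = FakeS4({'HZ': [2.841424], 'p.value': [1.032563e-06], 'dname': ['resi']})
result = s4_slots_to_dict(demo)
print(result['HZ'], result['p.value'], result['dname'])
```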

So if the p-value (i = 1) is less than 0.05, the null hypothesis of multivariate normality is rejected at the 5% significance level, i.e. the residuals (resi) are not multivariate normal.
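The decision rule can be written as a small helper. This is only a sketch; the function names are made up for this illustration, and the messages mimic the wording of the R output.

```python
def rejects_normality(p_value, alpha=0.05):
    """True when multivariate normality is rejected at significance level alpha."""
    return bool(p_value < alpha)


def describe(p_value, alpha=0.05):
    """Return a one-line verdict in the style of the R output."""
    if rejects_normality(p_value, alpha):
        return "Data are not multivariate normal."
    return "Data are multivariate normal."


# p-value taken from the printed R output earlier on this page
print(describe(1.032563e-06))
```

Note that failing to reject does not prove normality; it only means the test found no evidence against it at the chosen level.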
