Question
I am trying to use the dgof package from R in Python 3 via rpy2.
I use it inside Python like so:
# import rpy2's package module
import rpy2.robjects.packages as rpackages
# Import R's utility package
utils = rpackages.importr('utils')
# Select a mirror for R packages
utils.chooseCRANmirror(ind=1) # select the first mirror in the list
# R vector of strings
from rpy2.robjects.vectors import StrVector
# Install R package name: 'dgof' (discrete goodness-of-fit) is what we're interested in
if not rpackages.isinstalled('dgof'):
    # Pass the name in a list so it is treated as a single package name
    utils.install_packages(StrVector(['dgof']))
# Import dgof
dgof = rpackages.importr('dgof')
This works like a charm (i.e. I can import it, which is a big win in itself). Now, as a test, I wanted to reproduce the example result from the API documentation.
For clarity, in pure R the example is the following (to be clear, this function is NOT stats::ks.test but the native dgof one):
ks.test(rep(1, 3), ecdf(1:3))
which results in a p-value of 0.07407 (to verify this, click the green "Run this code" button in the linked documentation). Note that:
> ecdf(1:3)
Empirical CDF
Call: ecdf(1:3)
x[1:3] = 1, 2, 3
> rep(1,3)
[1] 1 1 1
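As a sanity check on the 0.07407 figure, the small case above can be enumerated by hand. The following is only a brute-force sketch in plain Python, not dgof's algorithm: it assumes the null distribution is the one described by ecdf(1:3) (mass 1/3 on each of 1, 2, 3) and counts how many of the 27 possible samples of size 3 give a two-sided statistic at least as large as the observed one. The helper names (null_cdf, ks_statistic) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

support = (1, 2, 3)   # the values behind ecdf(1:3)
n = 3                 # sample size of rep(1, 3)

def null_cdf(x):
    """Reference CDF: ecdf(1:3) puts mass 1/3 on each of 1, 2, 3."""
    return Fraction(sum(v <= x for v in support), len(support))

def ks_statistic(sample):
    """Two-sided D = max over the support of |F_n(x) - F(x)|."""
    return max(
        abs(Fraction(sum(s <= x for s in sample), len(sample)) - null_cdf(x))
        for x in support
    )

observed = ks_statistic((1, 1, 1))           # statistic for rep(1, 3)
samples = list(product(support, repeat=n))   # all 27 equally likely samples
extreme = sum(ks_statistic(s) >= observed for s in samples)
p_value = Fraction(extreme, len(samples))

print(observed, p_value, float(p_value))
```

Only the samples (1, 1, 1) and (3, 3, 3) are as extreme as the observed one, so the enumeration gives 2/27, which is approximately 0.07407 and agrees with the R output.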
In Python the reproduced example is:
import numpy as np
a = np.array([1,1,1])
b = np.arange(1,4)
dgof.ks_test(a,b)
But in this example, the p-value I find is 0.517551. The KS statistic itself is correctly calculated. But why is the simulated p-value different? Again, to see the output of the dgof example in the link, press "Run this example" and you'll see the numbers that I am referring to (reproduced above).
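One difference between the two calls may be worth checking: in R the second argument is an ecdf object, while in Python it is a plain array. The sketch below is an assumption-laden diagnostic, not a statement about what rpy2 or dgof does internally. It assumes the second array was treated as a second sample and that the p-value came from the usual asymptotic Kolmogorov series for a two-sample test; kolmogorov_sf and ecdf_at are hypothetical helper names.

```python
import math

def kolmogorov_sf(lam, terms=100):
    """Asymptotic tail: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2)."""
    return 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )

def ecdf_at(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    return sum(s <= x for s in sample) / len(sample)

a = [1, 1, 1]   # same values as np.array([1, 1, 1])
b = [1, 2, 3]   # same values as np.arange(1, 4)

# Two-sample statistic: largest gap between the two empirical CDFs
d = max(abs(ecdf_at(a, x) - ecdf_at(b, x)) for x in sorted(set(a + b)))

n, m = len(a), len(b)
p_two_sample = kolmogorov_sf(math.sqrt(n * m / (n + m)) * d)

print(round(d, 4), round(p_two_sample, 6))
```

Under these assumptions the statistic is again 2/3 and the p-value comes out at about 0.51755, which matches the number seen from Python. If that is what is happening, passing an actual ecdf object as the second argument (for example one built with R's stats::ecdf through rpy2) would be the thing to try.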
Source: https://stackoverflow.com/questions/54083887/discrete-kolmogorov-smirnov-testing-getting-wrong-value-when-using-rpy2-compare