Question
I know about basic data types and that float types (float, double) cannot hold some numbers exactly.
While porting some code from Matlab to Python (NumPy), however, I found some significant differences in the calculations, and I think it comes down to precision.
Take the following code, which z-normalizes a 500-dimensional vector in which only the first two elements have a non-zero value.
Matlab:
Z = repmat(0,500,1); Z(1)=3;Z(2)=1;
Za = (Z-repmat(mean(Z),500,1)) ./ repmat(std(Z),500,1);
Za(1)
>>> 21.1694
Python:
from numpy import zeros,mean,std
Z = zeros((500,))
Z[0] = 3
Z[1] = 1
Za = (Z - mean(Z)) / std(Z)
print Za[0]
>>> 21.1905669677
Apart from Python showing a few more digits, there is a huge difference (imho): more than 0.02.
Both Python and Matlab use a 64-bit data type (afaik): Python uses numpy.float64 and Matlab uses double.
Why is the difference so huge? Which one is more correct?
Answer 1:
Maybe the difference comes from the mean and std calls. Compare those first.
There are several definitions of std. Some use the square root of
1 / n * sum((xi - mean(x)) ** 2)
while others use
1 / (n - 1) * sum((xi - mean(x)) ** 2)
instead.
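A quick NumPy sketch of those two definitions on the 500-element vector from the question (no data assumed beyond what the question gives):
import numpy as np

# the vector from the question: only the first two entries are non-zero
Z = np.zeros(500)
Z[0], Z[1] = 3, 1

m = Z.mean()
std_n  = np.sqrt(np.sum((Z - m) ** 2) / Z.size)        # divide by n
std_n1 = np.sqrt(np.sum((Z - m) ** 2) / (Z.size - 1))  # divide by n - 1

print(std_n, np.std(Z))  # essentially identical: NumPy's default divides by n
print(std_n1)            # slightly larger, which pushes Za(1) down from ~21.19 to ~21.17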
From a mathematical point of view, these formulas are estimators of the variance of a normally distributed random variable. The distribution has two parameters, mu and sigma. If you know mu exactly, the optimal estimator for sigma ** 2 is
1 / n * sum((xi - mu) ** 2)
If you have to estimate mu from the data as mu = mean(x), the optimal estimator for sigma ** 2 is
1 / (n - 1) * sum((xi - mean(x)) ** 2)
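A small simulation sketch of that point (the true parameters, sample size, seed and trial count below are arbitrary choices): averaged over many draws, each estimator comes out unbiased for sigma ** 2 under its respective assumption.
import numpy as np

mu, sigma, n, trials = 0.0, 2.0, 10, 200_000
rng = np.random.default_rng(0)

known_mu = np.empty(trials)
estimated_mu = np.empty(trials)
for i in range(trials):
    x = rng.normal(mu, sigma, n)
    known_mu[i] = np.sum((x - mu) ** 2) / n                  # mu known: divide by n
    estimated_mu[i] = np.sum((x - x.mean()) ** 2) / (n - 1)  # mu estimated: divide by n - 1

# both averages come out close to sigma ** 2 == 4
print(known_mu.mean(), estimated_mu.mean())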
Answer 2:
To answer your question: no, this is not a problem of precision. As @rocksportrocker points out, there are two popular estimators for the standard deviation. MATLAB's std has both available, but by default it uses a different one from what you used in Python.
Try std(Z,1) instead of std(Z):
Za = (Z - repmat(mean(Z),500,1)) ./ repmat(std(Z,1),500,1);
sprintf('%1.10f', Za(1))
leads to
Za(1) = 21.1905669677
in MATLAB. Read rocksportrocker's answer about which of the two results is more appropriate for what you want to do ;-).
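For completeness, the same check from the NumPy side (a sketch reusing the Z vector defined in the question):
import numpy as np

# the same Z vector as in the question
Z = np.zeros(500)
Z[0], Z[1] = 3, 1

Za = (Z - Z.mean()) / Z.std()  # NumPy's default std divides by n, like MATLAB's std(Z,1)
print('%1.10f' % Za[0])        # ~21.1905669677, matching the MATLAB output above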
Answer 3:
According to the documentation of std at SciPy, it has a parameter called ddof:
ddof : int, optional
Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.
In NumPy, ddof is zero by default, while in MATLAB it is one. So I think this may solve the problem:
std(Z,ddof=1)
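A minimal sketch of that call on the question's data:
import numpy as np

Z = np.zeros(500)
Z[0], Z[1] = 3, 1

Za = (Z - np.mean(Z)) / np.std(Z, ddof=1)  # ddof=1 divides by n - 1, like MATLAB's default std
print(Za[0])                               # ~21.1694, the value MATLAB printed in the question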
Source: https://stackoverflow.com/questions/7482205/precision-why-do-matlab-and-python-numpy-give-so-different-outputs