Question
Normally the dtype is hidden when it's equivalent to the native type:
>>> import numpy as np
>>> np.arange(5)
array([0, 1, 2, 3, 4])
>>> np.arange(5).dtype
dtype('int32')
>>> np.arange(5) + 3
array([3, 4, 5, 6, 7])
But somehow that doesn't apply to floor division or modulo:
>>> np.arange(5) // 3
array([0, 0, 0, 1, 1], dtype=int32)
>>> np.arange(5) % 3
array([0, 1, 2, 0, 1], dtype=int32)
Why is there a difference?
Python 3.5.4, NumPy 1.13.1, Windows 64bit
Answer 1:
You actually have multiple distinct 32-bit integer dtypes here. This is probably a bug.
NumPy has (accidentally?) created multiple distinct signed 32-bit integer types, probably corresponding to C int and long. Both of them display as numpy.int32, but they're actually different objects. At C level, I believe the type objects are PyIntArrType_Type and PyLongArrType_Type, generated here.
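On a platform where this happens, the two Python-level classes can be compared directly (a minimal sketch; the printed names are platform- and version-dependent, and on the asker's Windows 64-bit / NumPy 1.13 setup both display as numpy.int32):

import numpy as np

print(np.intc, np.int_)    # may both display as <class 'numpy.int32'>
print(np.intc is np.int_)  # False: distinct class objects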
dtype objects have a type attribute corresponding to the type object of scalars of that dtype. It is this type attribute that NumPy inspects when deciding whether to print dtype information in an array's repr:
_typelessdata = [int_, float_, complex_]
if issubclass(intc, int):
    _typelessdata.append(intc)
if issubclass(longlong, int):
    _typelessdata.append(longlong)

...

def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    ...
    skipdtype = (arr.dtype.type in _typelessdata) and arr.size > 0

    if skipdtype:
        return "%s(%s)" % (class_name, lst)
    else:
        ...
        return "%s(%s,%sdtype=%s)" % (class_name, lst, lf, typename)
On numpy.arange(5) and numpy.arange(5) + 3, .dtype.type is numpy.int_; on numpy.arange(5) // 3 or numpy.arange(5) % 3, .dtype.type is the other 32-bit signed integer type.
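A minimal way to see this (assuming a setup where the discrepancy reproduces, such as the asker's Windows 64-bit / NumPy 1.13 environment): the dtypes compare equal, yet the scalar type objects are distinct:

import numpy as np

a = np.arange(5) + 3
b = np.arange(5) // 3

print(a.dtype == b.dtype)            # True: both 32-bit signed integers
print(a.dtype.type, b.dtype.type)    # both display as numpy.int32
print(a.dtype.type is b.dtype.type)  # False where the bug reproduces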
As for why + and // have different output dtypes, they use different type resolution routines. Here's the one for //, and here's the one for +. //'s type resolution looks for a ufunc inner loop that takes types the inputs can be safely cast to, while +'s type resolution applies NumPy type promotion to the arguments and picks the loop matching the resulting type.
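One way to peek at the candidate loops is each ufunc's .types attribute, which lists the registered inner-loop signatures as type-char strings ('i' is C int / np.intc, 'l' is C long / np.int_). A sketch; the exact lists are platform- and version-dependent:

import numpy as np

# np.add's promotion-based resolver lands on the 'll->l' (long) loop for
# int_ inputs, while np.floor_divide's safe-cast search can settle on the
# earlier 'ii->i' (int) loop when C int and long have the same width.
print(np.add.types)
print(np.floor_divide.types)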
Answer 2:
It comes down to a difference in the dtype, as can be seen from the view:
In [186]: x = np.arange(10)
In [187]: y = x // 3
In [188]: x
Out[188]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [189]: y
Out[189]: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3], dtype=int32)
In [190]: x.view(y.dtype)
Out[190]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [191]: y.view(x.dtype)
Out[191]: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])
Even though the dtype descr values are the same, there's some attribute that's different. But which?
In [192]: x.dtype.descr
Out[192]: [('', '<i4')]
In [193]: y.dtype.descr
Out[193]: [('', '<i4')]
In [204]: x.dtype.type
Out[204]: numpy.int32
In [205]: y.dtype.type
Out[205]: numpy.int32
In [207]: x.dtype.type is y.dtype.type
Out[207]: False
In [243]: np.core.numeric._typelessdata
Out[243]: [numpy.int32, numpy.float64, numpy.complex128]
In [245]: x.dtype.type in np.core.numeric._typelessdata
Out[245]: True
In [246]: y.dtype.type in np.core.numeric._typelessdata
Out[246]: False
So y's dtype.type by all appearances is the same as x's, but it's a different object, with a different id:
In [261]: id(np.int32)
Out[261]: 3045777728
In [262]: id(x.dtype.type)
Out[262]: 3045777728
In [263]: id(y.dtype.type)
Out[263]: 3045777952
In [282]: id(np.intc)
Out[282]: 3045777952
Add this extra type to the list, and y no longer shows the dtype:
In [267]: np.core.numeric._typelessdata.append(y.dtype.type)
In [269]: y
Out[269]: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])
So y.dtype.type is np.intc (and np.intp), while x.dtype.type is np.int32 (and np.int_).
So to make an array that displays the dtype, use np.intc:
In [23]: np.arange(10,dtype=np.int_)
Out[23]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [24]: np.arange(10,dtype=np.intc)
Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
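Conversely (my inference from the above, not something shown in the answer), casting a floor-division result back to np.int_ should restore the typeless repr:

import numpy as np

y = np.arange(10) // 3
print(repr(y.astype(np.int_)))  # dtype suffix gone on the asker's platform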
And to turn this off, append np.intc to np.core.numeric._typelessdata.
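Since _typelessdata is a private internal, a slightly safer version of that workaround undoes the change when done (a sketch, same assumption about the NumPy 1.13-era module path):

import numpy as np

np.core.numeric._typelessdata.append(np.intc)
try:
    print(repr(np.arange(3, dtype=np.intc)))  # dtype suffix now suppressed
finally:
    np.core.numeric._typelessdata.remove(np.intc)  # undo the monkeypatch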
Source: https://stackoverflow.com/questions/46285518/why-is-the-dtype-shown-even-if-its-the-native-one-when-using-floor-division-w