Is there a significant overhead in calling `np.asarray` on a NumPy array?

Question

I am quite new to the Python world, so please excuse my dumb question.

In a number of circumstances, I implement functions that work on array-like numerical inputs, and it is usually advantageous to use NumPy utilities for basic operations on the sequences. To this end, I would write something like this:

import numpy as np

def f(x):
    if not isinstance(x, np.ndarray):
        x = np.asarray(x)
    # and from now on we know that x is a NumPy array, with all standard methods
    ...

(Note that I don't want to rely on the caller to always pass NumPy arrays.)

I was wondering what the additional overhead would be if I simplified the code by removing the if, i.e., having something like

def f(x):
    x = np.asarray(x)
    # and from now on we know that x is a NumPy array, with all standard methods
    ...

Basically, the difference between the two cases is that the second version is more compact, but it will unnecessarily call np.asarray even if x is already a NumPy array.

Answer 1:

Short answer: Since you're checking with isinstance(), you may use numpy.asanyarray() which will pass through any ndarray and its subclasses without overhead.
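If you want to check the cost on your own machine, a rough timing sketch like the one below may help. The helper names with_check and without_check are made up for this illustration, and the absolute numbers will vary with the NumPy version and hardware, so treat the output as indicative only.

```python
import timeit

import numpy as np


def with_check(x):
    # Variant from the question: only convert when x is not already an ndarray.
    if not isinstance(x, np.ndarray):
        x = np.asarray(x)
    return x


def without_check(x):
    # Simplified variant: always call np.asarray.
    return np.asarray(x)


arr = np.arange(1000)
n = 100_000

# Time both variants on an input that is already an ndarray.
t_check = timeit.timeit(lambda: with_check(arr), number=n)
t_plain = timeit.timeit(lambda: without_check(arr), number=n)

print(f"with isinstance check: {t_check / n * 1e9:.0f} ns per call")
print(f"asarray only:          {t_plain / n * 1e9:.0f} ns per call")
```

Both variants hand back the very same array object here, so any difference you see is pure call overhead and does not grow with the array size.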

According to the docs for numpy.asarray(), there is no overhead when the input is already an ndarray with a matching dtype and order: no copying happens and the input "passes through" unchanged. It is worth noting, however, that an instance of an ndarray subclass does not pass through; a base-class ndarray is returned instead.

Since your original code uses isinstance(x, np.ndarray), which returns True for subclasses as well, you most likely want numpy.asanyarray(), which also passes subclasses of ndarray through and would be more efficient for your use case.

Returns: out : ndarray Array interpretation of a. No copy is performed if the input is already an ndarray with matching dtype and order. If a is a subclass of ndarray, a base class ndarray is returned.
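The "matching dtype" condition in that description can be seen in a small sketch. The variable names are illustrative only:

```python
import numpy as np

x = np.arange(5, dtype=np.float64)

# Already an ndarray with a matching dtype: the same object comes back.
same = np.asarray(x)
print(same is x)  # True

# Requesting a different dtype forces a conversion, hence a new array.
converted = np.asarray(x, dtype=np.int32)
print(converted is x)                  # False
print(np.shares_memory(converted, x))  # False

# A plain list always has to be converted into a new array.
from_list = np.asarray([1.0, 2.0, 3.0])
print(type(from_list).__name__)  # ndarray
```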

This example from the docs (plus my own comments) explains the differences and why asanyarray() is better for your use case:

>>> issubclass(np.recarray, np.ndarray)
True   # This is to show that recarray is a subclass of ndarray
>>> a = np.array([(1.0, 2), (3.0, 4)], dtype='f4,i4').view(np.recarray)
>>> np.asarray(a) is a
False  # A new base-class ndarray object is returned instead of a itself,
       # because the input type recarray is only a subclass of ndarray
>>> np.asanyarray(a) is a
True   # Here no copying happens, your subclass of ndarray passes through.
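To see what exactly comes back for a subclass, here is a minimal sketch using a hypothetical subclass named TaggedArray (not part of NumPy). As far as I can tell, np.asarray creates a new base-class wrapper around the same buffer rather than duplicating the data, so the extra cost is a small constant per call.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass used only for this illustration."""


sub = np.arange(4).view(TaggedArray)

# asarray strips the subclass: a new base-class ndarray object is returned...
base = np.asarray(sub)
print(type(base) is np.ndarray)  # True
print(base is sub)               # False
# ...but it still refers to the same underlying data.
print(np.shares_memory(base, sub))  # True

# asanyarray returns the subclass instance itself.
kept = np.asanyarray(sub)
print(kept is sub)  # True
```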

Answer 2:

Looking at the code, np.asarray does:

array(a, dtype, copy=False, order=order)

np.asanyarray does:

array(a, dtype, copy=False, order=order, subok=True)

The defaults for np.array are:

array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
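The effect of those two defaults (copy and subok) can be checked with a short sketch. TaggedArray is again a hypothetical subclass introduced only for this example:

```python
import numpy as np

x = np.arange(3)

# np.array copies by default (copy=True)...
copied = np.array(x)
print(copied is x)                  # False
print(np.shares_memory(copied, x))  # False

# ...whereas np.asarray returns an existing ndarray unchanged.
print(np.asarray(x) is x)  # True


class TaggedArray(np.ndarray):
    """Hypothetical subclass, for illustration only."""


sub = x.view(TaggedArray)

# subok=False (the default) returns a base-class ndarray;
# subok=True keeps the subclass.
print(type(np.array(sub)).__name__)              # ndarray
print(type(np.array(sub, subok=True)).__name__)  # TaggedArray
```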

Source: https://stackoverflow.com/questions/56805126/is-there-a-significant-overhead-in-calling-np-asarray-on-a-numpy-array

Tags

python

arrays

numpy