Why is numpy.array() sometimes very slow?

后端未结


余生分开走

I'm using the numpy.array() function to create numpy.float64 ndarrays from lists.

I noticed that this is very slow when the list contains None or when a list of lists is provided.
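Here is a minimal sketch for reproducing the two slow cases on your own NumPy version. The list length of 100,000 comes from the timings in the answers. The helper best_ms and the dictionary labels are hypothetical names chosen for illustration. Absolute numbers will differ from the ones quoted in this thread, which predate the patch linked in the first answer.

```python
import timeit

import numpy as np

N = 100_000  # same list length as the timings quoted in the answers

floats = [0.0] * N    # baseline: a flat list of Python floats
nones = [None] * N    # case 1: every element is None (becomes nan under float64)
nested = [[0.0]] * N  # case 2: a list of one-element lists, i.e. shape (N, 1)


def best_ms(func, repeat=3, number=5):
    """Hypothetical helper: fastest of `repeat` runs, averaged over `number` calls, in ms."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1000.0


results = {
    "flat floats": best_ms(lambda: np.array(floats, dtype=np.float64)),
    "contains None": best_ms(lambda: np.array(nones, dtype=np.float64)),
    "list of lists": best_ms(lambda: np.array(nested, dtype=np.float64)),
}
for name, ms in results.items():
    print(f"{name:>14}: {ms:8.2f} ms")
```

Comparing the two problem cases against the "flat floats" baseline shows how much of the cost comes from the input's shape or content rather than from its length.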


2 Answers

忘掉有多难

2020-12-31 06:29

I've reported this as a numpy issue. The report and patch files are here:

https://github.com/numpy/numpy/issues/3392

After patching:

```
# was 240 ms, best alternate version was 3.29
In [5]: %timeit numpy.array([None]*100000)
100 loops, best of 3: 7.49 ms per loop

# was 353 ms, best alternate version was 9.65
In [6]: %timeit numpy.array([[0.0]]*100000)
10 loops, best of 3: 23.7 ms per loop
```
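If you cannot use a patched NumPy, one possible workaround for the `[[0.0]]*100000` case timed above is to flatten the nested list yourself, convert the 1-D list, and restore the 2-D shape with reshape. This is a sketch that assumes every row has the same length. It is not necessarily one of the "alternate versions" mentioned in the timing comments, and whether it actually wins depends on your NumPy version, so the code times both routes instead of assuming a result.

```python
import timeit

import numpy as np

N = 100_000
nested = [[0.0]] * N  # N references to the same one-element list

# Direct route: NumPy must discover the (N, 1) shape by walking every inner list.
direct = np.array(nested, dtype=np.float64)

# Workaround route: flatten in Python, convert a 1-D list, then reshape.
width = len(nested[0])  # assumes all rows have this same length
flat = [x for row in nested for x in row]
via_reshape = np.array(flat, dtype=np.float64).reshape(-1, width)

t_direct = timeit.timeit(lambda: np.array(nested, dtype=np.float64), number=5)
t_flat = timeit.timeit(
    lambda: np.array([x for row in nested for x in row], dtype=np.float64).reshape(-1, width),
    number=5,
)
print(f"direct: {t_direct:.3f} s, flatten + reshape: {t_flat:.3f} s")
```

The reshape of a freshly created contiguous array returns a view, so restoring the shape costs almost nothing. The flattening step is the part you are trading against NumPy's own shape discovery.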


余生分开走

2020-12-31 06:38

My guess would be that the code for converting lists just calls float on everything. If the argument defines __float__, we call that; otherwise we treat it like a string (which throws an exception on None; we catch that and put in np.nan). The exception handling should be relatively slow.
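The hypothesis can be modeled in pure Python to see why the exception path costs more. This is a hypothetical sketch of the conversion path described above, not NumPy's actual C implementation. The function name hypothesized_to_float is made up for illustration.

```python
import math
import timeit


def hypothesized_to_float(obj):
    """Hypothetical model of the conversion path guessed above.

    Not NumPy's real C code: objects defining __float__ take a fast path;
    everything else is parsed as a string, and a failed parse is caught
    and turned into nan.
    """
    if hasattr(type(obj), "__float__"):
        return obj.__float__()  # fast path, e.g. Python floats and ints
    try:
        return float(str(obj))  # fallback: treat the object like a string
    except ValueError:
        # str(None) == "None" cannot be parsed, so every None pays for
        # raising and catching an exception.
        return math.nan


values = [0.0] * 100_000
nones = [None] * 100_000
fast = timeit.timeit(lambda: [hypothesized_to_float(v) for v in values], number=3)
slow = timeit.timeit(lambda: [hypothesized_to_float(v) for v in nones], number=3)
print(f"__float__ path: {fast:.3f} s, exception path: {slow:.3f} s")
```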

Timing seems to verify this hypothesis:

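```
import numpy as np

# Cost of building the input list alone
%timeit [None] * 100000
> 1000 loops, best of 3: 1.04 ms per loop

# np.array on a float list vs. calling __float__ on each element in Python
%timeit np.array([0.0] * 100000)
> 10 loops, best of 3: 21.3 ms per loop
%timeit [i.__float__() for i in [0.0] * 100000]
> 10 loops, best of 3: 32 ms per loop


def flt(d):
    """Convert d to float, mapping anything unconvertible to nan via an exception."""
    try:
        return float(d)
    except (TypeError, ValueError):  # float(None) raises TypeError
        return np.nan

# np.array on a None list vs. the exception-based Python converter above
%timeit np.array([None] * 100000, dtype=np.float64)
> 1 loops, best of 3: 477 ms per loop
%timeit [flt(d) for d in [None] * 100000]
> 1 loops, best of 3: 328 ms per loop
```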

Adding another case to make it obvious where I'm going with this: with an explicit check for None, the conversion would not be nearly as slow as above:

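```
def flt2(d):
    """Same as flt, but short-circuits None before trying float()."""
    if d is None:
        return np.nan  # no exception is raised for the None case
    try:
        return float(d)
    except (TypeError, ValueError):
        return np.nan

%timeit [flt2(d) for d in [None] * 100000]
> 10 loops, best of 3: 45 ms per loop
```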
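The same explicit-None idea can be applied as a workaround before handing data to NumPy. You replace None with nan in Python, so np.array only ever sees floats. This is a sketch: to_float_array is a hypothetical helper name. Whether it beats passing dtype=np.float64 directly depends on your NumPy version, because the timings in this thread predate the patch from the first answer.

```python
import numpy as np


def to_float_array(values):
    """Hypothetical helper: convert a list that may contain None to float64.

    None is mapped to nan up front (the flt2 idea), so NumPy never has to
    hit an exception path for it. Any other unconvertible value, such as
    the string "abc", still raises instead of silently becoming nan.
    """
    return np.array([np.nan if v is None else v for v in values], dtype=np.float64)


data = [1.0, None, 3.5, None]
arr = to_float_array(data)
print(arr)
```

Unlike the bare-except version, this design only treats None as missing. Genuinely bad input still raises an error.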
