NumPy: how to quickly normalize many vectors?

梦毁少年i 2021-01-31 04:47

How can a list of vectors be elegantly normalized, in NumPy?

Here is an example that does not work:

from numpy import *

vectors = array([arange         


        
6 Answers
  •  梦如初夏
    2021-01-31 05:21

    My preferred way to normalize vectors is to compute their magnitudes with NumPy's inner1d. Here is how the approaches suggested so far compare with inner1d:

    import numpy as np
    # inner1d comes from an internal NumPy test module; recent NumPy releases
    # deprecate this import, so it may warn or be unavailable.
    from numpy.core.umath_tests import inner1d

    COUNT = 10**6  # 1 million points

    points = np.random.random_sample((COUNT, 3))
    # Five equivalent ways to compute the magnitude of each row:
    A = np.sqrt(np.einsum('...i,...i', points, points))
    B = np.apply_along_axis(np.linalg.norm, 1, points)
    C = np.sqrt((points ** 2).sum(-1))
    D = np.sqrt((points * points).sum(axis=1))
    E = np.sqrt(inner1d(points, points))

    print([np.allclose(E, x) for x in [A, B, C, D]])  # [True, True, True, True]
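
As a hedged aside: the row-wise magnitudes can also be obtained through the public API with the `axis` argument of `np.linalg.norm`, which avoids the per-row Python calls made by `apply_along_axis`. The sketch below is not part of the original comparison; the seed and the sample size are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

# Small random sample; the seed and size are arbitrary choices for illustration.
rng = np.random.default_rng(0)
points = rng.random((1000, 3))

# Row-wise Euclidean norms through the public API (no Python-level loop).
norms = np.linalg.norm(points, axis=1)

# Same magnitudes via the einsum form used above, as a cross-check.
norms_einsum = np.sqrt(np.einsum('...i,...i', points, points))

print(np.allclose(norms, norms_einsum))
```

No timing is claimed for this variant; it would need to be measured alongside the others on the same machine.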
    

    Testing performance with cProfile:

    import cProfile
    cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))")      # 3 function calls in 0.013 seconds
    cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)')       # 9000018 function calls in 10.977 seconds
    cProfile.run('np.sqrt((points ** 2).sum(-1))')                       # 5 function calls in 0.028 seconds
    cProfile.run('np.sqrt((points*points).sum(axis=1))')                 # 5 function calls in 0.027 seconds
    cProfile.run('np.sqrt(inner1d(points,points))')                      # 2 function calls in 0.009 seconds
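
cProfile reports a single run per expression, so short timings like these can be noisy. A minimal sketch of repeating the measurement with the standard `timeit` module follows. The labels, the reduced array size, and the repeat counts are assumptions made here to keep the run short; they are not the settings used for the numbers above. inner1d is left out so the sketch depends only on public NumPy calls.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((100_000, 3))  # smaller than the 10**6 used above, to keep the run short

# Candidate expressions, keyed by an illustrative label.
candidates = {
    "einsum": lambda: np.sqrt(np.einsum('...i,...i', points, points)),
    "power_sum": lambda: np.sqrt((points ** 2).sum(-1)),
    "mul_sum": lambda: np.sqrt((points * points).sum(axis=1)),
}

# Best of 5 repeats of 10 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=5)) / 10
    for name, fn in candidates.items()
}

# Print the candidates from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {seconds * 1e3:.3f} ms per call")
```

Taking the minimum over repeats reduces the influence of unrelated system activity on each measurement.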
    

    inner1d computed the magnitudes slightly faster than einsum (0.009 s versus 0.013 s). Using inner1d to normalize:

    n = points/np.sqrt(inner1d(points,points))[:,None]
    cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]') # 2 function calls in 0.026 seconds
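
The division above relies on broadcasting: `[:, None]` reshapes the magnitudes from shape (N,) to (N, 1), so each row is divided by its own length. A minimal sketch of the same idea using only public NumPy calls is given below. The helper name `normalize_rows`, the `eps` threshold, and the choice to leave zero-length rows unchanged are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def normalize_rows(vectors, eps=1e-12):
    """Scale each row to unit length; rows with norm below eps are left unchanged."""
    vectors = np.asarray(vectors, dtype=float)
    # einsum gives the squared length of each row; [:, None] turns shape (N,)
    # into (N, 1) so the division broadcasts across each row.
    norms = np.sqrt(np.einsum('ij,ij->i', vectors, vectors))[:, None]
    # Replace near-zero norms by 1 to avoid dividing by zero.
    safe = np.where(norms < eps, 1.0, norms)
    return vectors / safe

unit = normalize_rows([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(unit)
```

Without such a guard, a zero vector in the input produces NaN entries in the corresponding output row.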
    

    Testing against scikit-learn:

    import sklearn.preprocessing as preprocessing
    n_ = preprocessing.normalize(points, norm='l2')
    cProfile.run("preprocessing.normalize(points, norm='l2')") # 47 function calls in 0.047 seconds
    np.allclose(n,n_) # True
    

    Conclusion: of the options tested, inner1d appears to be the fastest.
