NumPy: how to quickly normalize many vectors?

Asked by 梦毁少年i · 2021-01-31 04:47

How can a list of vectors be elegantly normalized, in NumPy?

Here is an example that does not work:

from numpy import *

vectors = array([arange(10), arange(10)])  # shape (2, 10): all x's, then all y's
norms = apply_along_axis(linalg.norm, 0, vectors)  # shape (10,)

# This was expected to work, since vectors.T and norms both have 10 rows:
print(vectors.T / norms)  # ValueError: operands could not be broadcast together

6 Answers
  • 2021-01-31 05:21

    My preferred way to normalize vectors is to use NumPy's inner1d to calculate their magnitudes. Here's everything suggested so far, compared against inner1d:

    import numpy as np
    from numpy.core.umath_tests import inner1d
    COUNT = 10**6 # 1 million points
    
    points = np.random.random_sample((COUNT,3,))
    A      = np.sqrt(np.einsum('...i,...i', points, points))
    B      = np.apply_along_axis(np.linalg.norm, 1, points)   
    C      = np.sqrt((points ** 2).sum(-1))
    D      = np.sqrt((points*points).sum(axis=1))
    E      = np.sqrt(inner1d(points,points))
    
    print([np.allclose(E, x) for x in [A, B, C, D]])  # [True, True, True, True]
    

    Testing performance with cProfile:

    import cProfile
    cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))")      # 3 function calls in 0.013 seconds
    cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)')       # 9000018 function calls in 10.977 seconds
    cProfile.run('np.sqrt((points ** 2).sum(-1))')                       # 5 function calls in 0.028 seconds
    cProfile.run('np.sqrt((points*points).sum(axis=1))')                 # 5 function calls in 0.027 seconds
    cProfile.run('np.sqrt(inner1d(points,points))')                      # 2 function calls in 0.009 seconds
    

    inner1d computed the magnitudes a hair faster than einsum. So using inner1d to normalize:

    n = points/np.sqrt(inner1d(points,points))[:,None]
    cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]') # 2 function calls in 0.026 seconds
    

    Testing against scikit-learn:

    import sklearn.preprocessing as preprocessing
    n_ = preprocessing.normalize(points, norm='l2')
    cProfile.run("preprocessing.normalize(points, norm='l2')") # 47 function calls in 0.047 seconds
    np.allclose(n,n_) # True
    

    Conclusion: using inner1d seems to be the best option.
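
    Note that numpy.core.umath_tests was never a public NumPy module, and recent releases have dropped it (NumPy 2.x removes np.core entirely), so inner1d may fail to import. Under that assumption, the einsum variant benchmarked above is an equivalent drop-in; a minimal sketch:

    import numpy as np

    points = np.random.random_sample((10**6, 3))

    # Row-wise dot product of each point with itself, like inner1d(points, points).
    sq_norms = np.einsum('...i,...i', points, points)
    n = points / np.sqrt(sq_norms)[:, None]

    # Sanity check: every row of n should now have unit length.
    assert np.allclose(np.einsum('...i,...i', n, n), 1.0)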

  • 2021-01-31 05:24

    There is already a function for this in scikit-learn:

    import sklearn.preprocessing as preprocessing
    norm = preprocessing.normalize(m, norm='l2')
    

    More info at:

    http://scikit-learn.org/stable/modules/preprocessing.html
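
    For the question's layout, where each vector is a column of a (2, 10) array (an assumption carried over from the question, not from this answer), normalize() also takes an axis argument; it normalizes rows by default (axis=1), so pass axis=0 for column vectors:

    import numpy as np
    import sklearn.preprocessing as preprocessing

    vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)  # shape (2, 10)

    # Normalize each column to unit length.
    unit_columns = preprocessing.normalize(vectors, norm='l2', axis=0)
    assert np.allclose((unit_columns ** 2).sum(axis=0), 1.0)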

  • 2021-01-31 05:32

    Alright: NumPy's broadcasting adds dimensions to the left of an array's shape, not to its right. NumPy can, however, be instructed to add a dimension to the right of the norms array:

    print(vectors.T / norms[:, newaxis])
    

    does work!
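
    For reference, a minimal self-contained version of this fix (re-creating the question's arrays as an assumption):

    import numpy as np

    vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)  # (2, 10)
    norms = np.apply_along_axis(np.linalg.norm, 0, vectors)                # (10,)

    # vectors.T is (10, 2); norms[:, np.newaxis] is (10, 1), which broadcasts
    # across the trailing axis, dividing each row (one vector) by its norm.
    unit = vectors.T / norms[:, np.newaxis]
    print(unit.shape)  # (10, 2)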

  • 2021-01-31 05:46

    Computing the magnitude

    I came across this question and became curious about your method for normalizing. I use a different method to compute the magnitudes. Note: I also typically compute norms across the last axis (rows in this case, not columns).

    magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
    

    Typically, however, I just normalize like so:

    vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
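
    An equivalent spelling (a minor variant, not in the original answer) keeps the reduced axis with keepdims instead of re-adding it:

    vectors /= np.sqrt((vectors ** 2).sum(-1, keepdims=True))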
    

    A time comparison

    I ran a test to compare the times, and found that my method is faster by quite a bit, but Freddie Witherden's suggestion is even faster.

    import numpy as np    
    vectors = np.random.rand(100, 25)
    
    # OP's
    %timeit np.apply_along_axis(np.linalg.norm, 1, vectors)
    # Output: 100 loops, best of 3: 2.39 ms per loop
    
    # Mine
    %timeit np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
    # Output: 10000 loops, best of 3: 13.8 us per loop
    
    # Freddie's (from comment below)
    %timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
    # Output: 10000 loops, best of 3: 6.45 us per loop
    

    Beware though, as this StackOverflow answer notes, there are some safety checks not happening with einsum, so you should be sure that the dtype of vectors is sufficient to store the square of the magnitudes accurately enough.
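
    A small illustration of that caveat (hypothetical values, with float16 chosen deliberately so the squares overflow):

    import numpy as np

    v = np.array([[300.0, 400.0]], dtype=np.float16)  # true magnitude: 500

    # 300**2 = 90000 exceeds float16's max (~65504), so the squared sum is inf:
    print(np.sqrt((v ** 2).sum(-1)))                     # [inf]

    # Promoting to a wider dtype first recovers the correct magnitude:
    print(np.sqrt((v.astype(np.float64) ** 2).sum(-1)))  # [500.]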

  • 2021-01-31 05:47

    Well, unless I missed something, this does work:

    vectors / norms
    

    The problem with your suggestion is the broadcasting rules.

    vectors  # shape 2, 10
    norms  # shape 10
    

    The shapes do not have the same length! So the rule is to first extend the shorter shape with ones on the left:

    norms  # shape 1,10
    

    You can do that manually by calling:

    vectors / norms.reshape(1,-1)  # same as vectors/norms
    

    If you wanted to compute vectors.T/norms, you would have to do the reshaping manually, as follows:

    vectors.T / norms.reshape(-1,1)  # this works
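
    A quick check that both spellings agree (re-creating the question's arrays as an assumption):

    import numpy as np

    vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)  # (2, 10)
    norms = np.apply_along_axis(np.linalg.norm, 0, vectors)                # (10,)

    a = vectors / norms                   # (2, 10) / (10,)   -> broadcasts
    b = vectors.T / norms.reshape(-1, 1)  # (10, 2) / (10, 1) -> broadcasts
    print(np.allclose(a, b.T))            # True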
    
  • 2021-01-31 05:48

    For the two-dimensional case, using np.hypot(vectors[:,0], vectors[:,1]) looks to be faster than Freddie Witherden's np.sqrt(np.einsum('...i,...i', vectors, vectors)) for calculating the magnitudes (referencing Geoff's answer above).

    import numpy as np
    
    # Generate array of 2D vectors.
    vectors = np.random.random((1000,2))
    
    # Using Freddie's
    %timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
    # Output: 11.1 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    # Using numpy.hypot()
    %timeit np.hypot(vectors[:,0], vectors[:,1])
    # Output: 6.81 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    To then get the normalised vectors, add a trailing axis to the magnitudes so they broadcast against the (1000, 2) array:

    vectors /= np.hypot(vectors[:,0], vectors[:,1])[:, np.newaxis]
    