Question
I have a 2D numpy array "signals" of shape (100000, 1024). Each row contains the traces of amplitude of a signal, which I want to normalise to be within 0-1.
The signals have different amplitudes, so I can't divide by one common factor. Is there a way to normalise each signal individually so that every value within it is between 0 and 1?
Let's say that the signals look something like [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]] and I want them to become [[0,0.125,0.25,0.375,0.625,1,0.25,0.125],[0,0.2,0.5,1,0.7,0.4,0.2,0.1]].
Is there a way to do it without looping over all 100,000 signals, as this will surely be slow?
Thanks!
Answer 1:
An easy approach is to compute the maximum of each row (axis=1) and divide the array by it, adding a new axis so that the maxima broadcast across the columns:
import numpy as np
a = np.array([[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]])
b = np.max(a, axis = 1)
print(a / b[:,np.newaxis])
output:
[[0. 0.125 0.25 0.375 0.625 1. 0.25 0.125]
[0. 0.2 0.5 1. 0.7 0.4 0.2 0.1 ]]
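If some rows could be all zeros, the plain division above yields NaN for those rows. Below is a minimal sketch of one way to guard against that, assuming the signals are non-negative as in the question. The helper name normalise_rows is hypothetical and not part of NumPy.

```python
import numpy as np

def normalise_rows(signals):
    """Divide each row by its own maximum; all-zero rows stay all zeros."""
    signals = np.asarray(signals, dtype=float)
    # keepdims=True gives shape (n_rows, 1), which broadcasts across the columns
    row_max = signals.max(axis=1, keepdims=True)
    # Assumption: values are >= 0, so a row maximum of 0 means the row is all zeros.
    # Swap those maxima for 1 to avoid dividing by zero.
    safe_max = np.where(row_max == 0, 1.0, row_max)
    return signals / safe_max

a = np.array([[0, 1, 2, 3, 5, 8, 2, 1],
              [0, 2, 5, 10, 7, 4, 2, 1],
              [0, 0, 0, 0, 0, 0, 0, 0]])
print(normalise_rows(a))
```

Using keepdims=True is equivalent to indexing with [:, np.newaxis] afterwards; it just keeps the reshaping in the same call.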
Answer 2:
Here is a small benchmark showing how large the performance difference is between looping over the rows and the vectorized division:
import numpy as np
import timeit
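arr = np.arange(1024).reshape(128,8)

# Loop over the rows in Python, dividing each one by its own maximum
def using_list_comp():
    return np.array([s/np.max(s) for s in arr])

# Compute all row maxima at once and let broadcasting do the division
def using_vectorized_max_div():
    return arr/arr.max(axis=1)[:, np.newaxis]
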
result1 = using_list_comp()
result2 = using_vectorized_max_div()
print("Results equal:", (result1==result2).all())
time1 = timeit.timeit('using_list_comp()', globals=globals(), number=1000)
time2 = timeit.timeit('using_vectorized_max_div()', globals=globals(), number=1000)
print(time1)
print(time2)
print(time1/time2)
On my machine the output is:
Results equal: True
0.9873569
0.010177099999999939
97.01750989967731
Almost a 100x difference!
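For an array as large as the one in the question, the vectorized division also allocates a second array of the same size. If the original values are no longer needed, the result can be written back into the same buffer. This is a sketch under two assumptions: the array already has a float dtype, and no row maximum is zero. A smaller random array stands in for the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Smaller stand-in for the (100000, 1024) array; values lie in [0.5, 1.5),
# so no row maximum is zero
signals = rng.random((1000, 1024)) + 0.5

# Shape (n_rows, 1) so it broadcasts across the columns
row_max = signals.max(axis=1, keepdims=True)
# out=signals stores the result in the existing array instead of a new one
np.divide(signals, row_max, out=signals)

print(signals.max(axis=1)[:5])
```

The same thing can be written as signals /= row_max. Both forms raise an error on an integer array, since the float result cannot be stored back into it.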
Answer 3:
Another solution is to use normalize from scikit-learn:
from sklearn.preprocessing import normalize
data = [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]]
normalize(data, axis=1, norm='max')
result:
array([[0. , 0.125, 0.25 , 0.375, 0.625, 1. , 0.25 , 0.125],
[0. , 0.2 , 0.5 , 1. , 0.7 , 0.4 , 0.2 , 0.1 ]])
Please note the norm='max' argument. The default value is 'l2'.
Source: https://stackoverflow.com/questions/62793045/dynamically-normalise-2d-numpy-array