I have a 1000 * 1000 NumPy array with 1 million values, which was created as follows:
>>> import numpy as np
>>> data = np.loadtxt('space_data.txt')
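An un-vectorized, linear approach would be to use a dictionary here:
dct = dict(keys)  # keys: a sequence of (value, label) pairs
# A new array is required if the dtype is different or the values cannot be cast.
# dtype=object keeps whole strings; dtype=str would allocate one-character
# strings and truncate labels such as 'XL'.
new_array = np.empty(data.shape, dtype=object)
for flat_index in range(data.size):
    index = np.unravel_index(flat_index, data.shape)
    new_array[index] = dct[data[index]]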
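For reference, here is a minimal, self-contained sketch of the dictionary approach on a tiny array. The `keys` pairs below are hypothetical stand-ins for the real key table, and `np.ndindex` is used in place of `np.unravel_index` purely for brevity. The last three lines illustrate why the result array should not be allocated with `dtype=str`.

```python
import numpy as np

data = np.array([[13., 17.],
                 [14., 10.]])

# Hypothetical (value, label) pairs standing in for the real key table
keys = [(10., 'S'), (13., 'M'), (14., 'L'), (17., 'XL')]
dct = dict(keys)

# dtype=object stores whole Python strings of any length
new_array = np.empty(data.shape, dtype=object)
for index in np.ndindex(data.shape):
    new_array[index] = dct[data[index]]

# Pitfall: dtype=str means one-character strings, so longer labels are cut
truncated = np.empty(1, dtype=str)
truncated[0] = 'XL'
print(new_array)
print(truncated.dtype, truncated[0])
```

The loop still visits every element in Python, so on a 1000 * 1000 array it will be much slower than an indexing-based lookup.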
In Python, dicts are a natural choice for mapping from keys to values. NumPy has no direct equivalent of a dict, but it does have arrays, which support fast integer indexing. For example,
In [153]: keyarray = np.array(['S','M','L','XL'])
In [158]: data = np.array([[0,2,1], [1,3,2]])
In [159]: keyarray[data]
Out[159]:
array([['S', 'L', 'M'],
       ['M', 'XL', 'L']],
      dtype='|S2')
So if we could massage your key array into one that looked like this:
In [161]: keyarray
Out[161]:
array(['', '', '', '', '', '', '', '', '', '', 'S', 'S', 'S', 'M', 'L',
'S', 'S', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '', '', '', '', '', '', '', '', 'XL', 'M', 'XL', 'S'],
dtype='|S32')
so that 10 maps to 'S' in the sense that keyarray[10] equals 'S', and so forth:
In [162]: keyarray[10]
Out[162]: 'S'
then we could produce the desired result with keyarray[data].
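import numpy as np

data = np.array([[ 13.,  15.,  15.,  15.,  15.,  16.],
                 [ 14.,  13.,  14.,  13.,  15.,  16.],
                 [ 16.,  13.,  13.,  13.,  15.,  17.],
                 [ 14.,  15.,  14.,  14.,  14.,  13.],
                 [ 15.,  15.,  16.,  16.,  15.,  14.],
                 [ 14.,  13.,  16.,  16.,  16.,  16.]])

# Mixing floats and strings makes this a 2-D array of strings
key = np.array([[ 10., 'S'],
                [ 11., 'S'],
                [ 12., 'S'],
                [ 13., 'M'],
                [ 14., 'L'],
                [ 15., 'S'],
                [ 16., 'S'],
                [ 17., 'XL'],
                [ 92., 'XL'],
                [ 93., 'M'],
                [ 94., 'XL'],
                [ 95., 'S']])

# Convert the first column (strings such as '10.0') back to integer indices
idx = np.array(key[:,0], dtype=float).astype(int)
# The lookup table needs one slot per integer from 0 to the largest key
n = idx.max()+1
keyarray = np.empty(n, dtype=key[:,1].dtype)
keyarray[:] = ''            # slots with no key stay empty
keyarray[idx] = key[:,1]    # place each label at the position of its key
data = data.astype('int')   # integer indices are required for keyarray[data]
print(keyarray[data])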
yields
[['M' 'S' 'S' 'S' 'S' 'S']
 ['L' 'M' 'L' 'M' 'S' 'S']
 ['S' 'M' 'M' 'M' 'S' 'XL']
 ['L' 'S' 'L' 'L' 'L' 'M']
 ['S' 'S' 'S' 'S' 'S' 'L']
 ['L' 'M' 'S' 'S' 'S' 'S']]
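The lookup table needs one slot for every integer up to the largest key, which can get wasteful when the keys are sparse or large. As an alternative that is not part of the answer above, here is a hedged sketch using np.searchsorted on the sorted key values. The names key_values, key_labels and the small arrays are made up for illustration, and values missing from the key are assumed to map to an empty string.

```python
import numpy as np

data = np.array([[13., 15., 17.],
                 [95., 10., 14.]])

# Hypothetical key table, split into parallel value and label arrays
key_values = np.array([10., 13., 14., 15., 17., 95.])
key_labels = np.array(['S', 'M', 'L', 'S', 'XL', 'S'])

# searchsorted requires sorted input, so sort the key by value first
order = np.argsort(key_values)
sorted_values = key_values[order]
sorted_labels = key_labels[order]

# Position of each data value within the sorted key values
pos = np.searchsorted(sorted_values, data)
# Clip so that values larger than every key still index safely
pos = np.clip(pos, 0, len(sorted_values) - 1)

# Keep a label only where the key value matches exactly
found = sorted_values[pos] == data
result = np.where(found, sorted_labels[pos], '')
print(result)
```

This keeps memory proportional to the number of keys rather than to the largest key value, at the cost of a binary search per element instead of a direct index.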
Note that data = data.astype('int') assumes that the floats in data can be uniquely mapped to ints. That appears to be the case with your data, but it is not true for arbitrary floats. For example, astype('int') maps both 1.0 and 1.5 to 1.
In [167]: np.array([1.0, 1.5]).astype('int')
Out[167]: array([1, 1])
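If you want to guard against that silent truncation, one option is to check that the cast round-trips before using the result as an index. The helper name to_int_exact below is hypothetical, and this is only a sketch that assumes exact integer-valued floats are expected.

```python
import numpy as np

def to_int_exact(arr):
    """Cast to int, raising if any value would lose its fractional part."""
    arr = np.asarray(arr)
    as_int = arr.astype(int)
    # The cast is lossless only if every element compares equal afterwards
    if not np.array_equal(as_int, arr):
        raise ValueError("array contains non-integer values")
    return as_int

print(to_int_exact([13., 14., 16.]))

try:
    to_int_exact([1.0, 1.5])
except ValueError as exc:
    print("rejected:", exc)
```

If the floats may carry small rounding noise (for example 12.9999999), rounding with np.rint before the cast may be more appropriate than rejecting them.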
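import numpy as np

data = np.array([[ 13.,  15.,  15.],
                 [ 14.,  13.,  14.],
                 [ 16.,  13.,  13.]])
key = [[ 10., 'S'],
       [ 11., 'S'],
       [ 12., 'S'],
       [ 13., 'M'],
       [ 14., 'L'],
       [ 15., 'S'],
       [ 16., 'S']]

# 'U2' holds labels of up to two characters; plain dtype=str would
# allocate one-character strings and truncate a label such as 'XL'
data2 = np.zeros(data.shape, dtype='U2')
for value, label in key:
    # The boolean mask selects every element equal to this key value
    data2[data == value] = label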
# Create a DataFrame out of your 'data' array and make a dictionary out of your 'key' array.
import numpy as np
import pandas as pd

data = np.array([[ 13.,  15.,  15.],
                 [ 14.,  13.,  14.],
                 [ 16.,  13.,  13.]])
data_df = pd.DataFrame(data)
key = {10: 'S', 11: 'S', 12: 'S', 13: 'M', 14: 'L', 15: 'S', 16: 'S'}

# Replace the values in the newly created DataFrame and convert it back into an array.
data_df.replace(key, inplace=True)
data = data_df.to_numpy()
print(data)
This will be the output:
[['M' 'S' 'S']
 ['L' 'M' 'L']
 ['S' 'M' 'M']]