I have two lists, one of which is massive (millions of elements), the other several thousand. I want to do the following:
bigArray=[0,1,0,2,3,2,,.....]
small
So far I don't see any need for numpy; you can make use of defaultdict, provided that your memory is sufficient. It should be, as long as the number of observations is not too many millions.
big_list = [0,1,0,2,3,2,5,6,7,5,6,4,5,3,4,3,5,6,5]
small_list = [0,1,2,3,4]
from collections import defaultdict
dicto = defaultdict(list)  # dictionary stores all the relevant coordinates
                           # so you don't have to search for them later
for ind, ele in enumerate(big_list):
    dicto[ele].append(ind)
Result:
>>> for ele in small_list:
...     print(dicto[ele])
...
[0, 2]
[1]
[3, 5]
[4, 13, 15]
[11, 14]
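This should give you some speed: the dictionary is built in a single pass over big_list, and each lookup afterwards takes constant time on average, instead of rescanning the big list for every element of small_list.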
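If memory does become a concern, one possible variation (a sketch, not something tested on millions of elements) is to record indices only for the values that actually appear in small_list. The function name `positions_of` and the `wanted` set are my own illustrative names; the rest is the same defaultdict idea as above.

```python
from collections import defaultdict

def positions_of(big_list, small_list):
    """Map each value in small_list to the indices where it occurs in big_list."""
    wanted = set(small_list)  # set gives constant-time membership tests on average
    positions = defaultdict(list)
    for ind, ele in enumerate(big_list):
        if ele in wanted:  # skip values that will never be looked up
            positions[ele].append(ind)
    # Plain dict with an entry (possibly an empty list) for every requested value
    return {ele: positions.get(ele, []) for ele in small_list}

big_list = [0,1,0,2,3,2,5,6,7,5,6,4,5,3,4,3,5,6,5]
small_list = [0,1,2,3,4]

result = positions_of(big_list, small_list)
for ele in small_list:
    print(result[ele])
```

This still makes a single pass over big_list, but the dictionary only holds index lists for the several thousand values of interest rather than for every distinct value in the big list.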