As a challenge, I\'ve given myself this problem:
Given 2 lists, A, and B, where B is a shuffled version of A, the idea is to figure out the shuffled indices.
As an improvement over your current solution, you could use collections.defaultdict and avoid dict.setdefault:
from collections import defaultdict
A = [10, 40, 30, 2]
B = [30, 2, 10, 40]
idx = defaultdict(list)
for i, l in enumerate(A):
idx[l].append(i)
res = [idx[l].pop() for l in B]
print(res)
Here are the timings for the two methods using the sample input given:
Script used for testing
from timeit import timeit
setup = """
from collections import defaultdict;
idx1 = defaultdict(list); idx2 = {}
A = [10, 40, 30, 2]
B = [30, 2, 10, 40]
"""
me = """
for i, l in enumerate(A):
idx1[l].append(i)
res = [idx1[l].pop() for l in B]
"""
coldspeed = """
for i, l in enumerate(A):
idx2.setdefault(l, []).append(i)
res = [idx2[l].pop() for l in B]
"""
print(timeit(setup=setup, stmt=me))
print(timeit(setup=setup, stmt=coldspeed))
Results
original: 2.601998388010543
modified: 2.0607256239745766
So it appears that using defaultdict
actually yields a slight speed increase. This actually makes since though since defaultdict
is implemented in C rather than Python. Not to mention that the attribute lookup of the original solution - idx.setdefault1
- is costly.