I have a list of tuples, each containing one string and two integers. The list looks like this:
x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
The list contains thousands of such tuples. To get the unique combinations, I can apply frozenset to each tuple and collect the results in a set:
y = set(map(frozenset, x))
This gives me the following result:
{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}
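This is why the two 'a' rows collapse, and why the original layout cannot be read back out of the set: a frozenset compares equal regardless of element order, and it also drops repeated values. A minimal check using the rows from the question (the row ('c', 7, 7) is a made-up example):

```python
x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]

# Element order is ignored: both 'a' rows map to the same frozenset.
same_row = frozenset(x[0]) == frozenset(x[3])

# Repeated values are dropped, so ('c', 7, 7) keeps only two members.
collapsed = frozenset(('c', 7, 7))

# Converting back yields the members in an arbitrary order, not ('a', 1, 2).
back = tuple(frozenset(x[0]))

print(same_row)               # True
print(len(collapsed))         # 2
print(sorted(back, key=str))  # [1, 2, 'a']
```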
I know that a set is an unordered data structure, so this is expected, but I want to preserve the order of the elements so that I can then insert them into a pandas DataFrame. The DataFrame should look like this:
Name Marks1 Marks2
0 a 1 2
1 b 3 4
2 x 5 6
Instead of operating on the set of frozensets directly, you could use it only as a helper data structure, as in the unique_everseen recipe from the itertools documentation (copied verbatim):
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Passing key=frozenset solves the problem:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
This returns the elements as-is and it also maintains the relative order between the elements.
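On Python 3.7+, where dicts preserve insertion order, the same result can also be sketched with a plain dict keyed by the frozenset. The helper name first_by_key is hypothetical, not part of any library. Using setdefault matters here: a dict comprehension such as {frozenset(t): t for t in x} keeps the first key's position but overwrites its value with the last duplicate, so it would return ('a', 2, 1) instead of ('a', 1, 2).

```python
def first_by_key(rows, key):
    """Keep the first row for each distinct key, in original order.

    Hypothetical helper; relies on dicts preserving insertion order (Python 3.7+).
    """
    firsts = {}
    for row in rows:
        # setdefault stores the row only if this key has not been seen yet
        firsts.setdefault(key(row), row)
    return list(firsts.values())


x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]
print(first_by_key(x, frozenset))
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]

# For contrast: a comprehension keeps the last value for a repeated key.
print(list({frozenset(t): t for t in x}.values()))
# [('a', 2, 1), ('b', 3, 4), ('x', 5, 6)]
```

Unlike unique_everseen, this builds the whole result eagerly instead of yielding rows lazily, and it holds every kept row in memory alongside its key.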
Frozensets have no ordering. Instead, you can build a sorted tuple from each row to check whether an equivalent item has already been seen, and append the original row only if it has not:
y = set()
lst = []
for i in x:
    # key=str avoids comparing str with int, which raises TypeError in Python 3
    t = tuple(sorted(i, key=str))
    if t not in y:
        y.add(t)
        lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
The first occurrence of each combination is kept.
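Two details of the sorted-tuple key are worth checking. The key=str argument is there because Python 3 cannot order str against int, so a plain sorted() on a mixed tuple raises TypeError; sorting by string form puts 10 before 9, which is harmless because the key only has to be consistent, not numerically ordered. Also, unlike a frozenset, a sorted tuple keeps repeated values, so ('c', 7, 7) and ('c', 7) stay distinct. A small check, with the helper name sorted_key chosen only for illustration:

```python
def sorted_key(row):
    # Hypothetical helper: sort by string form so mixed str/int values compare.
    return tuple(sorted(row, key=str))


row = ('a', 2, 1)
try:
    sorted(row)  # 'a' and 2 cannot be compared in Python 3
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)         # False
print(sorted_key(row))  # (1, 2, 'a')

# A frozenset drops the repeated 7; the sorted tuple keeps it.
print(frozenset(('c', 7, 7)) == frozenset(('c', 7)))    # True
print(sorted_key(('c', 7, 7)) == sorted_key(('c', 7)))  # False
```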
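If the rows only need to be unique by name, NumPy's np.unique can help: with return_index=True it also returns the index of the first occurrence of each unique value. Note that this compares only the first element of each tuple, not the whole combination.
>>> import numpy as np
>>> chrs, indices = np.unique([t[0] for t in x], return_index=True)
>>> chrs, indices
(array(['a', 'b', 'x'], dtype='<U1'), array([0, 1, 2]))
>>> [x[i] for i in sorted(indices)]
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
np.unique returns the unique values in sorted order, so the indices are sorted here to keep the rows in their original order; with this sample input the two orders happen to coincide.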
Source: https://stackoverflow.com/questions/45936362/maintaining-the-order-of-the-elements-in-a-frozen-set