问题
I have a list of tuples, each tuple of which contains one string and two integers. The list looks like this:
x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
The list contains thousands of such tuples. Now if I want to get unique combinations, I can do the frozenset
on my list as follows:
y = set(map(frozenset, x))
This gives me the following result:
{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}
I know that set is an unordered data structure and this is normal case but I want to preserve the order of the elements here so that I can thereafter insert the elements in a pandas
dataframe. The dataframe will look like this:
Name Marks1 Marks2
0 a 1 2
1 b 3 4
2 x 5 6
回答1:
Instead of operating on the set
of frozenset
s directly you could use that only as a helper data-structure - like in the unique_everseen recipe in the itertools section (copied verbatim):
from itertools import filterfalse
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in filterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
Basically this would solve the issue when you use key=frozenset
:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
This returns the elements as-is and it also maintains the relative order between the elements.
回答2:
No ordering with frozensets. You can instead create sorted tuples to check for the existence of an item, adding the original if the tuple does not exist in the set:
y = set()
lst = []
for i in x:
t = tuple(sorted(i, key=str)
if t not in y:
y.add(t)
lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
The first entry gets preserved.
回答3:
There are some quite useful functions in NumPy which can help you to solve this problem.
import numpy as np
chrs, indices = np.unique(list(map(lambda x:x[0], x)), return_index=True)
chrs, indices
>> (array(['a', 'b', 'x'],
dtype='<U1'), array([0, 1, 2]))
[x[indices[i]] for i in range(indices.size)]
>> [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
来源:https://stackoverflow.com/questions/45936362/maintaining-the-order-of-the-elements-in-a-frozen-set