numpy.unique gives wrong output for list of sets

China☆狼群 submitted on 2021-01-14 05:05:04

Question


I have a list of sets given by,

sets1 = [{1},{2},{1}]

When I find the unique elements in this list using numpy's unique, I get

np.unique(sets1)
Out[18]: array([{1}, {2}, {1}], dtype=object)

As can be seen, the result is wrong: {1} is repeated in the output.

When I change the order in the input by making similar elements adjacent, this doesn't happen.

sets2 = [{1},{1},{2}]

np.unique(sets2)
Out[21]: array([{1}, {2}], dtype=object)

Why does this occur? Or is there something wrong with the way I have done it?


Answer 1:


What happens here is that np.unique is built on NumPy's internal _unique1d helper (see the NumPy source), which sorts the array with the .sort() method and then keeps every element that differs from its predecessor.
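To make the mechanism concrete, here is a minimal pure-Python sketch of the sort-then-compare-neighbours idea. The function name unique_by_sorting is hypothetical and this is a simplification for illustration, not NumPy's actual code.

```python
def unique_by_sorting(items):
    """Simplified sketch: sort, then keep each item that differs from its predecessor."""
    ordered = sorted(items)   # only groups duplicates if `<` defines a total order
    result = ordered[:1]      # the first element is always kept
    for previous, current in zip(ordered, ordered[1:]):
        if current != previous:
            result.append(current)
    return result

print(unique_by_sorting([3, 1, 3, 2]))     # [1, 2, 3]
print(unique_by_sorting([{1}, {2}, {1}]))  # [{1}, {2}, {1}]
print(unique_by_sorting([{1}, {1}, {2}]))  # [{1}, {2}]
```

With integers the sort brings duplicates together, so the neighbour comparison removes them. With sets the sort does not move anything, so only duplicates that were already adjacent are dropped.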

Now, sorting a list of sets does not order them by the integers they contain. For sets, the < operator tests for a proper subset, so neither {1} < {2} nor {2} < {1} is true and the sort leaves the elements where they are. So we will have (and that is not what we want):

sets = [{1},{2},{1}]
sets.sort()
print(sets)

# Output: [{1}, {2}, {1}]
# i.e. the list has not been "sorted" the way we want it to be
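The reason the sort leaves the list untouched can be checked directly: comparison operators on sets test subset relations rather than ordering by content. A short illustration (the variable names a and b are just for this example):

```python
a, b = {1}, {2}

print(a < b)   # False: {1} is not a proper subset of {2}
print(b < a)   # False: {2} is not a proper subset of {1}
print(a == b)  # False: the two sets are not equal either

print({1} < {1, 2})  # True: {1} is a proper subset of {1, 2}
```

Because neither set is "less than" the other, a sorting routine has no basis for moving one in front of the other.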

Now, as you have pointed out, if equal sets are already adjacent in the list, np.unique will work (since the list is then effectively sorted beforehand).

One specific solution (though, please be aware that it will only work for a list of sets that each contain a single integer) would then be:

np.unique(sorted(sets, key=lambda x: next(iter(x))))
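For sets with more than one element, the same sort-first idea might be generalised in plain Python by sorting on each set's sorted contents. The sketch below assumes the elements are mutually comparable; unique_sets is a hypothetical helper name, and it returns a list rather than a NumPy array.

```python
from itertools import groupby

def unique_sets(sets):
    """Hypothetical helper: drop duplicate sets by sorting, then merging equal neighbours."""
    # sorted(s) turns each set into a list; lists compare lexicographically,
    # which gives a total order as long as the elements are mutually comparable.
    ordered = sorted(sets, key=sorted)
    # groupby merges runs of adjacent equal items; sets compare equal by content.
    return [first for first, _ in groupby(ordered)]

print(unique_sets([{2, 3}, {1}, {3, 2}, {1}]))
```

Here the key function only decides the order; the duplicates themselves are detected with the ordinary == comparison between sets.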



Answer 2:


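That is because set is an unhashable type, so duplicates cannot be found by hashing. Note also that two equal sets are still distinct objects ({1} == {1} is True, but the identity check is not):

{1} is {1} # will give False (identity, not equality)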

You can use Python's collections.Counter if you can convert each set to a tuple, as below:

from collections import Counter
sets1 = [{1},{2},{1}]
Counter([tuple(a) for a in sets1])
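One caveat with tuple(a) is that it depends on iteration order, which equal sets are not guaranteed to share. A possible variant, offered as a sketch rather than as part of the original answer, converts each set to a frozenset, which is hashable and compares by content. The variable names counts and unique_sets are illustrative.

```python
from collections import Counter

sets1 = [{1, 9}, {2}, {9, 1}]

# frozenset is hashable, so it can serve as a Counter key.
counts = Counter(frozenset(s) for s in sets1)

# Counter keeps keys in first-seen order; convert back to ordinary sets.
unique_sets = [set(fs) for fs in counts]

print(counts[frozenset({1, 9})])  # 2
print(len(unique_sets))           # 2
```

This gives both the unique sets and how often each one occurred, without needing any sorting.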


Source: https://stackoverflow.com/questions/58977212/numpy-unique-gives-wrong-output-for-list-of-sets
