Question
I have a NumPy array of one-hot encoded vectors, e.g.:
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
I would like to count the occurrences of each unique one-hot vector, for example:
{[1, 0, 0]: 2, [0, 0, 1]: 1}
Answer 1:
Approach #1
This seems like a perfect setup for the axis parameter that numpy.unique gained in v1.13, which lets us work along an axis of a NumPy array:
unq_rows, count = np.unique(x, axis=0, return_counts=True)
out = {tuple(i): j for i, j in zip(unq_rows, count)}
Sample outputs -
In [289]: unq_rows
Out[289]:
array([[0, 0, 1],
       [1, 0, 0]])
In [290]: count
Out[290]: array([1, 2])
In [291]: {tuple(i):j for i,j in zip(unq_rows,count)}
Out[291]: {(0, 0, 1): 1, (1, 0, 0): 2}
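A minimal sketch wrapping Approach #1 in a reusable helper. The name count_unique_rows is hypothetical, and the .tolist() conversion is my own addition so that the keys and counts come out as plain Python ints rather than NumPy scalars. Because it compares whole rows, this version does not depend on the rows being one-hot.

```python
import numpy as np

def count_unique_rows(x):
    """Count each distinct row of a 2-D array (needs NumPy >= 1.13 for axis=)."""
    # np.unique sorts the distinct rows lexicographically and counts each one
    unq_rows, counts = np.unique(x, axis=0, return_counts=True)
    # .tolist() converts NumPy scalars to Python ints so the keys print cleanly
    return {tuple(row): int(c) for row, c in zip(unq_rows.tolist(), counts)}

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
print(count_unique_rows(x))  # {(0, 0, 1): 1, (1, 0, 0): 2}
```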
Approach #2
For NumPy versions older than v1.13, we can exploit the fact that the input is a one-hot encoded array, like so:
_, idx, count = np.unique(x.argmax(1), return_counts=True, return_index=True)
out = {tuple(i): j for i, j in zip(x[idx], count)}  # x[idx] is unq_rows
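Approach #2 relies on argmax, which quietly returns 0 for an all-zero row, so a malformed row would be miscounted as hot in the first position. Below is a sketch of the same argmax idea that checks the one-hot assumption first. As a swap of my own, it tallies hot positions with np.bincount instead of np.unique, and it returns counts keyed by hot column index rather than by vector; count_one_hot is a hypothetical name.

```python
import numpy as np

def count_one_hot(x):
    """Map each hot column index to the number of rows that are hot there.

    Assumes every row of x is strictly one-hot (exactly one 1, the rest 0).
    """
    x = np.asarray(x)
    # argmax returns 0 for an all-zero row, so verify the one-hot assumption first
    if not (((x == 0) | (x == 1)).all() and (x.sum(axis=1) == 1).all()):
        raise ValueError("every row must contain exactly one 1")
    hot = x.argmax(axis=1)                           # column index of the 1 in each row
    counts = np.bincount(hot, minlength=x.shape[1])  # how often each index occurs
    return {i: int(c) for i, c in enumerate(counts) if c > 0}

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
print(count_one_hot(x))  # {0: 2, 2: 1}
```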
Answer 2:
You could convert your arrays to tuples and use a Counter:
import numpy as np
from collections import Counter
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
Counter([tuple(a) for a in x])
# Counter({(1, 0, 0): 2, (0, 0, 1): 1})
Answer 3:
Given your data format, the fastest way is to sum down the columns:
x.sum(axis=0)
which gives:
array([2, 0, 1])
Here the i-th entry is the number of rows whose i-th element is hot:
[1, 0, 0] -> 2
[0, 1, 0] -> 0
[0, 0, 1] -> 1
This exploits the fact that exactly one element can be hot in each row, so each row adds 1 to exactly one column sum, and the column sums count the one-hot vectors directly.
If you need the result in the same dictionary format, you can convert it with:
sums = x.sum(axis=0)
{tuple(int(k == i) for k in range(len(sums))): e for i, e in enumerate(sums)}
or, similarly to tarashypka:
{tuple(row): count for row, count in zip(np.eye(len(sums), dtype=np.int64), sums)}
yields:
{(1, 0, 0): 2, (0, 1, 0): 0, (0, 0, 1): 1}
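The column-sum trick above only counts rows correctly when every row is one-hot. A small sketch contrasting the question's one-hot array with a hypothetical multi-hot array (my own example, not from the thread):

```python
import numpy as np

one_hot = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
col_sums = one_hot.sum(axis=0)
print(col_sums)                        # [2 0 1]
print(col_sums.sum() == len(one_hot))  # True: each row contributes exactly one 1

# Hypothetical multi-hot input: the rows (1, 1, 0) and (1, 0, 0) occur once each
multi_hot = np.array([[1, 1, 0], [1, 0, 0]])
print(multi_hot.sum(axis=0))           # [2 1 0], which no longer counts rows
```

For input that is not guaranteed to be one-hot, the row-based approaches (np.unique with axis=0, or a Counter over tuples) remain correct.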
Answer 4:
Here is another solution using sum, which also drops the vectors that never occur:
>>> {tuple(v): n for v, n in zip(np.eye(x.shape[1], dtype=int), np.sum(x, axis=0))
...  if n > 0}
{(0, 0, 1): 1, (1, 0, 0): 2}
Answer 5:
Lists and NumPy arrays are unhashable, i.e. they can't be used as dictionary keys. So your exact desired output, a dictionary with keys that look like [1, 0, 0], is not possible in Python. To deal with this, you need to map your vectors to tuples.
from collections import Counter
import numpy as np
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
counts = Counter(map(tuple, x))
That will get you:
In [12]: counts
Out[12]: Counter({(0, 0, 1): 1, (1, 0, 0): 2})
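To see the unhashability point concretely, a short sketch: using an array directly as a dictionary key raises TypeError, while the equivalent tuple works. The variable names are illustrative only.

```python
import numpy as np

row = np.array([1, 0, 0])
try:
    {row: 1}  # NumPy arrays define no hash, so this raises
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'numpy.ndarray'

key = tuple(row.tolist())  # (1, 0, 0): an immutable, hashable stand-in
print({key: 1})            # {(1, 0, 0): 1}
```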
Source: https://stackoverflow.com/questions/45176383/count-occurrences-of-unique-arrays-in-array