Simplest way to find the element that occurs the most in each column

Suppose I have

data =
[[a, a, c],
 [b, c, c],
 [c, b, b],
 [b, a, c]]

I want to get a list containing the element that occurs the most in each column.

4 Answers

孤城傲影

2021-01-24 18:33
Use a list comprehension plus collections.Counter():
```
from collections import Counter

[Counter(col).most_common(1)[0][0] for col in zip(*data)]
```
zip(*data) transposes your list of lists, yielding one column at a time. A Counter() object counts how often each element appears in the input sequence, and .most_common(1) returns a one-item list holding the most frequent element together with its count; the [0][0] then picks out the element itself.

Provided your input consists of single-character strings, that gives:
```
>>> [Counter(col).most_common(1)[0][0] for col in zip(*data)]
['b', 'a', 'c']
```
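To make the intermediate steps visible, here is a minimal sketch that prints the transposed columns and the (element, count) pairs before reducing them to the final list. It assumes the letters in the question are plain strings; the names columns, pairs and modes are only illustrative.

```python
from collections import Counter

# Sample data from the question, with the letters written as strings.
data = [["a", "a", "c"],
        ["b", "c", "c"],
        ["c", "b", "b"],
        ["b", "a", "c"]]

# zip(*data) yields one tuple per column.
columns = list(zip(*data))

# most_common(1) returns a one-item list of (element, count) pairs.
pairs = [Counter(col).most_common(1)[0] for col in columns]

# Keep only the element from each pair.
modes = [element for element, _count in pairs]

print(columns)
print(pairs)
print(modes)
```

Keeping the pairs around is handy when the count matters as well as the element.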

一生所求

2021-01-24 18:35

In statistics, what you want is called the mode. The scipy library (http://www.scipy.org/) has a mode function, in scipy.stats.

In [32]: import numpy as np

In [33]: from scipy.stats import mode

In [34]: data = np.random.randint(1,6, size=(6,8))

In [35]: data
Out[35]: 
array([[2, 1, 5, 5, 3, 3, 1, 4],
       [5, 3, 2, 2, 5, 2, 5, 3],
       [2, 2, 5, 3, 3, 2, 1, 1],
       [2, 4, 1, 5, 4, 4, 4, 5],
       [4, 4, 5, 5, 2, 4, 4, 4],
       [2, 4, 1, 1, 3, 3, 1, 3]])

In [36]: val, count = mode(data, axis=0)

In [37]: val
Out[37]: array([[ 2.,  4.,  5.,  5.,  3.,  2.,  1.,  3.]])

In [38]: count
Out[38]: array([[ 4.,  3.,  3.,  3.,  3.,  2.,  3.,  2.]])
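
The float arrays in the session above suggest an older SciPy release, so newer releases may print the result differently. If SciPy is not available, a similar result can be sketched with the standard library's statistics.multimode (Python 3.8 and later), which also works on strings and returns every value tied for the highest count. This swaps in a different tool from the one used above; the helper name column_modes and the sample lists are hypothetical.

```python
from statistics import multimode  # available in Python 3.8 and later


def column_modes(rows):
    """Return, for each column, every value tied for the highest count."""
    return [multimode(col) for col in zip(*rows)]


# Data from the question, with the letters written as strings.
question_data = [["a", "a", "c"],
                 ["b", "c", "c"],
                 ["c", "b", "b"],
                 ["b", "a", "c"]]

# A variant in which two columns have ties.
tied_data = [["a", "a", "b"],
             ["b", "c", "c"],
             ["a", "b", "b"],
             ["b", "a", "c"]]

print(column_modes(question_data))
print(column_modes(tied_data))
```

Each column maps to a list, so a tie shows up as a list with more than one entry instead of being resolved silently.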

刺人心

2021-01-24 18:40
Is the data hashable? If so, a collections.Counter will be helpful:
```
[Counter(col).most_common(1)[0][0] for col in zip(*data)]
```
It works because zip(*data) transposes the input data, yielding one column at a time. The Counter then counts the elements and stores the result in a dictionary that maps each element to its count. Counters also have a most_common method, which returns a list of the N items with the highest counts as (element, count) pairs, sorted from the highest count to the lowest. You want the first element of the first pair in that list, which is where the [0][0] comes from.

e.g.
```
>>> a,b,c = 'abc'
>>> from collections import Counter
>>> data = [[a, a, c],
...  [b, c, c],
...  [c, b, b],
...  [b, a, c]]
>>> [Counter(col).most_common(1)[0][0] for col in zip(*data)]
['b', 'a', 'c']
```
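If the data is not hashable, Counter raises a TypeError. One possible fallback, sketched here as an assumption and not as part of the answer above, is to pick the element with the highest col.count() in each column. It relies only on equality comparisons, at the cost of quadratic time per column. The function name most_common_unhashable and the sample rows are hypothetical.

```python
def most_common_unhashable(rows):
    """Return the most frequent element of each column using only equality.

    Each column is a tuple, so col.count(x) works even when x is a list.
    On a tie, max() keeps the candidate that appears first in the column.
    """
    return [max(col, key=col.count) for col in zip(*rows)]


# Every cell is a list, which cannot be used as a Counter key.
rows = [[[1], [2]],
        [[1], [3]],
        [[0], [2]]]

print(most_common_unhashable(rows))
```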

没有蜡笔的小新

2021-01-24 18:50

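Here's a solution that doesn't use the collections module. It returns a list of the most common values for each column, so ties are kept:

def get_most_common(data):
    """Return, for each column, the list of values tied for the highest count."""
    common = []
    for col in zip(*data):
        # Start from an empty dict for every column; otherwise the counts
        # from earlier columns leak into later ones.
        count_dict = {}
        for val in col:
            count_dict[val] = count_dict.get(val, 0) + 1
        max_count = max(count_dict.values())
        # Keep every value that reaches the maximum count.
        common.append([k for k in count_dict if count_dict[k] == max_count])

    return common

if __name__ == "__main__":

    data = [['a','a','b'],
            ['b','c','c'],
            ['a','b','b'],
            ['b','a','c']]

    print(get_most_common(data))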
