How to get the number of the most frequent value in a column?

夕颜 2020-12-24 02:02

I have a data frame and I would like to know how many times the most frequent value occurs in a given column.

I tried to do it in the following way:

items_co         


        
6 Answers
  • 2020-12-24 02:21

    NaN values are omitted when the frequencies are calculated, so please check how your code behaves in that respect. Alternatively, you can use the code below for the same functionality.

    **>> Code:**
        # Importing required modules
        from collections import Counter
        import pandas as pd
    
        # Creating a dataframe
        df = pd.DataFrame({ 'A':["jan","jan","jan","mar","mar","feb","jan","dec",
                                 "mar","jan","dec"]  }) 
        # Creating a counter object
        count = Counter(df['A'])
        # Calling a method of Counter object(count)
        count.most_common(3)
    
    **>> Output:**
    
        [('jan', 5), ('mar', 3), ('dec', 2)]
    
  • 2020-12-24 02:27

    This line returns how many times the most frequent value occurs:

    df["item"].value_counts().nlargest(n=1).values[0]
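
To see the one-liner in context, here is a minimal sketch. The sample data is made up for illustration; only the column name "item" comes from the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the column name "item" comes from the answer.
df = pd.DataFrame({"item": ["a", "b", "a", "c", "a", "b"]})

# Occurrences per distinct value (NaN is excluded by default).
counts = df["item"].value_counts()

# The one-liner from the answer: the count of the most frequent value.
top_count = counts.nlargest(n=1).values[0]

# The value itself is the index label that holds the largest count.
top_value = counts.idxmax()

print(top_value, top_count)
```

The expression returns only the count; the value that occurs most often lives in the index of the same series.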
    
  • 2020-12-24 02:39

    Following on from @jonathanrocher's answer, you could use mode on a pandas DataFrame. It gives the most frequent value or values (more than one when there is a tie) along the rows or columns:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({"a": [1,2,2,4,2], "b": [np.nan, np.nan, np.nan, 3, 3]})
    
    In [2]: df.mode()
    Out[2]: 
       a    b
    0  2  3.0
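
Note that mode returns the value rather than its count, and it can return several values when there is a tie. A small sketch of getting the count as well, using made-up data:

```python
import pandas as pd

# Hypothetical data in which "x" and "y" are tied for most frequent.
s = pd.Series(["x", "y", "x", "y", "z"])

# Series.mode returns every tied value, in sorted order.
modes = s.mode()

# Count how often the first mode occurs; all tied modes share this count.
mode_count = int((s == modes.iloc[0]).sum())

print(list(modes), mode_count)
```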
    
  • 2020-12-24 02:40

    Just take the first row of your items_counts series:

    top = items_counts.head(1)  # or items_counts.iloc[[0]]
    value, count = top.index[0], top.iat[0]
    

    This works because pd.Series.value_counts has sort=True by default and so is already ordered by counts, highest count first. Extracting a value from an index by location has O(1) complexity, while pd.Series.idxmax has O(n) complexity where n is the number of categories.

    Specifying sort=False is still possible and then idxmax is recommended:

    items_counts = df['item'].value_counts(sort=False)
    top = items_counts.loc[[items_counts.idxmax()]]
    value, count = top.index[0], top.iat[0]
    

    Notice in this case you don't need to call max and idxmax separately, just extract the index via idxmax and feed to the loc label-based indexer.
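
The two variants above can be compared side by side. This is only a sketch with invented data; the column name "item" is taken from the answer.

```python
import pandas as pd

# Hypothetical sample data; the column name "item" follows the answer.
df = pd.DataFrame({"item": ["pen", "ink", "pen", "cap", "pen", "ink"]})

# Default sort=True: the first row already holds the most frequent value.
sorted_counts = df["item"].value_counts()
value_a, count_a = sorted_counts.index[0], sorted_counts.iat[0]

# sort=False: rows are not ordered by count, so find the maximum explicitly.
unsorted_counts = df["item"].value_counts(sort=False)
value_b = unsorted_counts.idxmax()
count_b = unsorted_counts.loc[value_b]

print(value_a, count_a, value_b, count_b)
```

Both routes should agree on the value and its count; they differ only in whether the sorting work has already been done.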

  • 2020-12-24 02:45

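    You may also consider using scipy's mode function. Whether it ignores NaN depends on the SciPy version: recent releases take a nan_policy argument that defaults to 'propagate', so pass nan_policy='omit' to skip NaN explicitly. A solution using it could look like:

    from scipy.stats import mode
    from numpy import nan
    from pandas import DataFrame
    df = DataFrame({"a": [1,2,2,4,2], "b": [nan, nan, nan, 3, 3]})
    print(mode(df, nan_policy="omit"))
    

    With the SciPy version this answer was written for, the output looked like

    (array([[ 2.,  3.]]), array([[ 3.,  2.]]))
    

    meaning that the most common values are 2 for the first column and 3 for the second, with frequencies 3 and 2 respectively. Newer releases may print the result in a different form.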

  • 2020-12-24 02:46

    It looks like you may have some nulls in the column. You can drop them with df = df.dropna(subset=['item']). Then df['item'].value_counts().max() should give you the highest count, and df['item'].value_counts().idxmax() should give you the most frequent value.
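
As a rough illustration of how nulls affect the counts, here is a sketch with invented data. The column name "item" follows the answer; everything else is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN is the most common entry in the column.
df = pd.DataFrame({"item": ["a", np.nan, "b", "a", np.nan, np.nan]})

# value_counts skips NaN by default (dropna=True) ...
default_counts = df["item"].value_counts()
# ... and counts it only when asked to.
with_nan = df["item"].value_counts(dropna=False)

# Dropping the nulls first, as the answer suggests.
cleaned = df.dropna(subset=["item"])
top_value = cleaned["item"].value_counts().idxmax()
top_count = cleaned["item"].value_counts().max()

print(top_value, top_count)
```

Comparing default_counts with with_nan shows whether nulls would otherwise have been the most frequent entry.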
