Find and select the most frequent value of a column in a pandas DataFrame

别跟我提以往 2021-01-12 09:33

I have a dataframe with the following column:

file['DirViento']

Fecha
2011-01-01    ENE
2011-01-02    ENE
2011-01-03    ENE
2011-01-04    NNE 
2011-01-05          


        
3 Answers
  • 2021-01-12 09:54

    This is not as straightforward as it could be (should be).

    As you probably know, the statistics jargon for the most common value is the "mode." NumPy does not have a built-in function for this, but SciPy does. Import it like so:

    from scipy.stats.mstats import mode
    from pandas import DataFrame  # needed for the example further down
    

    It does more than simply return the most common value, as you can read about in the docs, so it's convenient to define a function that uses mode to just get the most common value.

    f = lambda x: mode(x, axis=None)[0]
    

    And now, instead of value_counts(), use apply(f). Here is an example:

    In [20]: DataFrame([1,1,2,2,2,3], index=[1,1,1,2,2,2]).groupby(level=0).apply(f)
    Out[20]: 
    1    1.0
    2    2.0
    dtype: object
    

    Update: SciPy's mode does not work with strings. For your string data, you'll need to define a more general mode function. This answer should do the trick.
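
    As a rough illustration of what such a more general mode function might look like, here is a minimal sketch built on collections.Counter. The helper name most_common and the sample Series are assumptions made up for this example (the data is modelled on the DirViento column in the question); ties are resolved in favour of the value seen first.

```python
from collections import Counter

import pandas as pd


def most_common(values):
    """Return the most frequent non-missing value, or None if there is none."""
    counts = Counter(v for v in values if pd.notna(v))
    if not counts:
        return None
    # most_common(1) returns a one-element list holding a (value, count) pair
    return counts.most_common(1)[0][0]


# Hypothetical sample data shaped like the question's DirViento column
winds = pd.Series(
    ["ENE", "ENE", "ENE", "NNE", None],
    index=pd.to_datetime(
        ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"]
    ),
    name="DirViento",
)

top = most_common(winds)

# The same helper can be passed to groupby(...).apply, here grouping by year
per_year = winds.groupby(winds.index.year).apply(most_common)
```

    Because the helper only counts hashable values, it should work for strings as well as numbers.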

  • 2021-01-12 09:59

    Pandas 0.15.2 has a DataFrame.mode() method. It might be of use to someone looking for this as I was.

    Here are the docs.

    Edit: To get the value itself, take the first row of the result:

    DataFrame.mode().iloc[0]
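
    For a single column like the one in the question, a sketch along these lines may be closer to what is needed. The frame below is hypothetical sample data rebuilt from the question's excerpt. Series.mode() returns a Series because several values can tie for most frequent, so the first entry is picked out explicitly.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data
file = pd.DataFrame(
    {"DirViento": ["ENE", "ENE", "ENE", "NNE", None]},
    index=pd.to_datetime(
        ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"]
    ),
)
file.index.name = "Fecha"

# All values tied for most frequent; missing values are ignored by default
modes = file["DirViento"].mode()

# First (here: only) mode as a plain Python string
most_frequent = modes.iloc[0]
```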
    
  • 2021-01-12 10:02
    1. For the whole DataFrame, you can use:

      dataframe.mode()
      
    2. For a specific column:

      dataframe.mode()['Column'][0]
      

    The second form is more useful when imputing missing values.
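
    To make the imputation use case concrete, here is a small sketch that fills missing entries with the column's mode. The frame and the column name 'Column' are placeholders, not data from the question.

```python
import pandas as pd

# Placeholder frame with one missing entry
dataframe = pd.DataFrame({"Column": ["ENE", "ENE", "ENE", "NNE", None]})

# Most frequent value of the column (first one if several tie)
fill_value = dataframe.mode()["Column"][0]

# Replace missing entries with that value; the original frame is left untouched
filled = dataframe["Column"].fillna(fill_value)
```

    Whether mode imputation is appropriate depends on the data; it is shown here only because the answer mentions it.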
