Convert data to the quantile bin

眼角桃花 2021-02-01 07:58

I have a dataframe with numerical columns. For each column I would like to calculate the quantiles and assign each row to one of them. I tried to use the qcut() method to re

1 answer
  • 2021-02-01 08:44

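    I think using the integer labels stored inside the Categorical object returned by qcut makes this a lot simpler. (The session is from an older pandas release; current releases call these attributes codes and categories rather than labels and levels.) For example: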

    >>> import pandas as pd
    >>> import numpy as np
    >>> np.random.seed(1001)
    >>> df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])
    >>> df
              A         B
    0 -1.086446 -0.896065
    1 -0.306299 -1.339934
    2 -1.206586 -0.641727
    3  1.307946  1.845460
    4  0.829115 -0.023299
    5 -0.208564 -0.916620
    6 -1.074743 -0.086143
    7  1.175839 -1.635092
    8  1.228194  1.076386
    9  0.394773 -0.387701
    >>> q = pd.qcut(df["A"], 5)
    >>> q
    Categorical: A
    array([[-1.207, -1.0771], (-1.0771, -0.248], [-1.207, -1.0771],
           (1.186, 1.308], (0.569, 1.186], (-0.248, 0.569], (-1.0771, -0.248],
           (0.569, 1.186], (1.186, 1.308], (-0.248, 0.569]], dtype=object)
    Levels (5): Index([[-1.207, -1.0771], (-1.0771, -0.248],
                       (-0.248, 0.569], (0.569, 1.186], (1.186, 1.308]], dtype=object)
    >>> q.labels
    array([0, 1, 0, 4, 3, 2, 1, 3, 4, 2])
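
For readers on a current pandas release, here is a rough equivalent of the session above. It is a sketch, not the original answer's code: it assumes `qcut` is given a Series (so the result is a Series with category dtype, reached through the `.cat` accessor), and it hard-codes column A from the example so that it does not depend on the random seed.

```python
import pandas as pd

# Column A copied from the example DataFrame above.
a = pd.Series(
    [-1.086446, -0.306299, -1.206586, 1.307946, 0.829115,
     -0.208564, -1.074743, 1.175839, 1.228194, 0.394773],
    name="A",
)

q = pd.qcut(a, 5)            # Series of intervals with category dtype
codes = q.cat.codes          # integer bin index per row, 0 = lowest quintile

# Passing labels=False skips the interval labels and returns the integers directly.
same = pd.qcut(a, 5, labels=False)

print(codes.tolist())        # [0, 1, 0, 4, 3, 2, 1, 3, 4, 2]
print(len(q.cat.categories)) # 5
```

`labels=False` is usually the shorter route when only the bin number is needed; `.cat.codes` is handy when the interval edges in `q.cat.categories` are also of interest.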
    

    Or, to match the numbering in your code (1 for the highest quintile, 5 for the lowest):

    >>> len(q.levels) - q.labels
    array([5, 4, 5, 1, 2, 3, 4, 2, 1, 3])
    >>> quintile(df, "A")
    >>> np.array(df["A"])
    array([5, 4, 5, 1, 2, 3, 4, 2, 1, 3])
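
Since the question asks for this on every numerical column, the same idea can be applied column by column. The sketch below assumes a current pandas release; `to_quantile_bins` and its parameters are hypothetical names introduced here for illustration and are not part of pandas or of the question's `quintile` function. The data is the example DataFrame above, hard-coded.

```python
import pandas as pd

def to_quantile_bins(frame, n_bins=5, descending=True):
    """Hypothetical helper: replace each value with its quantile bin number.

    descending=True numbers the highest bin 1 and the lowest n_bins,
    matching `len(q.levels) - q.labels` in the session above.
    """
    # labels=False makes qcut return the 0-based bin index for each row.
    codes = frame.apply(lambda col: pd.qcut(col, n_bins, labels=False))
    return n_bins - codes if descending else codes + 1

df = pd.DataFrame({
    "A": [-1.086446, -0.306299, -1.206586, 1.307946, 0.829115,
          -0.208564, -1.074743, 1.175839, 1.228194, 0.394773],
    "B": [-0.896065, -1.339934, -0.641727, 1.845460, -0.023299,
          -0.916620, -0.086143, -1.635092, 1.076386, -0.387701],
})

binned = to_quantile_bins(df)
print(binned["A"].tolist())  # [5, 4, 5, 1, 2, 3, 4, 2, 1, 3]
```

One caveat: `qcut` raises an error when a column has so many repeated values that two bin edges coincide. Passing `duplicates="drop"` avoids the error, but the column may then end up with fewer than `n_bins` bins, so the reversed numbering would need adjusting.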
    