I have a dataframe with numerical columns. For each column I would like to calculate quantile information and assign each row to one of the quantiles. I tried to use the qcut() method to re
I think using the labels stored inside the Categorical object returned by qcut can make this a lot simpler. For example:
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(1001)
>>> df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])
>>> df
          A         B
0 -1.086446 -0.896065
1 -0.306299 -1.339934
2 -1.206586 -0.641727
3  1.307946  1.845460
4  0.829115 -0.023299
5 -0.208564 -0.916620
6 -1.074743 -0.086143
7  1.175839 -1.635092
8  1.228194  1.076386
9  0.394773 -0.387701
>>> q = pd.qcut(df["A"], 5)
>>> q
Categorical: A
array([[-1.207, -1.0771], (-1.0771, -0.248], [-1.207, -1.0771],
(1.186, 1.308], (0.569, 1.186], (-0.248, 0.569], (-1.0771, -0.248],
(0.569, 1.186], (1.186, 1.308], (-0.248, 0.569]], dtype=object)
Levels (5): Index([[-1.207, -1.0771], (-1.0771, -0.248],
(-0.248, 0.569], (0.569, 1.186], (1.186, 1.308]], dtype=object)
>>> q.labels
array([0, 1, 0, 4, 3, 2, 1, 3, 4, 2])
Or, to match the numbering in your code, where 1 is the top quintile:
>>> len(q.levels) - q.labels
array([5, 4, 5, 1, 2, 3, 4, 2, 1, 3])
>>> quintile(df, "A")
>>> np.array(df["A"])
array([5, 4, 5, 1, 2, 3, 4, 2, 1, 3])
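
The session above uses the attribute names from an older pandas release. In more recent versions, qcut on a Series returns a Series with a categorical dtype, and as far as I know the equivalents of q.labels and q.levels are q.cat.codes and q.cat.categories. Passing labels=False skips the Categorical and returns the integer codes directly. Here is a rough sketch of the same idea extended to every column; the names n_bins, codes, ranks and all_ranks are mine, not from the question:

```python
import numpy as np
import pandas as pd

np.random.seed(1001)
df = pd.DataFrame(np.random.randn(10, 2), columns=["A", "B"])

n_bins = 5  # quintiles, as in the example above

# labels=False makes qcut return integer bin codes (0 = lowest bin)
codes = pd.qcut(df["A"], n_bins, labels=False)

# The same codes are available from the categorical result via .cat.codes
q = pd.qcut(df["A"], n_bins)
same_codes = q.cat.codes

# Reverse the numbering so that 1 is the top quintile and n_bins the bottom
ranks = n_bins - codes

# Apply the same transformation to every numeric column
all_ranks = df.apply(lambda col: n_bins - pd.qcut(col, n_bins, labels=False))
print(all_ranks)
```

Note that qcut raises an error if a column has so many repeated values that the bin edges are not unique; the duplicates="drop" argument may help in that case, at the cost of getting fewer bins.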