pandas: qcut error: ValueError: Bin edges must be unique

[亡魂溺海] submitted on 2019-12-12 19:06:45

Question


I am trying to compute percentiles for two columns using the pandas qcut method, as shown below:

my_df['float_col_quantile'] = pd.qcut(my_df['float_col'], 100, labels=False)
my_df['int_col_quantile'] = pd.qcut(my_df['int_col'].astype(float), 100, labels=False)

The column float_col_quantile works fine, but computing int_col_quantile raises the following error. Any idea what I did wrong here, and how can I fix it? Thanks!


ValueError                                Traceback (most recent call last)
<ipython-input-19-b955e0b00953> in <module>()
      1 my_df['float_col_quantile'] = pd.qcut(my_df['float_col'], 100, labels=False)
----> 2 my_df['int_col_quantile'] = pd.qcut(my_df['int_col'].astype(float), 100, labels=False)


/usr/local/lib/python3.4/dist-packages/pandas/tools/tile.py in qcut(x, q, labels, retbins, precision)
    173     bins = algos.quantile(x, quantiles)
    174     return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
--> 175                          precision=precision, include_lowest=True)
    176 
    177 

/usr/local/lib/python3.4/dist-packages/pandas/tools/tile.py in _bins_to_cuts(x, bins, right, labels, retbins, precision, name, include_lowest)
    192 
    193     if len(algos.unique(bins)) < len(bins):
--> 194         raise ValueError('Bin edges must be unique: %s' % repr(bins))
    195 
    196     if include_lowest:

ValueError: Bin edges must be unique: array([  1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,
         1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,
         1.,   1.,   1.,   1.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,
         2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,
         2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,
         2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,   2.,
         2.,   2.,   2.,   4.,   4.,   4.,   4.,   4.,   4.,   4.,   4.,
         4.,   4.,   4.,   4.,   4.,   4.,   4.,   4.,   4.,   4.,   4.,
         4.,   4.,   4.,   4.,   4.,   4.,   4.,   4.,   8.,   8.,   8.,
         8.,  10.])

Answer 1:


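The problem is that pandas.qcut chooses the bin edges so that each bin/quantile holds the same number of records, but a single value cannot be split across several bins/quantiles. When one value is repeated often enough to span several quantiles, the computed edges repeat (as in the array printed with the error above), and qcut raises this ValueError.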
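To see the cause concretely, here is a minimal sketch that reproduces the error. The series `s` is made-up data (an assumption, not the asker's column): a small integer column dominated by a few repeated values. It computes the 101 quantile edges that 100 bins require and shows that many of them coincide.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 integers in which a few values repeat heavily.
s = pd.Series([1] * 25 + [2] * 45 + [4] * 25 + [8] * 4 + [10])

# 100 quantile bins need 101 edges, taken from the quantiles of the data.
edges = s.quantile(np.linspace(0, 1, 101)).to_numpy()

# Many edges land on the same repeated value, so they are not unique.
n_unique = len(np.unique(edges))
print(len(edges), "edges requested,", n_unique, "distinct")

# qcut refuses to build bins from duplicated edges.
error_message = None
try:
    pd.qcut(s, 100, labels=False)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The float column in the question presumably has few ties, so its edges are all distinct and the call succeeds.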

Here is a list of solutions.
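Two workarounds commonly used for this error are sketched below. They are illustrations under assumptions and not necessarily the solutions the answer listed. The frame `my_df` here is hypothetical stand-in data. The `duplicates="drop"` argument needs pandas 0.20 or later.

```python
import pandas as pd

# Hypothetical stand-in for the asker's my_df['int_col'].
my_df = pd.DataFrame(
    {"int_col": [1] * 25 + [2] * 45 + [4] * 25 + [8] * 4 + [10]}
)

# Option 1: let qcut drop the duplicated edges.
# The result has fewer than 100 bins, and equal values share a bin.
dropped = pd.qcut(my_df["int_col"], 100, labels=False, duplicates="drop")

# Option 2: rank the values first so that every entry is distinct,
# then cut the ranks. Ties are broken by order of appearance, so
# equal values can end up in different bins.
ranked = pd.qcut(my_df["int_col"].rank(method="first"), 100, labels=False)

print(dropped.nunique(), "bins after dropping duplicate edges")
print(ranked.nunique(), "bins after ranking")
```

Which option fits depends on the goal: dropping edges keeps equal values together at the cost of uneven, fewer bins, while ranking keeps the bins equally sized but separates ties arbitrarily.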



Source: https://stackoverflow.com/questions/44639189/pandas-qcut-error-valueerror-bin-edges-must-be-unique
