My data is here.
q = pd.qcut(df['loss_percent'], 10)
ValueError: Bin edges must be unique: array([ 0.38461538, 0.38461538, 0.46153846, 0.46153846, 0.53846154,
0.53846154, 0.53846154, 0.61538462, 0.69230769, 0.76923077, 1. ])
I have read through why-use-pandas-qcut-return-valueerror; however, I am still confused.
I imagine that one of my values has a high frequency of occurrence and that is breaking qcut.
First, how do I determine whether that is indeed the case, and which value is the problem? Second, what kind of solution is appropriate given my data?
piRSquared
Using the solution in the post https://stackoverflow.com/a/36883735/2336654
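import pandas as pd

def pct_rank_qcut(series, n):
    # Bin edges as fractions: 0, 1/n, 2/n, ..., 1
    edges = pd.Series([float(i) / n for i in range(n + 1)])
    # Position of the first edge that is >= the percentile rank x
    f = lambda x: (edges >= x).argmax()
    return series.rank(pct=True).apply(f)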
q = pct_rank_qcut(df.loss_percent, 10)
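To check whether a heavily repeated value is really what breaks qcut, one rough approach is to compute the decile edges yourself and see which ones repeat, then compare that with each value's share of the rows. The sketch below is only an illustration: the series `s` is made-up stand-in data (not the asker's `df['loss_percent']`), and the names `repeated`, `heavy`, `q_drop` and `q_rank` are hypothetical. It also shows two alternatives that I believe should work in pandas 0.20 or later: `duplicates="drop"`, and ranking with `method="first"` before binning.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df['loss_percent']: three values dominate the data.
s = pd.Series(
    [0.38461538] * 30 + [0.46153846] * 25 + [0.53846154] * 29
    + [0.61538462, 0.69230769, 0.76923077, 1.0] * 4,
    name="loss_percent",
)
n = 10

# The edges qcut works from: quantiles at 0, 1/n, ..., 1.
edges = s.quantile(np.linspace(0, 1, n + 1))

# Edges that occur more than once are what trigger the ValueError.
repeated = edges[edges.duplicated(keep=False)]

# Share of rows per value; a share above 1/n can cover a whole bin.
shares = s.value_counts(normalize=True)
heavy = shares[shares > 1 / n]

# Alternative 1: let qcut merge duplicate edges (yields fewer than n bins).
q_drop = pd.qcut(s, n, duplicates="drop")

# Alternative 2: break ties by order of appearance so all edges are unique.
# Note that identical values can then land in different bins.
q_rank = pd.qcut(s.rank(method="first"), n, labels=False)

print(repeated.unique())
print(heavy)
```

Which alternative fits depends on the goal: `duplicates="drop"` keeps equal values together but gives uneven, fewer bins, while the rank-based version gives equal-sized bins at the cost of splitting ties arbitrarily.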
Source: https://stackoverflow.com/questions/41475398/pd-qcut-valueerror-bin-edges-must-be-unique