Question
My data is here.
q = pd.qcut(df['loss_percent'], 10)
ValueError: Bin edges must be unique: array([ 0.38461538, 0.38461538, 0.46153846, 0.46153846, 0.53846154,
0.53846154, 0.53846154, 0.61538462, 0.69230769, 0.76923077, 1. ])
I have read through why-use-pandas-qcut-return-valueerror, however I am still confused.
I imagine that one of my values has a high frequency of occurrence and that is breaking qcut.
First, how do I determine whether that is indeed the case, and which value is the problem? Second, what kind of solution is appropriate given my data?
Answer 1:
Use the solution from the post https://stackoverflow.com/a/36883735/2336654, which ranks the values as percentiles and then assigns each rank to a bin:
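def pct_rank_qcut(series, n):
    # Bin edges in percentile space: 0, 1/n, ..., 1
    edges = pd.Series([float(i) / n for i in range(n + 1)])
    # Position of the first edge that is >= the percentile rank
    f = lambda x: (edges >= x).argmax()
    return series.rank(pct=True).apply(f)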
q = pct_rank_qcut(df.loss_percent, 10)
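To check whether heavily repeated values are what breaks qcut, one option is to look at the value frequencies and at the decile edges directly. The sketch below is only illustrative: the series `s` is a made-up stand-in for df['loss_percent'] (the real data is not shown here), and the two alternatives at the end (the `duplicates='drop'` argument of pd.qcut, available in pandas 0.20 and later, and ranking with `method='first'` before binning) are options that are not part of the linked answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['loss_percent']: 100 rows, few distinct values.
s = pd.Series(
    [5 / 13] * 30 + [6 / 13] * 25 + [7 / 13] * 30
    + [8 / 13] * 5 + [9 / 13] * 4 + [10 / 13] * 3 + [1.0] * 3
)

# Share of rows taken by each distinct value. Any value covering more
# than 1/10 of the rows can span more than one decile.
freq = s.value_counts(normalize=True).sort_index()
heavy = freq[freq > 1 / 10]

# The decile edges qcut would compute, and the ones that repeat.
edges = s.quantile(np.linspace(0, 1, 11))
dup_edges = edges[edges.duplicated(keep=False)]

# Alternative 1: let qcut drop repeated edges (yields fewer than 10 bins).
q_drop = pd.qcut(s, 10, duplicates="drop")

# Alternative 2: break ties by order of appearance, then bin the ranks.
# This gives 10 equal-sized bins, but equal values can land in different bins.
q_rank = pd.qcut(s.rank(method="first"), 10, labels=False)
```

Printing `heavy` and `dup_edges` shows which values are responsible. Which alternative fits depends on whether equal values must stay in the same bin (`duplicates='drop'`) or the bins must be the same size (ranking first).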
Source: https://stackoverflow.com/questions/41475398/pd-qcut-valueerror-bin-edges-must-be-unique