ValueError: Data must be positive (boxcox scipy)

我的未来我决定 posted on 2020-05-31 06:40:47

Question


I'm trying to transform my dataset to a normal distribution.

0      8.298511e-03
1      3.055319e-01
2      6.938647e-02
3      2.904091e-02
4      7.422441e-02
5      6.074046e-02
6      9.265747e-04
7      7.521846e-02
8      5.960521e-02
9      7.405019e-04
10     3.086551e-02
11     5.444835e-02
12     2.259236e-02
13     4.691038e-02
14     6.463911e-02
15     2.172805e-02
16     8.210005e-02
17     2.301189e-02
18     4.073898e-07
19     4.639910e-02
20     1.662777e-02
21     8.662539e-02
22     4.436425e-02
23     4.557591e-02
24     3.499897e-02
25     2.788340e-02
26     1.707958e-02
27     1.506404e-02
28     3.207647e-02
29     2.147011e-03
30     2.972746e-02
31     1.028140e-01
32     2.183737e-02
33     9.063370e-03
34     3.070437e-02
35     1.477440e-02
36     1.036309e-02
37     2.000609e-01
38     3.366233e-02
39     1.479767e-03
40     1.137169e-02
41     1.957088e-02
42     4.921303e-03
43     4.279257e-02
44     4.363429e-02
45     1.040123e-01
46     2.930958e-02
47     1.935434e-03
48     1.954418e-02
49     2.980253e-02
50     3.643772e-02
51     3.411437e-02
52     4.976063e-02
53     3.704608e-02
54     7.044161e-02
55     8.101365e-03
56     9.310477e-03
57     7.626637e-02
58     8.149728e-03
59     4.157399e-01
60     8.200258e-02
61     2.844295e-02
62     1.046601e-01
63     6.565680e-02
64     9.825436e-04
65     9.353639e-02
66     6.535298e-02
67     6.979044e-04
68     2.772859e-02
69     4.378422e-02
70     2.020185e-02
71     4.774493e-02
72     6.346146e-02
73     2.466264e-02
74     6.636585e-02
75     2.548934e-02
76     1.113937e-06
77     5.723409e-02
78     1.533288e-02
79     1.027341e-01
80     4.294570e-02
81     4.844853e-02
82     5.579620e-02
83     2.531824e-02
84     1.661426e-02
85     1.430836e-02
86     3.157232e-02
87     2.241722e-03
88     2.946256e-02
89     1.038383e-01
90     1.868837e-02
91     8.854596e-03
92     2.391759e-02
93     1.612714e-02
94     1.007823e-02
95     1.975513e-01
96     3.581289e-02
97     1.199747e-03
98     1.263381e-02
99     1.966746e-02
100    4.040786e-03
101    4.497264e-02
102    4.030524e-02
103    8.627087e-02
104    3.248317e-02
105    5.727582e-03
106    1.781355e-02
107    2.377991e-02
108    4.299568e-02
109    3.664353e-02
110    5.167902e-02
111    4.006848e-02
112    7.072990e-02
113    6.744938e-03
114    1.064900e-02
115    9.823497e-02
116    8.992714e-03
117    1.792453e-01
118    6.817763e-02
119    2.588843e-02
120    1.048027e-01
121    6.468491e-02
122    1.035536e-03
123    8.800684e-02
124    5.975065e-02
125    7.365861e-04
126    4.209485e-02
127    4.232421e-02
128    2.371866e-02
129    5.894714e-02
130    7.177195e-02
131    2.116566e-02
132    7.579219e-02
133    3.174744e-02
134    0.000000e+00
135    5.786439e-02
136    1.458493e-02
137    9.820156e-02
138    4.373873e-02
139    4.271649e-02
140    5.532575e-02
141    2.311324e-02
142    1.644508e-02
143    1.328273e-02
144    3.908473e-02
145    2.355468e-03
146    2.519321e-02
147    1.131868e-01
148    1.708967e-02
149    1.027661e-02
150    2.439899e-02
151    1.604058e-02
152    1.134323e-02
153    2.247722e-01
154    3.408590e-02
155    2.222239e-03
156    1.659830e-02
157    2.284733e-02
158    4.618550e-03
159    3.674162e-02
160    4.131283e-02
161    8.846273e-02
162    2.504404e-02
163    6.004396e-03
164    1.986309e-02
165    2.347111e-02
166    3.865636e-02
167    3.672307e-02
168    6.658419e-02
169    3.726879e-02
170    7.600138e-02
171    7.184871e-03
172    1.142840e-02
173    9.741311e-02
174    8.165448e-03
175    1.529210e-01
176    6.648081e-02
177    2.617601e-02
178    9.547816e-02
179    6.857775e-02
180    8.129399e-04
181    7.107914e-02
182    5.884794e-02
183    8.398721e-04
184    6.972981e-02
185    4.461767e-02
186    2.264404e-02
187    5.566633e-02
188    6.595136e-02
189    2.301914e-02
190    7.488919e-02
191    3.108619e-02
192    4.989364e-07
193    4.834949e-02
194    1.422578e-02
195    9.398186e-02
196    4.870391e-02
197    3.841369e-02
198    6.406801e-02
199    2.603315e-02
200    1.692629e-02
201    1.409982e-02
202    4.099215e-02
203    2.093724e-03
204    2.640732e-02
205    1.032129e-01
206    1.581881e-02
207    8.977325e-03
208    1.941141e-02
209    1.502126e-02
210    9.923589e-03
211    2.757357e-01
212    3.096234e-02
213    4.388900e-03
214    1.784778e-02
215    2.179550e-02
216    3.944159e-03
217    3.703552e-02
218    4.033897e-02
219    1.157076e-01
220    2.400446e-02
221    5.761179e-03
222    1.899621e-02
223    2.401468e-02
224    4.458745e-02
225    3.357898e-02
226    5.331003e-02
227    3.488753e-02
228    7.466599e-02
229    6.075236e-03
230    9.815318e-03
231    9.598735e-02
232    7.103607e-03
233    1.100602e-01
234    5.677641e-02
235    2.420500e-02
236    9.213369e-02
237    4.024043e-02
238    6.987694e-04
239    8.612055e-02
240    5.663353e-02
241    4.871693e-04
242    4.533811e-02
243    3.593244e-02
244    1.982537e-02
245    5.490786e-02
246    5.603109e-02
247    1.671653e-02
248    6.522711e-02
249    3.341356e-02
250    2.378629e-06
251    4.299939e-02
252    1.223163e-02
253    8.392798e-02
254    4.272826e-02
255    3.183946e-02
256    4.431299e-02
257    2.661024e-02
258    1.686707e-02
259    4.070924e-03
260    3.325947e-02
261    2.023611e-03
262    2.402284e-02
263    8.369778e-02
264    1.375093e-02
265    8.899898e-03
266    2.148740e-02
267    1.301483e-02
268    8.355791e-03
269    2.549934e-01
270    2.792516e-02
271    4.652563e-03
272    1.556313e-02
273    1.936942e-02
274    3.547794e-03
275    3.412516e-02
276    3.932606e-02
277    5.305868e-02
278    2.354438e-02
279    5.379380e-03
280    1.904203e-02
281    2.045495e-02
282    3.275855e-02
283    3.007389e-02
284    8.227664e-02
285    2.479949e-02
286    6.573835e-02
287    5.165842e-03
288    7.599650e-03
289    9.613557e-02
290    6.690175e-03
291    1.779880e-01
292    5.076263e-02
293    3.117607e-02
294    7.495692e-02
295    3.707768e-02
296    7.086975e-04
297    8.935981e-02
298    5.624249e-02
299    7.105331e-04
300    3.339868e-02
301    3.354603e-02
302    2.041988e-02
303    3.862522e-02
304    5.977081e-02
305    1.730081e-02
306    6.909621e-02
307    3.729478e-02
308    3.940647e-07
309    4.385336e-02
310    1.391891e-02
311    8.898305e-02
312    3.840141e-02
313    3.214408e-02
314    4.284080e-02
315    1.841022e-02
316    1.528207e-02
317    3.106559e-03
318    3.945481e-02
319    2.085094e-03
320    2.464190e-02
321    7.844914e-02
322    1.526590e-02
323    9.922147e-03
324    1.649218e-02
325    1.341602e-02
326    8.124446e-03
327    2.867380e-01
328    2.663867e-02
329    5.342012e-03
330    1.752612e-02
331    2.010863e-02
332    3.581845e-03
333    3.652284e-02
334    4.484362e-02
335    4.600939e-02
336    2.213280e-02
337    5.494917e-03
338    2.016594e-02
339    2.118010e-02
340    2.964000e-02
341    3.405549e-02
342    1.014185e-01
343    2.451624e-02
344    7.966998e-02
345    5.301538e-03
346    8.198895e-03
347    8.789368e-02
348    7.222417e-03
349    1.448276e-01
350    5.676056e-02
351    2.987054e-02
352    6.851434e-02
353    4.193034e-02
354    7.025054e-03
355    8.557358e-02
356    5.812736e-02
357    2.263676e-02
358    2.922588e-02
359    3.363161e-02
360    1.495056e-02
361    5.871619e-02
362    6.235094e-02
363    1.691340e-02
364    5.361939e-02
365    3.722318e-02
366    9.828477e-03
367    4.155345e-02
368    1.327760e-02
369    7.205372e-02
370    4.151130e-02
371    3.265365e-02
372    2.879418e-02
373    2.314340e-02
374    1.653692e-02
375    1.077611e-02
376    3.481427e-02
377    1.815487e-03
378    2.232305e-02
379    1.005192e-01
380    1.491262e-02
381    3.752658e-02
382    1.271613e-02
383    1.223707e-02
384    8.088923e-03
385    2.572550e-01
386    2.300194e-02
387    2.847960e-02
388    1.782098e-02
389    1.900759e-02
390    3.647629e-03
391    3.723368e-02
392    4.079514e-02
393    5.510332e-02
394    3.072313e-02
395    4.183566e-03
396    1.891549e-02
397    1.870293e-02
398    3.182769e-02
399    4.167840e-02
400    1.343152e-01
401    2.451973e-02
402    7.567017e-02
403    4.837843e-03
404    6.477297e-03
405    7.664675e-02
Name: value, dtype: float64

This is the code I used for transforming dataset:

from scipy import stats
x,_ = stats.boxcox(df)

I get this error:

            if any(x <= 0):
-> 1031         raise ValueError("Data must be positive.")
   1032 
   1033     if lmbda is not None:  # single transformation

ValueError: Data must be positive

Is it because my values are too small that it's producing an error? Not sure what I'm doing wrong. New to using boxcox, could be using it incorrectly in this example. Open to suggestions and alternatives. Thanks!


Answer 1:


Your data contains the value 0 (at index 134). When boxcox says the data must be positive, it means strictly positive.
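A quick way to confirm this is to look for non-positive entries before calling boxcox. The snippet below is only a sketch: `values` is a short hypothetical stand-in for the Series in the question, not the real data. It also shows that very small but strictly positive values (such as 4.07e-07) are not the problem.

```python
import pandas as pd

# Hypothetical stand-in for the question's Series; position 3 holds an exact zero.
values = pd.Series(
    [8.298511e-03, 4.073898e-07, 6.938647e-02, 0.0, 2.904091e-02],
    name="value",
)

# Entries that violate boxcox's requirement of strictly positive data.
bad = values <= 0

print("non-positive count:", int(bad.sum()))
print("offending index labels:", list(values.index[bad]))
print("smallest strictly positive value:", values[~bad].min())
```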

What is the meaning of your data? Does 0 make sense? Is that 0 actually a very small number that was rounded down to 0?

You could simply discard that 0. Alternatively, you could do something like the following. (This amounts to temporarily discarding the 0, and then using -1/λ for the transformed value of 0, where λ is the Box-Cox transformation parameter.)
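The value -1/λ comes from the limit of the Box-Cox transform as x approaches 0 from above. As a short derivation, using the standard definition of the transform:

```latex
\[
y(x;\lambda) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\ln x, & \lambda = 0,
\end{cases}
\qquad x > 0 .
\]
\[
\lim_{x \to 0^{+}} y(x;\lambda) =
\begin{cases}
-\dfrac{1}{\lambda}, & \lambda > 0 \quad (\text{since } x^{\lambda} \to 0),\\[4pt]
-\infty, & \lambda \le 0 .
\end{cases}
\]
```

So the substitution is only meaningful when the fitted λ is positive; for λ ≤ 0 the transform of 0 has no finite limit.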

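First, create some data that contains one 0 (all other values are positive). The session below assumes numpy has been imported as np and boxcox has been imported from scipy.stats: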

In [13]: np.random.seed(8675309)

In [14]: data = np.random.gamma(1, 1, size=405)

In [15]: data[100] = 0

(In your code, you would replace that with, say, data = df.values.)

Copy the strictly positive data to posdata:

In [16]: posdata = data[data > 0]

Find the optimal Box-Cox transformation, and verify that λ is positive. This work-around doesn't work if λ ≤ 0.

In [17]: bcdata, lam = boxcox(posdata)

In [18]: lam
Out[18]: 0.244049919975582

Make a new array to hold that result, along with the limiting value of the transform of 0 (which is -1/λ):

In [19]: x = np.empty_like(data)

In [20]: x[data > 0] = bcdata

In [21]: x[data == 0] = -1/lam

The original answer included a plot of the histograms of data and x.
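The steps of this work-around can be collected into one function. This is a minimal sketch, not part of SciPy: the name `boxcox_with_zeros` is hypothetical, and the sample data uses NumPy's `default_rng` rather than the seed from the session above, so the fitted λ will differ from the value shown there.

```python
import numpy as np
from scipy.stats import boxcox


def boxcox_with_zeros(data):
    """Box-Cox transform of non-negative 1-D data; zeros map to the limit -1/lambda.

    Hypothetical helper (not a SciPy function) wrapping the steps above.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 1:
        raise ValueError("expected a 1-D array")
    if np.any(data < 0):
        raise ValueError("negative values are not handled")
    positive = data > 0
    # Fit lambda on the strictly positive values only.
    transformed, lam = boxcox(data[positive])
    if lam <= 0:
        # For lambda <= 0 the transform of 0 has no finite limit.
        raise ValueError(f"lambda={lam:.4g} is not positive")
    out = np.empty_like(data)
    out[positive] = transformed
    out[~positive] = -1.0 / lam
    return out, lam


# Example data: gamma-distributed values with one exact zero inserted.
rng = np.random.default_rng(0)
sample = rng.gamma(1.0, 1.0, size=405)
sample[100] = 0.0

x, lam = boxcox_with_zeros(sample)
print("lambda:", lam)
print("transformed value at the zero:", x[100])
```

For the question's data, the call would be something like `boxcox_with_zeros(df["value"].values)`, assuming the column is named `value` as in the printed Series.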




Answer 2:


Is the data you are passing to boxcox a 1-dimensional ndarray?

A second option is to use a shift. boxcox has no built-in shift parameter, but adding a positive constant to every element of the array before passing it to boxcox is equivalent (see the link below for details). If you later need values on the original scale, invert the transform first and then subtract the shift. If I have understood the Box-Cox algorithm correctly, that could be a solution in your case, too.

https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.stats.boxcox.html
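The shift approach might look like the sketch below. The choice of shift (half of the smallest strictly positive value) is purely an assumption for illustration; the fitted λ and the transformed values depend on it, so it should be chosen with the meaning of the data in mind. `scipy.special.inv_boxcox` is used to undo the transform before the shift is subtracted.

```python
import numpy as np
from scipy.special import inv_boxcox
from scipy.stats import boxcox

# Example data: gamma-distributed values with one exact zero inserted.
rng = np.random.default_rng(1)
data = rng.gamma(1.0, 1.0, size=200)
data[10] = 0.0

# Assumed shift: half of the smallest strictly positive value.
shift = 0.5 * data[data > 0].min()

# Shift first so every value is strictly positive, then transform.
shifted = data + shift
transformed, lam = boxcox(shifted)

# To get back to the original scale: invert the transform, then remove the shift.
recovered = inv_boxcox(transformed, lam) - shift

print("lambda:", lam)
print("max round-trip error:", np.abs(recovered - data).max())
```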



Source: https://stackoverflow.com/questions/50180988/valueerror-data-must-be-positive-boxcox-scipy
