Question
I have continuously incoming data represented by an integer array x = [x1, ..., xn], n < 1 000 000. Consecutive elements satisfy x[i] < x[i + 1].
I need to detect, as fast as possible, the breakpoint where the linear trend of the data ends and turns into a quadratic trend. The data always start with a linear trend.
I tried computing
k = (x[i+1] - x[i]) / (x[i] - x[i-1])
but this test is not very reliable. Is there a simpler and more efficient statistical test? Computing the regression line is too slow in this case.
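For reference, here is a minimal sketch of the ratio test from the question, assuming the data are available as a Python list. The function name `successive_ratios` is hypothetical.

```python
def successive_ratios(x):
    """Return k = (x[i+1] - x[i]) / (x[i] - x[i-1]) for every 1 <= i <= len(x) - 2.

    For an exactly linear sequence every ratio is 1.0. Because the data are
    strictly increasing (x[i] < x[i+1]), the denominator is never zero.
    """
    return [(x[i + 1] - x[i]) / (x[i] - x[i - 1]) for i in range(1, len(x) - 1)]


print(successive_ratios([1, 3, 5, 7]))      # linear: every ratio is 1.0
print(successive_ratios([0, 1, 4, 9, 16]))  # quadratic: ratios shrink toward 1
```

For x[i] = i² these ratios are (2i + 1)/(2i - 1), which approach 1 as i grows. Far from the vertex of the parabola, a quadratic segment therefore gives ratios close to the linear value 1, and a little integer noise can easily hide the difference.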
Answer 1:
What you are actually computing is a (discrete) derivative of the function. You could use more points to estimate it, e.g. five; see Five-point stencil.
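A minimal sketch of the five-point stencil for the first derivative. It assumes unit spacing (h = 1) between consecutive indices. The function name is hypothetical. The standard stencil is f'(i) ≈ (-f(i+2) + 8f(i+1) - 8f(i-1) + f(i-2)) / 12, and it is exact for polynomials up to degree 4, so it recovers the slope of both linear and quadratic data without noise.

```python
def five_point_derivative(x, i):
    """Estimate the first derivative of x at index i with the five-point stencil.

    Assumes unit spacing between samples; requires 2 <= i <= len(x) - 3.
    """
    return (-x[i + 2] + 8 * x[i + 1] - 8 * x[i - 1] + x[i - 2]) / 12


linear = [3 * i + 1 for i in range(10)]
quadratic = [i * i for i in range(10)]
print(five_point_derivative(linear, 4))     # slope of 3i + 1 is 3
print(five_point_derivative(quadratic, 5))  # derivative of i^2 at i = 5 is 10
```

Compared with a two-point difference, averaging over four neighbours reduces the effect of noise on a single sample. The price is that the estimate at index i is only available once x[i + 2] has arrived.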
Answer 2:
Keep track of the first and second derivatives. That is, keep the mean and variance of x[i] - x[i-1], and the mean and variance of (x[i+1] - x[i]) - (x[i] - x[i-1]).
For a linear trend, the first derivative should be roughly constant: if you observe a large deviation from its mean (judged against the variance), you can conclude that something has changed. The mean of the second derivative should be 0.
For a quadratic trend, the first derivative keeps increasing, so you will find many samples that deviate strongly from the mean. The second derivative then behaves the way the first derivative did in the linear case.
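A minimal sketch of this bookkeeping, assuming the running statistics are kept with Welford's online algorithm (a standard choice, not named in the answer). The class names `RunningStats` and `DerivativeTracker` and the alarm threshold `z` (in standard deviations) are hypothetical. The second difference is stored as it becomes available, i.e. (x[i] - x[i-1]) - (x[i-1] - x[i-2]) for the newest sample x[i].

```python
import math


class RunningStats:
    """Running mean and sample variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def push(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # sample variance; reported as 0.0 until two values have been seen
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


class DerivativeTracker:
    """Tracks first differences x[i] - x[i-1] and second differences
    (x[i] - x[i-1]) - (x[i-1] - x[i-2]) of a stream of samples."""

    def __init__(self, z=4.0):
        self.z = z  # hypothetical alarm threshold, in standard deviations
        self.d1 = RunningStats()
        self.d2 = RunningStats()
        self._prev = None
        self._prev_diff = None

    def push(self, value):
        """Add a sample; return True if its first difference lies more than
        z standard deviations from the mean of the earlier differences."""
        alarm = False
        if self._prev is not None:
            diff = value - self._prev
            if self.d1.n > 1:
                sd = math.sqrt(self.d1.variance)
                alarm = abs(diff - self.d1.mean) > self.z * sd
            self.d1.push(diff)
            if self._prev_diff is not None:
                self.d2.push(diff - self._prev_diff)
            self._prev_diff = diff
        self._prev = value
        return alarm


data = [3 * i for i in range(20)] + [57 + 3 * j + j * j for j in range(1, 10)]
tracker = DerivativeTracker()
alarms = [tracker.push(v) for v in data]
print(alarms.index(True))  # first sample after the linear part ends
```

Each sample costs O(1) time and memory. With exactly linear integer data the variance of the first difference is 0, so any change raises the alarm. With noisy data, `z` sets the sensitivity. Note that deviating samples are folded into the statistics after the check, which widens the variance once the quadratic part has started.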
An algorithm (using only the second derivative):
- For each input, compute the sign (positive or negative) of the second derivative.
- Keep track of the length of the most recent run of identical signs (e.g. if the sequence is -+-++++, the answer is 4).
- If that run length exceeds a threshold (say 40?), mark the start of the run as the beginning of the quadratic sequence.
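A minimal streaming sketch of the algorithm above, assuming samples arrive as any Python iterable. The function name `find_quadratic_start` is hypothetical, and the default threshold of 40 is the answer's own tentative suggestion. A zero second difference (an exactly linear step) is treated as sign 0 and breaks the run. The answer does not cover this case, so it is an assumption here.

```python
def find_quadratic_start(stream, threshold=40):
    """Return the index at which a run of more than `threshold` same-signed
    second differences begins, or None if the stream contains no such run.

    The second difference centred on index i is
    (x[i+1] - x[i]) - (x[i] - x[i-1]); only the last two samples are kept.
    """
    prev2 = prev1 = None
    run_sign = 0
    run_length = 0
    run_start = None
    for i, value in enumerate(stream):
        if prev2 is not None:
            # second difference centred on the previous sample, index i - 1
            d2 = (value - prev1) - (prev1 - prev2)
            sign = (d2 > 0) - (d2 < 0)  # +1, -1, or 0
            if sign != 0 and sign == run_sign:
                run_length += 1
            else:
                # a new run starts (or, for sign 0, no run is in progress)
                run_sign = sign
                run_length = 1 if sign != 0 else 0
                run_start = i - 1
            if run_length > threshold:
                return run_start
        prev2, prev1 = prev1, value
    return None


x = [3 * i for i in range(100)] + [297 + 3 * j + j * j for j in range(1, 60)]
print(find_quadratic_start(x))  # the last linear sample, where the kink is
```

Only two previous samples are stored, so the cost is O(1) per sample. The detection delay is at least `threshold` samples, which is the trade-off the threshold controls: a larger value ignores longer accidental runs in noisy linear data but reports the breakpoint later.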
Answer 3:
You can use a running window regression here.
The computation of the linear regression coefficients on W points involves sums of terms of the form X[i], i·X[i] and X[i]^2. If you store these sums, you can easily shift the window by one point by deducting the terms for the leftmost point and adding the terms for the rightmost point (each i·X[i] becomes (i+1)·X[i], i.e. i·X[i] + X[i]). Since your data values are integers, there is no roundoff accumulation.
With this, you can update the regression over every W consecutive points in constant time per new point, and detect a drop in the correlation coefficient.
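A minimal sketch of the running-window regression, assuming each sample in the window is indexed by its age t (newest t = 0, oldest t = W - 1). With this indexing, sliding the window turns every stored t·X into (t + 1)·X, which is the update the answer describes. The class name `RunningRegression` and the choice to report |r| are illustrative assumptions. The sums stay Python integers and are exact; only the final correlation is a float. Because t runs backwards in time, r is close to -1 for increasing linear data, so the sketch returns |r|.

```python
import math
from collections import deque


class RunningRegression:
    """Sliding-window correlation between the samples and their age t.

    t counts backwards from the newest sample (newest t = 0, oldest t = W - 1),
    so when a sample arrives every stored term t*y becomes (t + 1)*y, i.e. the
    sum s_ty grows by s_y. All sums are Python ints, so they stay exact.
    """

    def __init__(self, window):
        self.w = window
        self.values = deque()
        self.s_y = 0   # sum of y
        self.s_ty = 0  # sum of t*y
        self.s_yy = 0  # sum of y*y
        # sums of t and t^2 over t = 0..W-1 are constants for a full window
        self.s_t = window * (window - 1) // 2
        self.s_tt = (window - 1) * window * (2 * window - 1) // 6

    def push(self, y):
        """Add a sample; return |r| once the window is full, else None."""
        if len(self.values) == self.w:
            oldest = self.values.popleft()  # the oldest sample has t = W - 1
            self.s_ty -= (self.w - 1) * oldest
            self.s_y -= oldest
            self.s_yy -= oldest * oldest
        # every remaining sample ages by one step: t*y -> (t + 1)*y
        self.s_ty += self.s_y
        self.values.append(y)  # the new sample has t = 0, adding nothing to s_ty
        self.s_y += y
        self.s_yy += y * y
        if len(self.values) < self.w:
            return None
        n = self.w
        cov = n * self.s_ty - self.s_t * self.s_y
        var_t = n * self.s_tt - self.s_t ** 2
        var_y = n * self.s_yy - self.s_y ** 2
        if var_y == 0:  # cannot happen for strictly increasing data
            return None
        return abs(cov) / math.sqrt(var_t * var_y)


data = [3 * i for i in range(30)] + [87 + 3 * j + j * j for j in range(1, 30)]
reg = RunningRegression(10)
corr = [reg.push(v) for v in data]
print(min(c for c in corr if c is not None))  # dips below 1 after the break
```

Each `push` does a constant amount of integer arithmetic regardless of W. A detector would compare |r| against a threshold that you would have to tune on your data. Windows lying far inside the quadratic part can again look quite linear, so the drop is most visible near the breakpoint.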
Answer 4:
For an ultra-fast solution, you may consider a test like:
| X[i + s] - 2 X[i] + X[i - s] | > k (X[i + s] - X[i - s])
for well-chosen s and k.
Have a look at a plot of | X[i + s] - 2 X[i] + X[i - s] | / (X[i + s] - X[i - s]) as a function of i, for increasing values of s.
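A minimal sketch of this test, assuming the data are in a Python list. The function names `curvature_ratio` and `first_alarm` are hypothetical, and s = 5, k = 0.01 in the usage example are arbitrary illustrative values, not recommendations. The numerator is a second difference with step s, and the denominator is positive because the data are strictly increasing.

```python
def curvature_ratio(x, i, s):
    """|x[i+s] - 2*x[i] + x[i-s]| / (x[i+s] - x[i-s]); requires s <= i < len(x) - s."""
    return abs(x[i + s] - 2 * x[i] + x[i - s]) / (x[i + s] - x[i - s])


def first_alarm(x, s, k):
    """Return the first index i where curvature_ratio(x, i, s) > k, or None."""
    for i in range(s, len(x) - s):
        if curvature_ratio(x, i, s) > k:
            return i
    return None


x = [3 * i for i in range(100)] + [297 + 3 * j + j * j for j in range(1, 60)]
print(first_alarm(x, 5, 0.01))  # fires once x[i + s] lies past the kink at index 99
```

Each evaluation touches only three samples, so the test is O(1) per index. The test can only be evaluated at i once x[i + s] has arrived, so a larger s smooths noise at the cost of a longer delay. Evaluating `curvature_ratio` over all i for several values of s gives the plot suggested above.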
Source: https://stackoverflow.com/questions/9300430/is-there-a-linear-trend-in-data