Detect significant changes in a data-set that gradually changes

谁说胖子不能爱 posted on 2020-01-03 05:13:23

Question


I have a list of data in Python that represents the amount of resources used per minute. I want to count how many times the data changes significantly. What I mean by a significant change is a bit different from what I've read so far.

For example, suppose I have a dataset like [10,15,17,20,30,40,50,70,80,60,40,20]

I say a significant change happens when the data doubles or halves with respect to the previous normal.

For example, since the list starts with 10, that is our starting normal point

Then when data doubles to 20, I count that as one significant change and set the normal to 20.

Then when data doubles to 40, it is considered a significant change and the normal is now 40

Then when data doubles to 80, it is considered a significant change and the normal is now 80

After that, when data halves to 40, it is considered another significant change and the normal becomes 40

Finally when data reduces by half to 20, it is the last significant change

Here there are a total of 5 significant changes.

Is this similar to any existing change detection algorithm? How can this be done efficiently in Python?


Answer 1:


This is relatively straightforward. You can do this with a single iteration through the list. We simply update our base when a 'significant' change occurs.

Note that my implementation will work for any iterable or container. This is useful if you want to, for example, read through a file without having to load it all into memory.

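def gen_significant_changes(iterable, *, tol=2):
    # iter() is needed if we were given a container rather than an iterator.
    # If the argument is already an iterator, iter() returns it unchanged.
    iterable = iter(iterable)
    try:
        base = next(iterable)  # the first value is the starting "normal"
    except StopIteration:
        return  # empty input: no changes to report
    for x in iterable:
        # significant if x is at least tol times, or at most 1/tol times, the base
        if x >= (base * tol) or x <= (base / tol):
            yield x
            base = x  # the new value becomes the normal

my_list = [10,15,17,20,30,40,50,70,80,60,40,20]

print(list(gen_significant_changes(my_list)))  # [20, 40, 80, 40, 20]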
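If you need the count from the question, or the positions where the changes happen, a variant along these lines may help. This is only a sketch: the name `indexed_significant_changes` and the small line stream standing in for a file are made up for illustration and are not part of the answer above.

```python
from typing import Iterable, Iterator, Tuple


def indexed_significant_changes(
    values: Iterable[float], tol: float = 2
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for each value that is at least `tol` times,
    or at most 1/`tol` times, the current baseline."""
    base = None
    for i, x in enumerate(values):
        if base is None:
            base = x  # the first value is the starting baseline
        elif x >= base * tol or x <= base / tol:
            yield i, x
            base = x  # the new value becomes the baseline


data = [10, 15, 17, 20, 30, 40, 50, 70, 80, 60, 40, 20]

changes = list(indexed_significant_changes(data))
print(changes)       # [(3, 20), (5, 40), (8, 80), (10, 40), (11, 20)]
print(len(changes))  # 5

# Counting lazily from a stream of text lines (a stand-in for a file object),
# so the values never have to be held in memory all at once.
lines = iter(["10\n", "15\n", "25\n", "12\n"])
count = sum(1 for _ in indexed_significant_changes(float(s) for s in lines))
print(count)  # 2
```

Using `enumerate` keeps it a single pass, and `sum(1 for _ in ...)` counts the changes without building a list.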



Answer 2:


I can't help with the Python part, but in terms of math, the problem you're describing is fairly simple to solve using log base 2. Divide each value by a constant and take the log base 2: a significant change occurs when the integer part of that logarithm differs between the current value and the previous value. (The constant is needed because the first value in the array forms the basis of comparison.)

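For each element at index t (starting from t = 1), compute:

import math
for t in range(1, len(Array)):  # Array is the list of positive values
    current  = math.log(Array[t]     / Array[0], 2)
    previous = math.log(Array[t - 1] / Array[0], 2)
    if math.floor(current) != math.floor(previous):
        print("significant change at index", t)  # a significant change has occurred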
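To see what this rule produces on the question's data, here is a small sketch. The function name `bucket_changes` is hypothetical, and it uses `math.log2` rather than `math.log(x, 2)` because that is usually more accurate for powers of two. It assumes all values are positive.

```python
import math


def bucket_changes(values):
    """Return indices t where floor(log2(values[t] / values[0])) differs
    from the same quantity at t - 1. Assumes all values are positive."""
    first = values[0]
    buckets = [math.floor(math.log2(v / first)) for v in values]
    return [t for t in range(1, len(values)) if buckets[t] != buckets[t - 1]]


data = [10, 15, 17, 20, 30, 40, 50, 70, 80, 60, 40, 20]
print(bucket_changes(data))  # [3, 5, 8, 9, 11]

# Values that hover around a power-of-two boundary flip buckets repeatedly.
print(bucket_changes([10, 19, 21, 19, 21]))  # [2, 3, 4]
```

On the question's list this also finds 5 changes, but the fourth one lands on 60 (index 9) rather than on 40, because the buckets are fixed relative to the first value instead of moving with a "normal". The second example suggests the two rules can also disagree on the count: a moving-normal rule would report a single change there (at 21), so it is worth checking which behaviour you actually want.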

With this method you do not need to keep track of a "normal point" at all; you only need the array. Because each comparison depends only on Array[0] and two adjacent elements, with no extra state variable carried along, the array can be processed in any order, and portions of it could be given to different threads if the dataset were very large. You wouldn't be able to do that with your current method.
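As a rough illustration of that independence, the sketch below splits the index range into chunks and counts each chunk separately. The helper `count_in_chunk` and the chunk size are assumptions made for the example. In CPython, threads will not speed up this kind of CPU-bound work, so treat it as a demonstration that the chunks do not depend on each other rather than as a performance recipe.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def count_in_chunk(values, first, start, stop):
    """Count bucket changes for t in [start, stop), comparing t with t - 1."""
    def bucket(v):
        return math.floor(math.log2(v / first))
    return sum(
        1 for t in range(max(start, 1), stop)
        if bucket(values[t]) != bucket(values[t - 1])
    )


data = [10, 15, 17, 20, 30, 40, 50, 70, 80, 60, 40, 20]
chunk = 4  # arbitrary chunk size for the example
bounds = [(s, min(s + chunk, len(data))) for s in range(0, len(data), chunk)]

with ThreadPoolExecutor(max_workers=3) as pool:
    partial = list(pool.map(lambda b: count_in_chunk(data, data[0], *b), bounds))

print(partial)       # [1, 1, 3]
print(sum(partial))  # 5
```

Each chunk reads one element before its own start (`values[t - 1]`), which is the only overlap needed between neighbouring chunks.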



Source: https://stackoverflow.com/questions/43722597/detect-significant-changes-in-a-data-set-that-gradually-changes
