How to detect outlier peaks in a water flow time series?

♀尐吖头ヾ submitted on 2020-07-09 06:32:56

Question


TL;DR: I have a water flow time series that needs to be cleaned, and I can't figure out a way to remove outlier peaks.

I'm currently working on a project where I receive a .csv dataset with two columns:

  • date, a datetime timestamp
  • value, a water flow value
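A minimal sketch of loading a file in this layout with pandas, assuming it has a header row named `date,value` (the answer below reads `df.value`, which is consistent with that). The helper `load_flow` and the three-row sample are hypothetical; indexing by timestamp simply makes later time-based steps easier.

```python
import io

import pandas as pd

def load_flow(csv_source) -> pd.Series:
    """Read a two-column (date, value) CSV into a flow Series indexed by time."""
    df = pd.read_csv(csv_source, parse_dates=["date"])
    # Sort chronologically and index by timestamp so later steps can work
    # with time-based windows and interpolation
    return df.sort_values("date").set_index("date")["value"]

# Hypothetical three-row sample in the assumed layout (deliberately out of order)
sample = io.StringIO(
    "date,value\n"
    "2020-06-01 00:01:00,10.5\n"
    "2020-06-01 00:00:00,10.2\n"
    "2020-06-01 00:02:00,31.0\n"
)
flow_series = load_flow(sample)
# The median gap between timestamps gives the typical sampling interval
print(flow_series.index.to_series().diff().median())
```

On the real file, the median gap indicates how many samples a 2-minute event spans, which is the unit that sample-based width limits are counted in.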

The dataset usually covers one year of measurements from a water flow sensor belonging to a management entity that runs automatic irrigation systems, and contains around 402,000 raw values. It sometimes contains peaks that don't correspond to a watering period: they are isolated (punctual) values sitting between normal values, as in the image.

So far I've tried two approaches, calculating the percentage difference between consecutive points (and the spacing between them) and calculating the median absolute deviation (MAD), but both produce false positives.
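A rough sketch of the two approaches described, for illustration only: it is not the code in the repo, and the helper names `pct_change_flags` and `mad_flags`, the thresholds, and the synthetic series are all assumptions. On this toy series both rules fire on more than the spike. The percentage rule also flags the start of a genuine watering period, and a MAD computed over the whole series flags every watering sample, which is one way the false positives described above can arise.

```python
import numpy as np
import pandas as pd

def pct_change_flags(values: pd.Series, threshold: float = 2.0) -> pd.Series:
    """Flag points whose jump relative to the previous point exceeds `threshold`
    (2.0 = a 200% change). The threshold is an arbitrary illustration."""
    change = values.diff() / values.shift(1)
    return change.abs() > threshold

def mad_flags(values: pd.Series, k: float = 3.5) -> pd.Series:
    """Flag points far from the global median, in units of the scaled MAD."""
    median = values.median()
    mad = (values - median).abs().median()
    # 1.4826 * MAD estimates the standard deviation for normally distributed data
    robust_z = (values - median) / (1.4826 * mad)
    return robust_z.abs() > k

# Synthetic data: a 10/11 baseline, a one-sample spike at index 10 and a
# genuine 15-sample watering period at indices 30-44
flow = np.where(np.arange(60) % 2 == 0, 10.0, 11.0)
flow[10] = 45.0
flow[30:45] = 40.0
s = pd.Series(flow)

print(np.flatnonzero(pct_change_flags(s).to_numpy()).tolist())  # → [10, 30]
print(int(mad_flags(s).sum()))  # → 16 (the spike plus all 15 watering samples)
```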

The issue is that I need an algorithm that identifies a spontaneous peak lasting 1 or 2 measurements, because it's physically impossible for the flow to increase by 300% for just 2 minutes.
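One way to encode this criterion directly is to compare each run of 1 or 2 samples with the samples just before and just after it. This is a hedged sketch, not an established method: `short_spike_mask`, `ratio` (4.0, i.e. a 300% increase) and `min_jump` are hypothetical names and defaults. `min_jump` matters when the baseline can be zero, since `ratio * 0` is 0 and any positive blip would otherwise qualify.

```python
import numpy as np

def short_spike_mask(values, max_width: int = 2, ratio: float = 4.0,
                     min_jump: float = 0.0) -> np.ndarray:
    """Flag runs of 1..max_width samples whose lowest value is more than `ratio`
    times (and more than `min_jump` above) the higher of the two samples that
    bracket the run. ratio=4.0 corresponds to a 300% increase."""
    x = np.asarray(values, dtype=float)
    mask = np.zeros(len(x), dtype=bool)
    for width in range(1, max_width + 1):
        # start at 1 and stop early so every run has a sample on each side
        for start in range(1, len(x) - width):
            run = x[start:start + width]
            neighbour = max(x[start - 1], x[start + width])
            if run.min() > ratio * neighbour and run.min() - neighbour > min_jump:
                mask[start:start + width] = True
    return mask

flow = [10.0] * 30
flow[5] = 50.0                    # one-sample spike
flow[12], flow[13] = 45.0, 48.0   # two-sample spike
flow[20:] = [40.0] * 10           # sustained rise (e.g. watering): not flagged
print(np.flatnonzero(short_spike_mask(flow)).tolist())  # → [5, 12, 13]
```

A sustained rise is ignored because the sample after the run stays high. The nested Python loop favours readability over speed on ~400,000 points.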

The other issue is in the coding: the detection has to be dynamic, because the whole dataset clearly shows that in summer the flow more than doubles, which makes a fixed 95th-percentile threshold unusable.
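A common way to make the threshold follow the local level is a Hampel-style filter: compare each point with the median of a sliding window and scale the allowed deviation by that window's MAD. This swaps in a technique not mentioned in the question; the 11-sample window, `k=3` and the `min_scale` floor are arbitrary assumptions to be tuned. Because the median and MAD are recomputed per window, a summer level twice as high moves the threshold up with it.

```python
import numpy as np
import pandas as pd

def hampel_mask(values: pd.Series, window: int = 11, k: float = 3.0,
                min_scale: float = 1.0) -> pd.Series:
    """Flag points that deviate from their centred rolling median by more than
    k robust standard deviations estimated from the same window."""
    rolling = values.rolling(window, center=True, min_periods=1)
    local_median = rolling.median()
    # MAD of each window, measured around that window's own median
    local_mad = rolling.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    # 1.4826 * MAD estimates the standard deviation for normally distributed noise;
    # min_scale keeps flat stretches (MAD == 0) from flagging every small wiggle
    scale = np.maximum(1.4826 * local_mad, min_scale)
    return (values - local_median).abs() > k * scale

# Synthetic flow: a 10/11/12 pattern whose level doubles halfway through (a
# stand-in for the summer increase), with one-sample spikes at 25 and 75
i = np.arange(100)
flow = np.array([10.0, 11.0, 12.0])[i % 3]
flow[50:] *= 2
flow[25] = 45.0
flow[75] = 90.0
seasonal = pd.Series(flow)
print(np.flatnonzero(hampel_mask(seasonal).to_numpy()).tolist())  # → [25, 75]
```

The window should be much wider than the 1-2 sample spikes so that a spike cannot drag its own median up. `Rolling.apply` with a Python lambda is slow on a full year of data and is used here only for clarity.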

I've prepared a GitHub repo with the techniques described above and one day of the dataset, the one I'm currently working with (around 1,000 values).


Answer 1:


Not a real answer but too long for a comment:

Maybe you could use the prominence of the peaks. You can call find_peaks with the prominence and width parameters and try tweaking other parameters, such as the window size used for the prominence calculation (wlen).

The following quick example only illustrates the usage: it simply finds peaks whose prominence is at least 3 times the median (an arbitrary choice):

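import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import find_peaks
df = pd.read_csv('https://raw.githubusercontent.com/MigasTigas/peak_removal/master/dataset_simple_example.csv')
# peaks 1-2 samples wide whose prominence is at least 3x the median flow
peaks, _ = find_peaks(df.value, prominence=df.value.median() * 3, width=(1, 2))
ax = df.plot()  # plot the full series
df.iloc[peaks].plot(style=['x'], ax=ax)  # mark the detected peaks
plt.show()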
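Building on this suggestion, the sketch below also passes `wlen` to `find_peaks` and then replaces the detected samples by linear interpolation instead of only plotting them. It is an illustration under assumptions: `remove_short_peaks`, the prominence of 20, `wlen=31` and the synthetic series are hypothetical, and treating the samples between the returned `left_ips` and `right_ips` (the half-prominence crossings) as the spike is a design choice, not something the answer specifies.

```python
from typing import Optional

import numpy as np
import pandas as pd
from scipy.signal import find_peaks

def remove_short_peaks(values: pd.Series, prominence: float, max_width: float = 2,
                       wlen: Optional[int] = None) -> pd.Series:
    """Find narrow, prominent peaks with find_peaks and replace the samples
    above each peak's half-prominence line by linear interpolation."""
    x = values.to_numpy(dtype=float)
    # width=(1, max_width) keeps peaks 1..max_width samples wide (measured at half
    # prominence); wlen limits how far the prominence/base search may reach
    _, props = find_peaks(x, prominence=prominence, width=(1, max_width), wlen=wlen)
    cleaned = values.astype(float)
    for left, right in zip(props["left_ips"], props["right_ips"]):
        # samples lying between the two interpolated crossings form the spike
        start, stop = int(np.ceil(left)), int(np.floor(right))
        cleaned.iloc[start:stop + 1] = np.nan
    # fill the removed samples from their neighbours
    return cleaned.interpolate(limit_direction="both")

# Synthetic flow: 10/11/12 baseline, spikes at 10 and 20-21, watering at 35-49
idx = np.arange(60)
flow = np.array([10.0, 11.0, 12.0])[idx % 3]
flow[10] = 50.0
flow[20], flow[21] = 45.0, 48.0
flow[35:50] = 40.0
raw = pd.Series(flow)
cleaned = remove_short_peaks(raw, prominence=20, wlen=31)
print(cleaned.loc[[10, 20, 21]].tolist())  # → [11.0, 11.0, 11.0]
```

The 15-sample watering period passes the prominence test but is rejected by the width limit, so it is left untouched.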



Source: https://stackoverflow.com/questions/62238285/how-to-detect-outlier-peaks-in-a-water-flow-time-series
