I\'m looking for a computationally efficient way to find local maxima/minima for a large list of numbers in R.
Hopefully without for
loops...
For exampl
This function by Timothée Poisot is handy for noisy series:
May 3, 2009
An Algorithm To Find Local Extrema In A Vector
Filed under: Algorithm — Tags: Extrema, Time series — Timothée Poisot @ 6:46pmI spend some time looking for an algorithm to find local extrema in a vector (time series). The solution I used is to “walk” through the vector by step larger than 1, in order to retain only one value even when the values are very noisy (see the picture at the end of the post).
It goes like this :
findpeaks <- function(vec,bw=1,x.coo=c(1:length(vec)))
{
pos.x.max <- NULL
pos.y.max <- NULL
pos.x.min <- NULL
pos.y.min <- NULL for(i in 1:(length(vec)-1)) { if((i+1+bw)>length(vec)){
sup.stop <- length(vec)}else{sup.stop <- i+1+bw
}
if((i-bw)<1){inf.stop <- 1}else{inf.stop <- i-bw}
subset.sup <- vec[(i+1):sup.stop]
subset.inf <- vec[inf.stop:(i-1)]
is.max <- sum(subset.inf > vec[i]) == 0
is.nomin <- sum(subset.sup > vec[i]) == 0
no.max <- sum(subset.inf > vec[i]) == length(subset.inf)
no.nomin <- sum(subset.sup > vec[i]) == length(subset.sup)
if(is.max & is.nomin){
pos.x.max <- c(pos.x.max,x.coo[i])
pos.y.max <- c(pos.y.max,vec[i])
}
if(no.max & no.nomin){
pos.x.min <- c(pos.x.min,x.coo[i])
pos.y.min <- c(pos.y.min,vec[i])
}
}
return(list(pos.x.max,pos.y.max,pos.x.min,pos.y.min))
}
Link to original blog post
diff(diff(x))
(or diff(x,differences=2)
: thanks to @ZheyuanLi) essentially computes the discrete analogue of the second derivative, so should be negative at local maxima. The +1
below takes care of the fact that the result of diff
is shorter than the input vector.
edit: added @Tommy's correction for cases where delta-x is not 1...
tt <- c(1,2,3,2,1, 1, 2, 1)
which(diff(sign(diff(tt)))==-2)+1
My suggestion above ( http://statweb.stanford.edu/~tibs/PPC/Rdist/ ) is intended for the case where the data are noisier.
I had some trouble getting the locations to work in previous solutions and came up with a way to grab the minima and maxima directly. The code below will do this and will plot it, marking the minima in green and the maxima in red. Unlike the which.max()
function this will pull all indices of the minima/maxima out of a data frame. The zero value is added in the first diff()
function to account for the missing decreased length of the result that occurs whenever you use the function. Inserting this into the innermost diff()
function call saves from having to add an offset outside of the logical expression. It doesn't matter much, but i feel it's a cleaner way to do it.
# create example data called stockData
stockData = data.frame(x = 1:30, y=rnorm(30,7))
# get the location of the minima/maxima. note the added zero offsets
# the location to get the correct indices
min_indexes = which(diff( sign(diff( c(0,stockData$y)))) == 2)
max_indexes = which(diff( sign(diff( c(0,stockData$y)))) == -2)
# get the actual values where the minima/maxima are located
min_locs = stockData[min_indexes,]
max_locs = stockData[max_indexes,]
# plot the data and mark minima with red and maxima with green
plot(stockData$y, type="l")
points( min_locs, col="red", pch=19, cex=1 )
points( max_locs, col="green", pch=19, cex=1 )
There are some good solutions provided, but it depends on what you need.
Just diff(tt)
returns the differences.
You want to detect when you go from increasing values to decreasing values. One way to do this is provided by @Ben:
diff(sign(diff(tt)))==-2
The problem here is that this will only detect changes that go immediately from strictly increasing to strictly decreasing.
A slight change will allow for repeated values at the peak (returning TRUE
for last occurence of the peak value):
diff(diff(x)>=0)<0
Then, you simply need to properly pad the front and back if you want to detect maxima at the beginning or end of
Here's everything wrapped in a function (including finding of valleys):
which.peaks <- function(x,partial=TRUE,decreasing=FALSE){
if (decreasing){
if (partial){
which(diff(c(FALSE,diff(x)>0,TRUE))>0)
}else {
which(diff(diff(x)>0)>0)+1
}
}else {
if (partial){
which(diff(c(TRUE,diff(x)>=0,FALSE))<0)
}else {
which(diff(diff(x)>=0)<0)+1
}
}
}
@Ben's solution is pretty sweet. It doesn't handle the follwing cases though:
# all these return numeric(0):
x <- c(1,2,9,9,2,1,1,5,5,1) # duplicated points at maxima
which(diff(sign(diff(x)))==-2)+1
x <- c(2,2,9,9,2,1,1,5,5,1) # duplicated points at start
which(diff(sign(diff(x)))==-2)+1
x <- c(3,2,9,9,2,1,1,5,5,1) # start is maxima
which(diff(sign(diff(x)))==-2)+1
Here's a more robust (and slower, uglier) version:
localMaxima <- function(x) {
# Use -Inf instead if x is numeric (non-integer)
y <- diff(c(-.Machine$integer.max, x)) > 0L
rle(y)$lengths
y <- cumsum(rle(y)$lengths)
y <- y[seq.int(1L, length(y), 2L)]
if (x[[1]] == x[[2]]) {
y <- y[-1]
}
y
}
x <- c(1,2,9,9,2,1,1,5,5,1)
localMaxima(x) # 3, 8
x <- c(2,2,9,9,2,1,1,5,5,1)
localMaxima(x) # 3, 8
x <- c(3,2,9,9,2,1,1,5,5,1)
localMaxima(x) # 1, 3, 8
Finding local maxima and minima for a not so easy sequence e.g. 1 0 1 1 2 0 1 1 0 1 1 1 0 1
I would give their positions at (1), 5, 7.5, 11 and (14) for maxima and 2, 6, 9, 13 for minima.
#Position 1 1 1 1 1
# 1 2 3 4 5 6 7 8 9 0 1 2 3 4
x <- c(1,0,1,1,2,0,1,1,0,1,1,1,0,1) #Frequency
# p v p v p v p v p p..Peak, v..Valey
peakPosition <- function(x, inclBorders=TRUE) {
if(inclBorders) {y <- c(min(x), x, min(x))
} else {y <- c(x[1], x)}
y <- data.frame(x=sign(diff(y)), i=1:(length(y)-1))
y <- y[y$x!=0,]
idx <- diff(y$x)<0
(y$i[c(idx,F)] + y$i[c(F,idx)] - 1)/2
}
#Find Peaks
peakPosition(x)
#1.0 5.0 7.5 11.0 14.0
#Find Valeys
peakPosition(-x)
#2 6 9 13
peakPosition(c(1,2,3,2,1,1,2,1)) #3 7