问题
I am performing an extreme value analysis for meteorological data, to be precise for precipitation data available in mm/d. I am using a threshold excess approach for estimating the parameters of a generalized Pareto distribution with a maximum likelihood method.
The aim is to calculate several return levels (i.e. the 2, 5, 10, 20, 50, 100 year event) for daily precipitation.
While the R code works fine, I am wondering why I get clearly different results when calculating return levels based on the quantiles of the fitted GPD with different packages. Even though the estimated parameters of the GPD are almost identical in each package, the quantiles differ a lot.
The packages I used are: ismev, extRemes, evir and POT.
I guess that the different estimates for the parameters of the GPD are due to different calculation routines, but I do not understand why the calculation of the quantiles differs that much depending on the different packages.
while lmom, evir and POT return the same quanatile values, the return period derived from the extRemes package differs from the other results.
# packages
library(ismev)
library(extRemes)
library(evir)
library(POT)
library(lmom)
th <- 50
# sample data:
potvalues <- c(
58.5,44.2,49.6,59.3,48.3,60.9,94.5,47.1,45.3,57.6,48.2,46.2,44.2,50.6,42.1,52.7,80.9,
58.5,51.3,48.4,51.7,71.9,60.1,64.4,43.5,55.5,49.3,58.2,47.5,43.7,45.2,52.8,42.2,46.4,
96.1,47.5,50.1,42.4,60.9,72.6,51.6,59.4,80.5,63.7,59.9,45.0,66.7,47.6,53.3,43.1,51.0,
46.2,53.6,59.8,51.7,46.7,42.6,44.5,45.0,50.0,44.0,89.9,44.2,47.8,53.3,43.0,55.7,44.6,
44.6,54.9,45.1,43.9,78.7,45.5,64.0,42.7,47.4,57.0,105.4,64.3,43.2,50.4,80.2,49.9,71.6,
47.4,44.1,47.6,55.2,44.4,78.6,50.8,42.4,47.1,43.5,51.4)
#------------------------------------------------------------------------------------------#
# MLE Fitting of GPD - package extRemes
# fit gpd
pot.ext <- fevd(potvalues, method = "MLE", type="GP", threshold=th)
# return levels:
rl.extremes <- return.level(pot.ext, conf = 0.05,
return.period= c(2,5,10,20,50,100))
rl.extremes <- as.numeric(rl.extremes)
#------------------------------------------------------------------------------------------#
# MLE Fitting of GPD - package ismev
pot.gpd <- gpd.fit(potvalues, threshold=th)
s1 <- quagpa(f=.99, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) # 100
s2 <- quagpa(f=.98, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) # 50
s3 <- quagpa(f=.95, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) # 20
s4 <- quagpa(f=.90, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) # 10
s5 <- quagpa(f=.80, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) # 5
s6 <- quagpa(f=.50, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) # 2
rl.ismev <- c(s6, s5, s4, s3, s2, s1)
#------------------------------------------------------------------------------------------#
# MLE Fitting of GPD - package evir
# fit gpd
gpd.evir <- gpd(potvalues, threshold=th)
# plot
evirplot <- plot(gpd.evir)
1 # Excess Distribution
0 # exit
x100 <- gpd.q(pp=.99, x=evirplot) # 100
x050 <- gpd.q(pp=.98, x=evirplot) # 50
x020 <- gpd.q(pp=.95, x=evirplot) # 20
x010 <- gpd.q(pp=.90, x=evirplot) # 10
x005 <- gpd.q(pp=.80, x=evirplot) # 5
x002 <- gpd.q(pp=.50, x=evirplot) # 2
rl.evir <- t(rbind(x002,x005,x010,x020,x050,x100))
rl.evir <- as.numeric(rl.evir[2,])
#------------------------------------------------------------------------------------------#
# MLE Fitting of GPD - package POT
gpd.pot <- fitgpd(potvalues, threshold=th)
quant = c(0.50, 0.80, 0.90, 0.95, 0.98, 0.99)
rtp <- c(2,5,10,20,50,100)
retvec <- vector()
for (i in quant){
x <- POT::qgpd(i, loc = th, scale = as.numeric(gpd.pot$param[1]),
shape = as.numeric(gpd.pot$param[2]))
retvec <- c(retvec,x)
}
rl.pot <- retvec
#------------------------------------------------------------------------------------------#
# comparison of results - return periods
result <- cbind(rl.extremes,rl.ismev, rl.evir, rl.pot)
round(result, 2)
#------------------------------------------------------------------------------------------#
# comparison of estimated parameters
param.extremes <- pot.ext$results$par # extremes
param.ismev <- pot.gpd$mle # ismev
param.evir <- c(gpd.evir$par.ests[2],gpd.evir$par.ests[1]) # evir
param.pot <- gpd.pot$param # POT
parameters <- cbind(param.extremes, param.ismev , param.evir, param.pot)
round(parameters, 4)
#------------------------------------------------------------------------------------------#
回答1:
The solution for this problem is described e.g. in Coles book (An Introduction to Statistical Modeling of Extreme Values, Chapter 4.3.3). While the return levels for a GEV can be derived rather directly from its quantiles, the so called exceedance rate (i.e. number of events per year or the likelihood, that an event exceeds the threshold respectively) has to be considered when calculating return levels for a GP within the scope of a peak over threshold appoach.
The N-year return level is defined by
Thus it does not work to obtain meaningful results for return levels when simply calculating the quantiles for the GP distribution without considering the exceedance rate. The extRemes package takes the exceedance rate into account, while the default value for the number of events per year in the POT and evir packages is set to 1 if unspecified.
回答2:
The differences may also come from the different methods of fitting the distribution function to the dataset. I have a package on CRAN that compares GPD fits (or rather, their quantile estimates) for several R packages and methods:
https://cran.r-project.org/web/packages/extremeStat/vignettes/extremeStat.html
You can also use the package to compare GPD with other distributions.
来源:https://stackoverflow.com/questions/27524131/calculation-of-return-levels-based-on-a-gpd-in-different-r-packages