Calculation of return levels based on a GPD in different R packages

梦想的初衷 submitted on 2020-01-23 08:30:15

Question


I am performing an extreme value analysis of meteorological data, specifically daily precipitation in mm/d. I am using a threshold-excess (peaks-over-threshold) approach and estimate the parameters of a generalized Pareto distribution (GPD) by maximum likelihood.

The aim is to calculate several return levels (i.e. the 2, 5, 10, 20, 50, 100 year event) for daily precipitation.

While the R code works fine, I am wondering why I get clearly different results when calculating return levels based on the quantiles of the fitted GPD with different packages. Even though the estimated parameters of the GPD are almost identical in each package, the quantiles differ a lot.

The packages I used are: ismev, extRemes, evir and POT.

I assume that the small differences in the estimated GPD parameters come from different numerical routines, but I do not understand why the quantiles differ so much between packages.

While lmom (applied to the ismev fit), evir and POT return the same quantile values, the return levels derived with the extRemes package differ from the other results.
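The three agreeing calculations in the code below effectively evaluate the same quantity: the quantile of the fitted GPD at the non-exceedance probability 1 - 1/T, where T is the return period. A minimal base-R sketch of that quantile, assuming the common (scale sigma, shape xi) parameterisation; `gpd_quantile` is a hypothetical helper and the parameter values are illustrative, not the fitted ones. Note that lmom's `quagpa` uses the shape k = -xi, which is why the code passes `-pot.gpd$mle[2]`.

```r
# Quantile of a GPD with location u (the threshold), scale sigma and shape xi:
#   q(p) = u + sigma / xi * ((1 - p)^(-xi) - 1)   for xi != 0
#   q(p) = u - sigma * log(1 - p)                 for xi == 0
gpd_quantile <- function(p, u, sigma, xi) {
  if (abs(xi) < 1e-12) {
    u - sigma * log(1 - p)            # exponential limit as xi -> 0
  } else {
    u + sigma / xi * ((1 - p)^(-xi) - 1)
  }
}

# The probabilities hard-coded in the question are simply 1 - 1/T
return_periods <- c(2, 5, 10, 20, 50, 100)
probs <- 1 - 1 / return_periods

# Illustrative (not fitted) parameters
q <- gpd_quantile(probs, u = 50, sigma = 8, xi = 0.1)
print(round(q, 2))
```

Nothing in this formula knows how many threshold exceedances occur per year, so T here is effectively measured in exceedances rather than in years.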

# packages
library(ismev)
library(extRemes)
library(evir)
library(POT)
library(lmom)

th <- 50

# sample data:
potvalues <- c(
  58.5,44.2,49.6,59.3,48.3,60.9,94.5,47.1,45.3,57.6,48.2,46.2,44.2,50.6,42.1,52.7,80.9,
  58.5,51.3,48.4,51.7,71.9,60.1,64.4,43.5,55.5,49.3,58.2,47.5,43.7,45.2,52.8,42.2,46.4,
  96.1,47.5,50.1,42.4,60.9,72.6,51.6,59.4,80.5,63.7,59.9,45.0,66.7,47.6,53.3,43.1,51.0,
  46.2,53.6,59.8,51.7,46.7,42.6,44.5,45.0,50.0,44.0,89.9,44.2,47.8,53.3,43.0,55.7,44.6,
  44.6,54.9,45.1,43.9,78.7,45.5,64.0,42.7,47.4,57.0,105.4,64.3,43.2,50.4,80.2,49.9,71.6,
  47.4,44.1,47.6,55.2,44.4,78.6,50.8,42.4,47.1,43.5,51.4)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package extRemes

# fit gpd
pot.ext <- fevd(potvalues, method = "MLE", type="GP", threshold=th)

# return levels:
rl.extremes <-  return.level(pot.ext, conf = 0.05,
                             return.period= c(2,5,10,20,50,100))
rl.extremes <- as.numeric(rl.extremes)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package ismev

pot.gpd <- gpd.fit(potvalues, threshold=th)

# quantiles of the fitted GPD at non-exceedance probabilities 1 - 1/T
# for T = 2, 5, 10, 20, 50, 100 (lmom::quagpa uses shape k = -xi, hence the sign flip)
rl.ismev <- quagpa(f = c(.50, .80, .90, .95, .98, .99),
                   para = c(pot.gpd$threshold, pot.gpd$mle[1], -pot.gpd$mle[2]))

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package evir

# fit gpd
gpd.evir <- gpd(potvalues, threshold=th)

# plot() opens an interactive menu; the next two lines are the answers typed at its prompt
evirplot <- plot(gpd.evir)
1 # Excess Distribution
0 # exit

x100 <- gpd.q(pp=.99, x=evirplot) # 100
x050 <- gpd.q(pp=.98, x=evirplot) #  50
x020 <- gpd.q(pp=.95, x=evirplot) #  20
x010 <- gpd.q(pp=.90, x=evirplot) #  10
x005 <- gpd.q(pp=.80, x=evirplot) #   5
x002 <- gpd.q(pp=.50, x=evirplot) #   2

rl.evir <- t(rbind(x002,x005,x010,x020,x050,x100))
rl.evir <- as.numeric(rl.evir[2,])

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package POT

gpd.pot <- fitgpd(potvalues, threshold=th)
quant = c(0.50, 0.80, 0.90, 0.95, 0.98, 0.99)
rtp <- c(2,5,10,20,50,100)

# quantiles of the fitted GPD at the probabilities in 'quant'
rl.pot <- sapply(quant, function(p)
  POT::qgpd(p, loc = th, scale = as.numeric(gpd.pot$param[1]),
            shape = as.numeric(gpd.pot$param[2])))

#------------------------------------------------------------------------------------------#
# comparison of results - return levels
result <- cbind(rl.extremes,rl.ismev, rl.evir, rl.pot)
round(result, 2)

#------------------------------------------------------------------------------------------#
# comparison of estimated parameters
param.extremes <- pot.ext$results$par # extremes
param.ismev <- pot.gpd$mle # ismev
param.evir <- c(gpd.evir$par.ests[2],gpd.evir$par.ests[1])  # evir
param.pot <- gpd.pot$param # POT

parameters <- cbind(param.extremes, param.ismev , param.evir, param.pot)
round(parameters, 4)

#------------------------------------------------------------------------------------------#

Answer 1:


The solution to this problem is described, e.g., in Coles' book (An Introduction to Statistical Modeling of Extreme Values, Section 4.3.3). While return levels for a GEV can be derived rather directly from its quantiles, the so-called exceedance rate (expressed either as the number of threshold exceedances per year or as the probability that an observation exceeds the threshold) has to be taken into account when calculating return levels for a GP distribution within a peaks-over-threshold approach.

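The N-year return level is defined by

$$x_N = u + \frac{\sigma}{\xi}\left[(N n_y \zeta_u)^{\xi} - 1\right] \quad (\xi \neq 0), \qquad x_N = u + \sigma \log(N n_y \zeta_u) \quad (\xi = 0),$$

where u is the threshold, σ and ξ are the GP scale and shape parameters, n_y is the number of observations per year and ζ_u = Pr(X > u) is the probability that an observation exceeds the threshold.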

Simply calculating quantiles of the fitted GP distribution without considering the exceedance rate therefore does not give meaningful return levels. The extRemes package takes the exceedance rate into account, whereas in the POT and evir packages the number of events per year defaults to 1 if it is not specified.
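A minimal base-R sketch of Coles' N-year return level x_N = u + (sigma/xi)[(N n_y zeta_u)^xi - 1], where n_y is the number of observations per year and zeta_u the probability of exceeding u. The helper `gpd_return_level` and all numbers are hypothetical (an assumed 40-year daily record with 120 exceedances of u = 50). Setting n_y * zeta_u = 1, i.e. one exceedance per year on average, reproduces the plain GPD quantile at 1 - 1/N, which is what the naive quantile calculation implicitly assumes.

```r
# N-year return level for a GPD fitted to excesses over threshold u:
#   x_N = u + sigma / xi * ((N * n_y * zeta_u)^xi - 1)   for xi != 0
#   x_N = u + sigma * log(N * n_y * zeta_u)              for xi == 0
# n_y    : number of observations per year
# zeta_u : probability that an observation exceeds u (estimated by k / n)
gpd_return_level <- function(N, u, sigma, xi, n_y, zeta_u) {
  m <- N * n_y * zeta_u               # expected number of exceedances in N years
  if (abs(xi) < 1e-12) {
    u + sigma * log(m)
  } else {
    u + sigma / xi * (m^xi - 1)
  }
}

# Hypothetical record: 40 years of daily data, 120 exceedances of u = 50
n_y    <- 365
zeta_u <- 120 / (40 * n_y)            # so n_y * zeta_u = 3 exceedances per year
N      <- c(2, 5, 10, 20, 50, 100)

with_rate <- gpd_return_level(N, u = 50, sigma = 8, xi = 0.1, n_y = n_y, zeta_u = zeta_u)
# Ignoring the rate (n_y * zeta_u = 1) gives the plain quantile at 1 - 1/N
naive     <- gpd_return_level(N, u = 50, sigma = 8, xi = 0.1, n_y = 1, zeta_u = 1)
print(round(cbind(N, naive, with_rate), 2))
```

With more than one exceedance per year the true return levels lie above the naive quantiles; with fewer, they lie below.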




Answer 2:


The differences may also come from the different methods of fitting the distribution function to the dataset. I have a package on CRAN that compares GPD fits (or rather, their quantile estimates) for several R packages and methods:

https://cran.r-project.org/web/packages/extremeStat/vignettes/extremeStat.html

You can also use the package to compare GPD with other distributions.
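To illustrate the point that the fitting method matters, here is a base-R sketch (not the extremeStat API) that fits the question's exceedances over 50 in two ways: by the method of moments, a swapped-in estimator that is only valid for xi < 1/2, and by maximum likelihood via `optim`, then compares the resulting 0.99 quantiles. All helper and variable names are hypothetical.

```r
# Sample data from the question; only values above the threshold u are modelled
potvalues <- c(
  58.5,44.2,49.6,59.3,48.3,60.9,94.5,47.1,45.3,57.6,48.2,46.2,44.2,50.6,42.1,52.7,80.9,
  58.5,51.3,48.4,51.7,71.9,60.1,64.4,43.5,55.5,49.3,58.2,47.5,43.7,45.2,52.8,42.2,46.4,
  96.1,47.5,50.1,42.4,60.9,72.6,51.6,59.4,80.5,63.7,59.9,45.0,66.7,47.6,53.3,43.1,51.0,
  46.2,53.6,59.8,51.7,46.7,42.6,44.5,45.0,50.0,44.0,89.9,44.2,47.8,53.3,43.0,55.7,44.6,
  44.6,54.9,45.1,43.9,78.7,45.5,64.0,42.7,47.4,57.0,105.4,64.3,43.2,50.4,80.2,49.9,71.6,
  47.4,44.1,47.6,55.2,44.4,78.6,50.8,42.4,47.1,43.5,51.4)
u <- 50
y <- potvalues[potvalues > u] - u     # excesses over the threshold

# Negative log-likelihood of the GPD for excesses y, par = c(sigma, xi)
gpd_nll <- function(par, y) {
  sigma <- par[1]; xi <- par[2]
  if (sigma <= 0) return(Inf)
  if (abs(xi) < 1e-12) return(length(y) * log(sigma) + sum(y) / sigma)
  z <- 1 + xi * y / sigma
  if (any(z <= 0)) return(Inf)        # parameters outside the support
  length(y) * log(sigma) + (1 + 1 / xi) * sum(log(z))
}

# Method of moments: mean = sigma / (1 - xi), var = sigma^2 / ((1 - xi)^2 (1 - 2 xi))
m <- mean(y); v <- var(y)
xi_mom    <- 0.5 * (1 - m^2 / v)
sigma_mom <- 0.5 * m * (m^2 / v + 1)

# Maximum likelihood (Nelder-Mead, started from the moment estimates)
fit <- optim(c(sigma_mom, xi_mom), gpd_nll, y = y)

est <- rbind(MoM = c(sigma = sigma_mom, xi = xi_mom),
             MLE = c(sigma = fit$par[1], xi = fit$par[2]))
# 0.99 quantile (threshold + excess quantile) under each fit
q99 <- apply(est, 1, function(p) u + p[1] / p[2] * ((1 - 0.99)^(-p[2]) - 1))
print(round(est, 4))
print(round(q99, 2))
```

Because the upper quantiles depend strongly on the shape estimate, even modest differences in xi between methods translate into noticeably different high quantiles.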



Source: https://stackoverflow.com/questions/27524131/calculation-of-return-levels-based-on-a-gpd-in-different-r-packages
