Question
I want to define my own distribution functions for use with the fitdist or fitdistr functions in R. Taking fitdist from the fitdistrplus package as an example, I define a custom distribution called sgamma as follows:
dsgamma<-function(x,shape){return(dgamma(x,shape,scale=1));}
qsgamma<-function(p,shape){return(qgamma(p,shape,scale=1));}
psgamma<-function(q,shape){return(pgamma(q,shape,scale=1));}
rsgamma<-function(n,shape){return(rgamma(n,shape,scale=1));}
My question is where I should define these functions.
If these definitions are made in the top-level (global) environment, I can call fitdist with this distribution. For example, my script test1.R with the following content runs fine:
rm(list=ls())
require(fitdistrplus);
dsgamma<-function(x,shape){return(dgamma(x,shape,scale=1));}
qsgamma<-function(p,shape){return(qgamma(p,shape,scale=1));}
psgamma<-function(q,shape){return(pgamma(q,shape,scale=1));}
rsgamma<-function(n,shape){return(rgamma(n,shape,scale=1));}
x<-rgamma(100, shape=0.4, scale=1);
zfit<-fitdist(x, distr=dsgamma, start=list(shape=0.3));
However, if I wrap the above code in a function, it no longer works. See test2.R below:
rm(list=ls())
testfit<-function(x)
{
require(fitdistrplus);
dsgamma<-function(x,shape){return(dgamma(x,shape,scale=1));}
qsgamma<-function(p,shape){return(qgamma(p,shape,scale=1));}
psgamma<-function(q,shape){return(pgamma(q,shape,scale=1));}
rsgamma<-function(n,shape){return(rgamma(n,shape,scale=1));}
zfit<-fitdist(x, distr=dsgamma, start=list(shape=0.3));
return(zfit);
}
x<-rgamma(100, shape=0.4, scale=1);
zfit<-testfit(x);
I get the following error:
Error in fitdist(x, distr = dsgamma, start = list(shape = 0.3)) :
The dsgamma function must be defined
Note that I still get an error if I replace
zfit<-fitdist(x, distr=dsgamma, start=list(shape=0.3));
with
zfit<-fitdist(x, distr="sgamma", start=list(shape=0.3));
I guess the key question is where fitdist looks for the function specified by the distr argument. I would really appreciate your help.
Answer 1:
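Great question. The error occurs because the authors of the fitdistrplus package use exists() to check that the functions named after the distribution (dsgamma, psgamma, and so on) are defined. The following excerpt from the code of the fitdist and mledist functions shows how the value given for distr is turned into the names of the density and distribution functions, which are then looked up with exists(). With its default arguments, exists() searches the frame of the function that calls it (fitdist or mledist) and then that function's enclosing environments: the fitdistrplus namespace and, eventually, the global environment and the attached packages.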
if (!exists(ddistname, mode = "function"))
    stop(paste("The ", ddistname, " function must be defined"))
pdistname <- paste("p", distname, sep = "")
if (!exists(pdistname, mode = "function"))
    stop(paste("The ", pdistname, " function must be defined"))
Here is an excerpt from the documentation of exists (?exists):
This function looks to see if the name ‘x’ has a value bound to it in the specified environment. If ‘inherits’ is ‘TRUE’ and a value is not found for ‘x’ in the specified environment, the enclosing frames of the environment are searched until the name ‘x’ is encountered. See ‘environment’ and the ‘R Language Definition’ manual for details about the structure of environments and their enclosures.
To learn more about why exists() makes your function fail, see this article on environments: http://adv-r.had.co.nz/Environments.html
Essentially, fitdist and mledist do not search the environment of the function you are creating, so they report that dsgamma (and the other functions you define there) does not exist.
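The lookup behaviour can be reproduced in base R, without fitdistrplus. The sketch below uses a hypothetical helper, check_distr(), as a stand-in for the name-based checks in fitdist and mledist (it is not the package's code): a density defined locally inside the calling function is not found, while one defined at the top level is.

```r
# Hypothetical stand-in for the name-based check in fitdist()/mledist().
check_distr <- function(distname) {
  ddistname <- paste0("d", distname)
  # With default arguments, exists() starts in check_distr()'s own frame and
  # follows its enclosing environments (global env, then the search path).
  # It never looks in the frame of whoever called check_distr().
  exists(ddistname, mode = "function")
}

caller <- function() {
  # Defined only in caller()'s frame, so check_distr() cannot see it.
  dlocalgamma <- function(x, shape) dgamma(x, shape, scale = 1)
  check_distr("localgamma")
}

# Defined at the top level, so it is reachable from check_distr().
dglobalgamma <- function(x, shape) dgamma(x, shape, scale = 1)

print(caller())                    # FALSE
print(check_distr("globalgamma"))  # TRUE
```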
The easiest workaround is to use <<- instead of <- to define the functions inside testfit(). Because testfit() is itself defined at the top level, <<- assigns these functions in the global environment, where exists() can find them.
> testfit <- function(x) {
+   require(fitdistrplus)
+   dsgamma <<- function(x, shape) dgamma(x, shape, scale = 1)
+   qsgamma <<- function(p, shape) qgamma(p, shape, scale = 1)
+   psgamma <<- function(q, shape) pgamma(q, shape, scale = 1)
+   rsgamma <<- function(n, shape) rgamma(n, shape, scale = 1)
+   fitdist(x, distr = "sgamma", start = list(shape = 0.3))
+ }
> testfit(x)
Fitting of the distribution ' sgamma ' by maximum likelihood
Parameters:
      estimate Std. Error
shape 0.408349 0.03775797
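One side effect of <<- is that the functions stay in the global environment after testfit() returns. A variant, sketched below with assign() and on.exit() in place of <<- (a swapped-in technique, not the original answer's code), removes the binding again when the function exits. It uses a hypothetical check_distr() in place of fitdist so that it runs without fitdistrplus, and testfit_cleanup() is likewise a hypothetical name. Be aware that it would also delete any global dsgamma you had defined beforehand.

```r
# Hypothetical stand-in for the name-based check in fitdist()/mledist().
check_distr <- function(distname) {
  exists(paste0("d", distname), mode = "function")
}

# Hypothetical variant of testfit(): temporary global definition with cleanup.
testfit_cleanup <- function() {
  # Bind the density in the global environment so check_distr() can find it.
  assign("dsgamma", function(x, shape) dgamma(x, shape, scale = 1),
         envir = .GlobalEnv)
  # Remove it again when this function exits, even if an error occurs.
  on.exit(rm("dsgamma", envir = .GlobalEnv), add = TRUE)
  check_distr("sgamma")  # in real code: fitdist(x, distr = "sgamma", ...)
}

found_inside <- testfit_cleanup()
found_after <- exists("dsgamma", envir = .GlobalEnv)
print(found_inside)  # TRUE
print(found_after)   # FALSE
```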
You could also change the code of fitdist so that it searches your function's environment, by adding envir = parent.frame() to the exists checks as follows, but I do not recommend this:
if (!exists(ddistname,mode="function",envir=parent.frame()))
However, this alone does not solve your problem, because fitdist calls mledist, and mledist performs the same checks:
Error in mledist(data, distname, start, fix.arg, ...) (from #43) :
The dsgamma function must be defined
To pursue this approach, you would have to alter mledist as well and make it search the parent frame of fitdist (that is, the frame of the function that called fitdist), not its own parent frame. You would also have to reapply these changes every time you load the package.
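Why patching only fitdist is not enough can also be shown in base R. In the sketch below, outer_fit() and inner_mle() are hypothetical stand-ins for fitdist and mledist (not the package's code). Both pass envir = parent.frame() to exists(), but for inner_mle() the parent frame is outer_fit()'s frame rather than the user's function, so the locally defined density is found by outer_fit() but not by inner_mle().

```r
# Hypothetical stand-in for mledist(): looks in its caller's frame.
inner_mle <- function(distname) {
  # parent.frame() here is the frame of outer_fit(), not of user_fn().
  exists(paste0("d", distname), mode = "function", envir = parent.frame())
}

# Hypothetical stand-in for fitdist(): looks in its caller's frame, then
# delegates to inner_mle() just as fitdist() delegates to mledist().
outer_fit <- function(distname) {
  found_here <- exists(paste0("d", distname), mode = "function",
                       envir = parent.frame())
  c(outer = found_here, inner = inner_mle(distname))
}

user_fn <- function() {
  # Local density, visible only in user_fn()'s frame.
  dlocalgamma <- function(x, shape) dgamma(x, shape, scale = 1)
  outer_fit("localgamma")
}

print(user_fn())  # outer TRUE, inner FALSE
```

Forwarding the user's environment explicitly from the outer function to the inner one would be needed to make both checks succeed, which is the kind of extra change to mledist described above.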
Source: https://stackoverflow.com/questions/24934716/where-to-define-distribution-function-to-be-used-with-fitdist-fitdistrplus-or