I have implemented a new statistical model in R and it works in my sandbox, but I would like to make it more standard. A good comparison is lm(), where I can t
The following code:
library(hints)
hints(class="lm")
will list all the functions available for lm:
Functions for lm in package ‘base’:
  kappa               Compute or Estimate the Condition Number of a Matrix
  base-defunct        Defunct Functions in Package 'base'
  rcond               Compute or Estimate the Condition Number of a Matrix
Functions for lm in package ‘gam’:
  deviance.lm         Service functions and as yet undocumented functions for the gam library
Functions for lm in package ‘gdata’:
  nobs                Compute the Number of Non-missing Observations
Functions for lm in package ‘methods’:
  setOldClass         Register Old-Style (S3) Classes and Inheritance
Functions for lm in package ‘stats’:
  add1                Add or Drop All Possible Single Terms to a Model
  alias               Find Aliases (Dependencies) in a Model
  anova.lm            ANOVA for Linear Model Fits
  case.names.lm       Case and Variable Names of Fitted Models
  cooks.distance.lm   Regression Deletion Diagnostics
  dfbeta.lm           Regression Deletion Diagnostics
  dfbetas.lm          Regression Deletion Diagnostics
  drop1.lm            Add or Drop All Possible Single Terms to a Model
  dummy.coef.lm       Extract Coefficients in Original Coding
  effects             Effects from Fitted Model
  family.lm           Accessing Linear Model Fits
  formula.lm          Accessing Linear Model Fits
  hatvalues.lm        Regression Deletion Diagnostics
  influence.lm        Regression Diagnostics
  labels.lm           Accessing Linear Model Fits
  logLik              Extract Log-Likelihood
  model.frame.lm      Extracting the Model Frame from a Formula or Fit
  model.matrix.lm     Construct Design Matrices
  plot.lm             Plot Diagnostics for an lm Object
  print.lm            Fitting Linear Models
  proj                Projections of Models
  residuals.lm        Accessing Linear Model Fits
  rstandard.lm        Regression Deletion Diagnostics
  rstudent.lm         Regression Deletion Diagnostics
  summary.lm          Summarizing Linear Model Fits
  variable.names.lm   Case and Variable Names of Fitted Models
  vcov                Calculate Variance-Covariance Matrix for a Fitted Model Object
  case.names          Case and Variable Names of Fitted Models
  dummy.coef          Extract Coefficients in Original Coding
  influence.measures  Regression Deletion Diagnostics
  lm.influence        Regression Diagnostics
  lm                  Fitting Linear Models
  lm.fit              Fitter Functions for Linear Models
  model.frame         Extracting the Model Frame from a Formula or Fit
  model.matrix        Construct Design Matrices
  stats-defunct       Defunct Functions in Package 'stats'
  lm.glm              Some linear and generalized linear modelling examples from `An Introduction to Statistical Modelling' by Annette Dobson
Functions for lm in package ‘unknown’:
  confint.lm          NA
  extractAIC.lm       NA
  qr.lm               NA
  simulate.lm         NA
Functions for lm in package ‘VGAM’:
  predict.lm          Undocumented and Internally Used Functions and Classes
Functions for lm in package ‘xtable’:
  xtable              Create Export Tables
This might be another good source.
Following up on Gavin's answer, I found this page, also on the developer site, with a long list of useful suggestions.
Also, "An R Companion to Applied Regression", by Fox and Weisberg, has a walk-through of some of the key methods, in Chapter 8. I found that by looking for mentions of model frames in various R books. This book also has a reference to the same page on the R developer site.
Put into the object what you think is useful and necessary. I think a more important question is how you include this information, and how one accesses it.
At a minimum, provide a print() method so the entire object doesn't get dumped to the screen when you print the object. If you provide a summary() method, the convention is to have that method return an object of class summary.foo (where foo is your class) and then provide a print.summary.foo() method; you don't want your summary() method doing any printing itself.
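As a rough sketch of that convention (the class name foo comes from the paragraph above; the constructor new_foo() and the components it stores are made up for illustration):

```r
# Hypothetical constructor for a toy fitted-model object of class "foo".
new_foo <- function(coefficients, sigma) {
  structure(list(coefficients = coefficients, sigma = sigma), class = "foo")
}

# Compact print method: show the essentials and return the object invisibly.
print.foo <- function(x, ...) {
  cat("Fitted 'foo' model\n\nCoefficients:\n")
  print(x$coefficients, ...)
  invisible(x)
}

# summary() only computes and returns a "summary.foo" object; it prints nothing.
summary.foo <- function(object, ...) {
  out <- list(coefficients = object$coefficients,
              sigma = object$sigma,
              n.coef = length(object$coefficients))
  class(out) <- "summary.foo"
  out
}

# Printing of the summary lives in its own method.
print.summary.foo <- function(x, ...) {
  cat("Summary of 'foo' model\n")
  cat("Number of coefficients:", x$n.coef, "\n")
  cat("Residual scale:", format(x$sigma), "\n")
  invisible(x)
}

fit <- new_foo(c(a = 1.5, b = -0.3), sigma = 0.8)
s <- summary(fit)  # nothing is printed here
s                  # auto-printing dispatches to print.summary.foo()
```

Keeping the computation in summary.foo() and the display in print.summary.foo() means the summary can be assigned and reused without any output appearing.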
If you have coefficients, fitted values and residuals and these are simple, then you can store them in your returned object as $coefficients, $fitted.values and $residuals respectively. Then the default methods for coef(), fitted() and resid() will work without you needing to add your own bespoke methods. If these are not simple, then provide your own methods for coef(), fitted() and residuals() for your class (fitted.values() and resid() are aliases that dispatch to the fitted() and residuals() methods). By not simple, I mean, for example, that there are several types of residual and you need to process the stored residuals to get the requested type; then you need your own method that takes a type argument or similar to select from the available types of residual. See ?residuals.glm for an example.
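A small sketch of both cases, assuming a toy object of class foo built by hand; the two residual types and the scaling used for the "pearson" type are illustrative only, not taken from any real model:

```r
# Toy object of hypothetical class "foo" using the conventional component names.
y  <- c(1, 2, 4, 7)
mu <- c(1.5, 2.5, 3.5, 6.5)
fit <- structure(
  list(coefficients  = c(intercept = 0.5, slope = 1.5),
       fitted.values = mu,
       residuals     = y - mu,
       y             = y),
  class = "foo"
)

# No foo-specific methods are defined yet: the default methods find the components.
coef(fit)
fitted(fit)
resid(fit)

# With several residual types, provide a method that takes a `type` argument.
residuals.foo <- function(object, type = c("response", "pearson"), ...) {
  type <- match.arg(type)
  r <- object$y - object$fitted.values
  switch(type,
         response = r,
         pearson  = r / sqrt(object$fitted.values))  # assumed variance function
}

resid(fit, type = "pearson")
```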
If predictions are something that can be usefully provided, then a predict() method could be provided. Look at the predict.lm() method, for example, to see what arguments should be taken. Likewise, an update() method can be provided if it makes sense to update the model by adding/removing terms or altering model parameters.
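To make that concrete, here is a minimal sketch. The fitting function foo() is hypothetical and simply wraps lm.fit() as a stand-in for real fitting code; only the newdata argument of predict.lm() is mirrored. Because the object stores its call and terms, the default update() method can be reused instead of writing a new one:

```r
# Hypothetical fitting function; lm.fit() stands in for the real estimation code.
foo <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  mt <- attr(mf, "terms")
  z  <- lm.fit(model.matrix(mt, mf), model.response(mf))
  structure(list(coefficients = z$coefficients,
                 fitted.values = z$fitted.values,
                 terms = mt, call = cl),
            class = "foo")
}

# predict() method: fitted values by default, otherwise build the design
# matrix for `newdata` from the stored terms (response removed).
predict.foo <- function(object, newdata, ...) {
  if (missing(newdata)) return(object$fitted.values)
  tt <- delete.response(object$terms)
  X  <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% object$coefficients)
}

fit <- foo(dist ~ speed, data = cars)
predict(fit, newdata = data.frame(speed = c(10, 20)))

# The default update() method re-evaluates the stored call with a new formula.
fit2 <- update(fit, . ~ . + I(speed^2))
coef(fit2)
```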
plot.lm() gives an example of a method that provides several diagnostic plots of the fitted model. You could model your method on that function to select from a set of predefined diagnostic plots.
If your model has a likelihood, then providing a logLik() method to compute or extract it from the fitted model object would be standard; deviance() is a similar function, if such a thing is pertinent. For confidence intervals on parameters, confint() is the standard method.
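A sketch of these three methods for a deliberately trivial model (a Gaussian mean fitted by maximum likelihood). The function foo_mean(), the stored components and the Wald-type interval are all assumptions made for the example:

```r
# Toy Gaussian-mean model of hypothetical class "foo".
foo_mean <- function(y) {
  n  <- length(y)
  mu <- mean(y)
  sigma2 <- sum((y - mu)^2) / n  # ML estimate of the variance
  structure(list(coefficients = c(mu = mu), sigma2 = sigma2, y = y, n = n),
            class = "foo")
}

# logLik() method: return a "logLik" object carrying df and nobs attributes.
logLik.foo <- function(object, ...) {
  ll <- sum(dnorm(object$y, mean = object$coefficients[["mu"]],
                  sd = sqrt(object$sigma2), log = TRUE))
  structure(ll, df = 2, nobs = object$n, class = "logLik")
}

# deviance() method: here simply the residual sum of squares.
deviance.foo <- function(object, ...) {
  sum((object$y - object$coefficients[["mu"]])^2)
}

# confint() method: normal-approximation (Wald) interval; `parm` is ignored
# because this toy model has a single parameter.
confint.foo <- function(object, parm, level = 0.95, ...) {
  est <- object$coefficients
  se  <- sqrt(object$sigma2 / object$n)
  a   <- (1 - level) / 2
  matrix(est + se * qnorm(c(a, 1 - a)), nrow = 1,
         dimnames = list(names(est), c("lower", "upper")))
}

fit <- foo_mean(c(2.1, 3.4, 1.9, 4.2, 3.3))
logLik(fit)
AIC(fit)       # the default AIC() method builds on logLik() and its df attribute
deviance(fit)
confint(fit)
```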
If you have a formula interface, then formula() methods can extract it. If you store it in a place that the default method searches, then your life will be made easier. A simple way to do this is to store the matched call (match.call()) in the $call component. Methods for model.frame() and model.matrix() are the standard extractors for the model frame (the data used in the fit) and the model matrix (the model frame expanded into a design matrix, with factors converted to variables using contrasts, plus any transformations or functions of the model frame data). Look at examples from standard R modelling functions for ideas on how to store/extract this information.
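For illustration, a sketch in which the object keeps only the matched call and the model frame. The function foo() and the component name $model are assumptions for this example (lm objects happen to use $model as well); no formula() method is written because the default one can recover the formula from $call:

```r
# Hypothetical fitting function that stores the matched call and the model frame.
foo <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  structure(list(call = cl, model = mf), class = "foo")
}

# Extractor for the model frame: return the stored copy.
model.frame.foo <- function(formula, ...) formula$model

# Extractor for the model matrix: expand the stored frame using its terms.
model.matrix.foo <- function(object, ...) {
  model.matrix(attr(object$model, "terms"), object$model, ...)
}

fit <- foo(weight ~ group, data = PlantGrowth)
formula(fit)             # found by the default method via fit$call$formula
head(model.frame(fit))
head(model.matrix(fit))  # the factor `group` is expanded into contrast columns
```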
If you do use a formula interface, try to follow the standard non-standard evaluation approach used by most R modelling functions that have a formula interface/method. You can find details of that on the R Developer page, in particular the document by Thomas Lumley. This gives plenty of advice on making your function work the way one expects an R modelling function to work.
If you follow this paradigm, then extractors like na.action() should just work.
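The idiom in question looks roughly like the following. This is a sketch modelled on the pattern used inside functions such as lm(), not a copy of any particular function; foo() is hypothetical and lm.fit() again stands in for the real fitting code:

```r
# Hypothetical fitting function following the usual model-frame idiom.
foo <- function(formula, data, subset, na.action, ...) {
  cl <- match.call()
  # Rebuild the user's call as a call to model.frame(), keeping only the
  # arguments model.frame() understands, and evaluate it in the caller's frame.
  mf <- match.call(expand.dots = FALSE)
  m  <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())
  mt <- attr(mf, "terms")
  z  <- lm.fit(model.matrix(mt, mf), model.response(mf))
  structure(list(coefficients  = z$coefficients,
                 residuals     = z$residuals,
                 fitted.values = z$fitted.values,
                 na.action     = attr(mf, "na.action"),
                 terms = mt, call = cl),
            class = "foo")
}

d <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0), x = c(1, 2, 3, 4, 5))

fit <- foo(y ~ x, data = d)
na.action(fit)   # the default extractor finds fit$na.action

# With na.exclude, the default resid() pads the dropped case with NA.
fit_ex <- foo(y ~ x, data = d, na.action = na.exclude)
resid(fit_ex)
```

Storing the na.action attribute of the model frame in the $na.action component is what lets the default na.action(), fitted() and resid() methods handle missing values without any foo-specific code.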