model-comparison

Best function to compare caret model objects

旧街凉风 submitted on 2021-01-28 19:06:36
Question: I have a number of caret model objects built with the same data and tuning parameters. As a sanity check, I want to see whether each method gives me the same model object. (This is all part of a broader plan to run parallel processing and ensure my models are the same.) For example, below I train 2 different models and want to compare them. When I compare the caret objects, the comparison returns FALSE.
    library(caret)

    set.seed(0)
    myControl <- trainControl(method='cv', index=createFolds(iris$Species))

    set
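The excerpt is cut off before the comparison itself, so the following is only a sketch of one general approach: drop the components that are expected to differ between runs, then compare what is left with all.equal(). It uses base R only, with lm fits wrapped in plain lists standing in for the caret objects. The helper name compare_fits and the component name "times" are assumptions for illustration; caret's train objects do store elapsed timings, but whether that is what makes the comparison in the question return FALSE is not confirmed by the excerpt.

```r
# Hypothetical helper: compare two fitted-model lists after removing
# components that are expected to differ from run to run (e.g. timings).
compare_fits <- function(a, b, ignore = c("times")) {
  a[ignore] <- NULL
  b[ignore] <- NULL
  isTRUE(all.equal(a, b))
}

# Stand-in for a model-fitting routine that also records how long it took.
fit_once <- function() {
  start <- Sys.time()
  model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
  list(
    coefficients = coef(model),
    fitted       = fitted(model),
    times        = Sys.time() - start  # run-specific, not part of the model
  )
}

m1 <- fit_once()
m2 <- fit_once()

# The statistical content matches once the timing component is ignored.
same_ignoring_times <- compare_fits(m1, m2)
print(same_ignoring_times)
```

all.equal() compares numbers with a tolerance, which is usually more appropriate than identical() for results produced on different workers.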

What is a threshold in a Precision-Recall curve?

寵の児 submitted on 2019-12-31 08:54:28
Question: I am aware of the concept of Precision as well as the concept of Recall, but I am finding it very hard to understand the idea of a 'threshold' that makes any P-R curve possible. Imagine I have to build a model that predicts the re-occurrence (yes or no) of cancer in patients, using some decent classification algorithm on relevant features. I split my data for training and testing. Let's say I trained the model using the train data and got my Precision and Recall metrics using the test data.
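A small worked example may help with the idea of a threshold. The sketch below assumes the classifier outputs a score (for instance a predicted probability of recurrence) rather than a hard yes/no; the scores and labels are made up for illustration. Each threshold turns the scores into yes/no predictions, and each set of predictions yields one (precision, recall) pair, which is one point on a P-R curve.

```r
# Hypothetical predicted scores and true labels (1 = recurrence, 0 = none).
scores <- c(0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10)
truth  <- c(1,    1,    0,    1,    0,    1,    0,    0)

# Precision and recall when every score >= threshold is predicted "yes".
precision_recall <- function(threshold, scores, truth) {
  predicted <- as.integer(scores >= threshold)
  tp <- sum(predicted == 1 & truth == 1)  # true positives
  fp <- sum(predicted == 1 & truth == 0)  # false positives
  fn <- sum(predicted == 0 & truth == 1)  # false negatives
  c(threshold = threshold,
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

thresholds <- c(0.25, 0.50, 0.75)
pr_table <- t(sapply(thresholds, precision_recall,
                     scores = scores, truth = truth))
print(pr_table)
```

With these made-up numbers, raising the threshold from 0.25 to 0.75 moves precision from 2/3 to 1 while recall drops from 1 to 0.5. A single precision/recall pair from a test set corresponds to one fixed threshold (often 0.5 by default).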

What is a threshold in a Precision-Recall curve?

和自甴很熟 submitted on 2019-12-31 08:52:12
Question: I am aware of the concept of Precision as well as the concept of Recall, but I am finding it very hard to understand the idea of a 'threshold' that makes any P-R curve possible. Imagine I have to build a model that predicts the re-occurrence (yes or no) of cancer in patients, using some decent classification algorithm on relevant features. I split my data for training and testing. Let's say I trained the model using the train data and got my Precision and Recall metrics using the test data.

Subsetting in dredge (MuMIn) - must include interaction if main effects are present

爱⌒轻易说出口 submitted on 2019-12-21 20:36:55
Question: I'm doing some exploratory work where I use dredge{MuMIn}. In this procedure there are two variables that I want to allow together ONLY when the interaction between them is present, i.e. they cannot appear together as main effects only. Using sample data: I want to dredge the model fm1 (disregarding that it probably doesn't make sense). If the variables GNP and Population appear together, the model must also include the interaction between them.
    require(stats); require(graphics) #
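The constraint described above can be written as a logical rule and checked by enumeration. The sketch below is base R only and does not call MuMIn; the column name GNP_x_Population is a hypothetical label for the interaction term. It lists every combination of the three terms, applies the usual marginality rule (an interaction needs both main effects) and the rule from the question, and keeps the combinations that pass both.

```r
# All on/off combinations of the two main effects and their interaction.
# GNP_x_Population is a hypothetical label for the GNP:Population term.
combos <- expand.grid(
  GNP              = c(FALSE, TRUE),
  Population       = c(FALSE, TRUE),
  GNP_x_Population = c(FALSE, TRUE)
)

# Marginality: the interaction may only appear with both main effects.
marginal <- with(combos, !GNP_x_Population | (GNP & Population))

# Rule from the question: both main effects together require the interaction.
rule <- with(combos, !(GNP & Population) | GNP_x_Population)

allowed <- combos[marginal & rule, ]
print(allowed)
```

The same logical expression, !(GNP & Population) | interaction, is the kind of condition that a model-subsetting argument would need to encode; the exact syntax for naming an interaction term inside dredge's subset argument should be taken from the MuMIn documentation.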

Model comparison for breakpoint time series model in R strucchange

五迷三道 submitted on 2019-12-11 17:46:00
Question: I want to test whether a time series contains structural changes or not. This simulated example creates a series with two breaks, after 30 and 80 observations.
    set.seed(42)
    sim_data = data.frame(outcome = c(rnorm(30, 10, 1), rnorm(50, 20, 2), rnorm(20, 45, 1)))
    sim_ts = ts(data = sim_data, start = c(2010, 1), frequency = 12)
    plot(sim_ts)
I use the strucchange R package to determine the number (if any) of break points and model these:
    library("strucchange")
    break_points = breakpoints(sim
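To make the model-comparison idea concrete, here is a minimal base R sketch that does not use strucchange. It reuses the simulated series from the excerpt and compares a constant-mean model against a model with a separate mean per segment, using BIC. The break positions (30 and 80) are taken as known from the simulation, which is an assumption; estimating them from the data is what breakpoints() is for, and this BIC does not penalize the break locations themselves.

```r
# Same simulated series as in the excerpt: breaks after 30 and 80 observations.
set.seed(42)
outcome <- c(rnorm(30, 10, 1), rnorm(50, 20, 2), rnorm(20, 45, 1))

# Segment labels built from the (assumed known) break positions.
segment <- cut(seq_along(outcome),
               breaks = c(0, 30, 80, 100),
               labels = c("s1", "s2", "s3"))

fit_no_break   <- lm(outcome ~ 1)        # one mean for the whole series
fit_two_breaks <- lm(outcome ~ segment)  # one mean per segment

# Lower BIC indicates the preferred model.
bic_values <- BIC(fit_no_break, fit_two_breaks)
print(bic_values)
```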

AIC different between biglm and lm

一曲冷凌霜 submitted on 2019-12-10 15:39:53
Question: I have been trying to use biglm to run linear regressions on a large dataset (approx. 60,000,000 lines). I want to use AIC for model selection. However, when playing with biglm on smaller datasets, I discovered that the AIC values returned by biglm are different from those returned by lm. This even applies to the example in the biglm help.
    data(trees)
    ff<-log(Volume)~log(Girth)+log(Height)
    chunk1<-trees[1:10,]
    chunk2<-trees[11:20,]
    chunk3<-trees[21:31,]
    library(biglm)
    a <- biglm(ff,chunk1)
    a
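As a point of reference for such comparisons, the AIC that lm reports can be reproduced by hand from the residual sum of squares. The sketch below uses the trees data and formula from the excerpt. It only shows how lm's value is built (Gaussian log-likelihood, with the error variance counted as a parameter) and makes no claim about how biglm computes its own figure. A package that uses a different formula or drops constants would report a different number for the same fit; whether that explains the discrepancy here is an assumption to check against the biglm source.

```r
data(trees)
ff  <- log(Volume) ~ log(Girth) + log(Height)
fit <- lm(ff, data = trees)

n   <- nrow(trees)
rss <- sum(residuals(fit)^2)

# Number of estimated parameters: regression coefficients plus the variance.
p <- length(coef(fit)) + 1

# -2 * Gaussian log-likelihood at the MLE, plus the AIC penalty 2 * p.
manual_aic <- n * (log(2 * pi) + log(rss / n) + 1) + 2 * p

print(c(manual = manual_aic, lm = AIC(fit)))
```

Differences between two AIC implementations that are constant across candidate models fitted to the same data do not change the model ranking, so it is the differences in AIC between models that are worth comparing.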

Model selection using glmulti

时光毁灭记忆、已成空白 submitted on 2019-12-08 04:13:08
Question: I am attempting to run glmulti to test all possible subsets for model selection. The following is the code that I am trying to use.
    lmer.glmulti<-function(formula, data, random="", ...){
      lmer(paste(deparse(formula),random),data=data, REML=FALSE,...)
    }
    glmulti <- glmulti(formula(lmer(transLOT~DielEnd+TidalHeight+Pier+PercentIllumination+WT+BP+Anglers+(1|Transmitter), data=RESIDENCY_FOR_R), fixed.only=TRUE), data=RESIDENCY_FOR_R, level = 1, method = "h", crit = "bic", confsetsize = 5, plotty =

AIC with weighted nonlinear regression (nls)

◇◆丶佛笑我妖孽 submitted on 2019-12-07 21:14:15
Question: I encounter some discrepancies when comparing the deviance of a weighted and an unweighted model with the AIC values. A general example (from ‘nls’):
    DNase1 <- subset(DNase, Run == 1)
    fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
This is the unweighted fit; in the code of ‘nls’ one can see that ‘nls’ generates a vector wts <- rep(1, n). Now for a weighted fit:
    fm2DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1, weights = rep(1:8, each = 2))
in

Model selection using glmulti

▼魔方 西西 submitted on 2019-12-07 01:18:28
I am attempting to run glmulti to test all possible subsets for model selection. The following is the code that I am trying to use.
    lmer.glmulti<-function(formula, data, random="", ...){
      lmer(paste(deparse(formula),random),data=data, REML=FALSE,...)
    }
    glmulti <- glmulti(formula(lmer(transLOT~DielEnd+TidalHeight+Pier+PercentIllumination+WT+BP+Anglers+(1|Transmitter), data=RESIDENCY_FOR_R), fixed.only=TRUE), data=RESIDENCY_FOR_R, level = 1, method = "h", crit = "bic", confsetsize = 5, plotty = F, report = F, fitfunc = lmer.glmulti, random="+(1|Transmitter)", intercept=TRUE)
A problem arises with

AIC with weighted nonlinear regression (nls)

独自空忆成欢 submitted on 2019-12-06 13:48:10
I encounter some discrepancies when comparing the deviance of a weighted and an unweighted model with the AIC values. A general example (from ‘nls’):
    DNase1 <- subset(DNase, Run == 1)
    fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
This is the unweighted fit; in the code of ‘nls’ one can see that ‘nls’ generates a vector wts <- rep(1, n). Now for a weighted fit:
    fm2DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1, weights = rep(1:8, each = 2))
in which I assign increasing weights for each of the 8 concentrations with 2 replicates. Now with deviance I
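The excerpt stops before the numbers are shown, so the following is only a sketch for inspecting the two quantities side by side. It refits both models from the excerpt and recomputes each deviance by hand as a (weighted) residual sum of squares. The variable names w, rss_unweighted, rss_weighted and comparison are introduced here for illustration. Because the weights scale the squared residuals, the two deviances are on different scales, so a raw deviance comparison between differently weighted fits need not agree with a likelihood-based comparison such as AIC.

```r
DNase1 <- subset(DNase, Run == 1)
w <- rep(1:8, each = 2)  # weights from the excerpt: 8 concentrations x 2 replicates

fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
fm2DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1,
                 weights = w)

# Deviance recomputed by hand: plain RSS for the unweighted fit,
# weight-scaled RSS for the weighted fit.
rss_unweighted <- sum((DNase1$density - as.numeric(fitted(fm1DNase1)))^2)
rss_weighted   <- sum(w * (DNase1$density - as.numeric(fitted(fm2DNase1)))^2)

comparison <- data.frame(
  fit        = c("unweighted", "weighted"),
  deviance   = c(deviance(fm1DNase1), deviance(fm2DNase1)),
  manual_rss = c(rss_unweighted, rss_weighted),
  AIC        = c(AIC(fm1DNase1), AIC(fm2DNase1))
)
print(comparison)
```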