Question
I have written some code in R. This code takes some data and splits it into a training set and a test set. Then, I fit a "survival random forest" model on the training set. Afterwards, I use the model to predict the observations in the test set.
Due to the type of problem I am dealing with ("survival analysis"), a confusion matrix has to be made for each unique time (stored in "unique.death.times"). For each of these confusion matrices, I am interested in the corresponding "sensitivity" value (e.g. sensitivity_1001, sensitivity_2005, etc.). I am trying to get all of these sensitivity values: I would like to plot them against the unique death times and determine the average sensitivity.
In order to do this, I need to calculate the sensitivity for every time point in "unique.death.times". I tried doing this manually, one time point at a time, and it takes a long time.
Could someone please show me how to do this with a loop?
I have posted my code below:
#load libraries
library(survival)
library(data.table)
library(pec)
library(ranger)
library(caret)
#load data
data(cost)
#split data into train and test
ind <- sample(1:nrow(cost),round(nrow(cost) * 0.7,0))
cost_train <- cost[ind,]
cost_test <- cost[-ind,]
#fit survival random forest model
ranger_fit <- ranger(Surv(time, status) ~ .,
                     data = cost_train,
                     mtry = 3,
                     verbose = TRUE,
                     write.forest = TRUE,
                     num.trees = 1000,
                     importance = 'permutation')
#optional: plot training results
plot(ranger_fit$unique.death.times, ranger_fit$survival[1,], type = 'l', col = 'red') # for first observation
lines(ranger_fit$unique.death.times, ranger_fit$survival[21,], type = 'l', col = 'blue') # for twenty first observation
#predict observations test set using the survival random forest model
ranger_preds <- predict(ranger_fit, cost_test, type = 'response')$survival
ranger_preds <- data.table(ranger_preds)
colnames(ranger_preds) <- as.character(ranger_fit$unique.death.times)
#here is my question:
#get results for some time (time >1001)
prediction <- ranger_preds$'1001' > 0.5 # time has to be in "unique.death.times."
real <- cost_test$time >= 1001
#get confusion matrix and sensitivity for this same time
confusion = confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
sensitivity_1001 = confusion$byClass[1] # the element is named byClass (capital C); byClass[1] is the sensitivity
#now, get the results for a second time
prediction <- ranger_preds$'2005' > 0.5 # for any time in unique.death.times. "2005"
real <- cost_test$time >= 2005
#get confusion matrix and sensitivity for the second time
confusion = confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
sensitivity_2005 = confusion$byClass[1]
#question: how do I get the "sensitivity" for all the times in "unique.death.times", the average sensitivity and "plot sensitivity vs unique death times"?
Can someone please help me?
Thanks
Edit: Answer provided by user "Justin Singh". It seems to have the right idea, but the following error is produced:
sensitivity <- list()
for (time in names(ranger_preds)) {
prediction <- ranger_preds[which(names(ranger_preds) == time)] > 0.5
real <- cost_test$time >= as.numeric(time)
confusion <- confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
sensitivity[as.character(i)] <- confusion$byclass[1]
}
Error in confusionMatrix.default(as.factor(prediction), as.factor(real), :
The data must contain some levels that overlap the reference.
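A likely cause of this message, offered as a hedged explanation rather than a verified diagnosis: as.factor() keeps only the levels that actually occur, so at a time point where every prediction is TRUE and every real value is FALSE (or the other way round) the two factors share no level. Note also that ranger_preds is a data.table, so ranger_preds[which(...)] selects a row rather than the column for that time. The base-R sketch below uses made-up vectors (not the cost data) to show the level problem and how fixing the levels avoids it.

```r
# Made-up example: every prediction is TRUE, every observed value is FALSE
prediction <- c(TRUE, TRUE, TRUE)
real       <- c(FALSE, FALSE, FALSE)

# as.factor() only creates levels for values that occur in the vector
levels_default_pred <- levels(as.factor(prediction))  # only "TRUE"
levels_default_real <- levels(as.factor(real))        # only "FALSE"
overlap_default <- any(levels_default_pred %in% levels_default_real)

# Declaring both levels explicitly guarantees that the factors overlap
pred_f <- factor(prediction, levels = c(FALSE, TRUE))
real_f <- factor(real, levels = c(FALSE, TRUE))
overlap_fixed <- any(levels(pred_f) %in% levels(real_f))

print(overlap_default)
print(overlap_fixed)
```

Passing factors built with levels = c(FALSE, TRUE) to confusionMatrix() should therefore avoid the level mismatch, although the sensitivity itself is undefined at any time point where no real value is TRUE.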
Answer 1:
Assuming that each column name of ranger_preds takes the form of a numeric, you could have something similar to this:
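sensitivity <- list()
for (time in names(ranger_preds)) {
  # [[time]] extracts the column for this time point as a plain numeric vector
  prediction <- ranger_preds[[time]] > 0.5
  real <- cost_test$time >= as.numeric(time)
  # declare both levels so the two factors always overlap, even if only one class occurs
  confusion <- confusionMatrix(factor(prediction, levels = c(FALSE, TRUE)),
                               factor(real, levels = c(FALSE, TRUE)),
                               positive = 'TRUE')
  # store the result under the time point's name (byClass, with a capital C)
  sensitivity[[time]] <- confusion$byClass[["Sensitivity"]]
}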
The idea is to create a single list, sensitivity, instead of many separate variables, and to name each element after the corresponding time in names(ranger_preds). For example, for 2005 we'd get the sensitivity by calling sensitivity[["2005"]] (or sensitivity$`2005`, with backticks, because the name starts with a digit).
I haven't tested this, so there might be errors and it might not be the most efficient - however, hopefully it will lead you in the right direction.
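To get from the per-time values to the average and the plot the question asks for, one option is to compute the sensitivity directly as TP / (TP + FN) for every time point at once. The following is a minimal base-R sketch, not the answer's method: sensitivity_by_time is a hypothetical helper (it is not part of ranger or caret), the data is synthetic, and, like the question's code, it ignores censoring.

```r
# Hypothetical helper: sensitivity at each time point, computed without caret.
#   surv_probs    : matrix of predicted survival probabilities,
#                   one row per observation, one column per time point
#   times         : numeric vector of time points, in column order
#   observed_time : observed time for each observation (censoring is ignored)
sensitivity_by_time <- function(surv_probs, times, observed_time, threshold = 0.5) {
  stopifnot(ncol(surv_probs) == length(times),
            nrow(surv_probs) == length(observed_time))
  sens <- vapply(seq_along(times), function(j) {
    predicted <- surv_probs[, j] > threshold  # predicted to survive past times[j]
    actual    <- observed_time >= times[j]    # actually observed at or beyond times[j]
    true_pos  <- sum(predicted & actual)
    false_neg <- sum(!predicted & actual)
    # sensitivity is undefined when there are no actual positives
    if (true_pos + false_neg == 0) NA_real_ else true_pos / (true_pos + false_neg)
  }, numeric(1))
  names(sens) <- as.character(times)
  sens
}

# Synthetic data: 4 observations, 3 time points
times <- c(10, 20, 30)
surv_probs <- matrix(c(0.9, 0.8, 0.4,
                       0.9, 0.6, 0.3,
                       0.7, 0.4, 0.2,
                       0.6, 0.3, 0.1),
                     nrow = 4, byrow = TRUE)
observed_time <- c(35, 25, 22, 5)

sens <- sensitivity_by_time(surv_probs, times, observed_time)
mean_sens <- mean(sens, na.rm = TRUE)  # average over time points where it is defined
print(sens)
print(mean_sens)
```

With the objects from the question, the call would be along the lines of sens <- sensitivity_by_time(as.matrix(ranger_preds), ranger_fit$unique.death.times, cost_test$time), followed by mean(sens, na.rm = TRUE) for the average and plot(ranger_fit$unique.death.times, sens, type = 'l') for the plot. Returning NA where no observation is a real positive keeps those time points from distorting the average.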
Source: https://stackoverflow.com/questions/65102930/r-how-to-repeatedly-loop-the-results-from-a-function