I would like to label rows based on the condition in other rows.
Basically, if a row's label is NA, I want to look for a row with a non-NA label and use its
We can subset the 'value's of the rows where 'label' is NA and compare them with a threshold built from the 'value' and 'sd_value' of the row labelled 'good'.
The resulting logical vector ('i2') is then converted to "good"/"bad", either with numeric indexing or with ifelse,
and the output is assigned back to the column using the index ('i1')
i1 <- is.na(df$label)
i2 <- df$value[i1] < abs(df$value[1] + 2 * df$sd_value[1])
df$label[i1] <- c("bad", "good")[(i2 + 1)]
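The ifelse alternative mentioned above could look roughly like this. This is only a sketch: the data frame 'df_demo' is a hypothetical reconstruction from the printed result further down (the original input is not shown here, so the NA pattern is an assumption), and 'thr' is just a helper name for the threshold.

```r
# Hypothetical sample data; the NA pattern in 'label' is an assumption
df_demo <- data.frame(value = c(0.5, 1.0, 0.6, 1.2),
                      sd_value = c(0.1, 0.5, 0.2, 0.8),
                      label = c("good", NA, NA, NA),
                      stringsAsFactors = FALSE)

# rows that still need a label
i1 <- is.na(df_demo$label)
# threshold taken from the first ('good') row: value + 2 * sd_value
thr <- abs(df_demo$value[1] + 2 * df_demo$sd_value[1])
# ifelse returns "good" where the condition is TRUE and "bad" otherwise
df_demo$label[i1] <- ifelse(df_demo$value[i1] < thr, "good", "bad")
df_demo$label
```

Both forms give the same labels; ifelse spells out the two outcomes, while the numeric indexing relies on TRUE/FALSE being coerced to 1/0.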
It can be wrapped in a function
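f1 <- function(data, lblCol, valCol, sdCol){
  # rows whose label is missing
  i1 <- is.na(data[[lblCol]])
  # position of the reference row labelled "good" (assumes exactly one such row)
  gd <- which(data[[lblCol]] == "good")
  # TRUE where the value falls below the 'good' threshold
  i2 <- data[[valCol]][i1] < abs(data[[valCol]][gd] + 2 * data[[sdCol]][gd])
  data[[lblCol]][i1] <- c("bad", "good")[(i2 + 1)]
  data
}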
f1(df, "label", "value", "sd_value")
#  value sd_value label
#1   0.5      0.1  good
#2   1.0      0.5   bad
#3   0.6      0.2  good
#4   1.2      0.8   bad
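With the updated dataset, we extract the rows where 'label' is non-NA, compute the threshold 'value + 2 * sd_value' for each of them,
arrange the result in ascending order of that threshold, and use the thresholds as breaks in cut
to bin the 'value' column and get the correct 'label'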
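library(dplyr)
# lookup table: one threshold (v1) per known label, sorted ascending
df1 <- df %>%
  filter(!is.na(label)) %>%
  transmute(label, v1 = value + 2 * sd_value) %>%
  arrange(v1)
# bin every 'value' by those thresholds and take the matching label
df %>%
  mutate(label = cut(value, breaks = c(-Inf, df1$v1), labels = df1$label))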
#   value sd_value     label
#1    0.5      0.1      good
#2    1.0      0.1       bad
#3    8.0      1.0 beautiful
#4    1.2      0.2       bad
#5    2.4      0.2     dirty
#6    0.4      0.1      good
#7    6.0      0.4      ugly
#8    2.0      0.2     dirty
#9    5.7      0.1      ugly
#10   9.0      0.1 beautiful
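To see how cut turns sorted thresholds into labels, here is a small standalone sketch. The thresholds in 'thr' and the values in 'vals' are made up for illustration and are not taken from the dataset above.

```r
# Hypothetical thresholds, already sorted in ascending order
thr <- c(good = 0.7, bad = 1.6, dirty = 2.8)
# Hypothetical values to classify
vals <- c(0.4, 1.0, 2.4, 3.0)

# Intervals are (-Inf, 0.7], (0.7, 1.6], (1.6, 2.8]; each one gets one label
out <- cut(vals, breaks = c(-Inf, thr), labels = names(thr))
as.character(out)
```

Each value receives the label of the first threshold it does not exceed. A value above the largest threshold (3.0 here) falls outside every interval and comes back as NA, so it may be worth appending Inf to the breaks, together with an extra label, if such values can occur.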
Or the same logic in base R
df1 <- transform(na.omit(df), v1 = value + 2 * sd_value)[3:4]
df1 <- df1[order(df1$v1), ]  # sort by threshold, as arrange(v1) does above
df$label <- cut(df$value, breaks = c(-Inf, df1$v1), labels = df1$label)