conditional labeling in rows

太阳男子

I would like to label rows based on the condition in other rows.

Basically, if a row is NA, I want to find a row that is non-NA and use its

1 Answer

萌比男神i

2021-01-21 17:50

We can subset the 'value's of the rows where 'label' is NA and compare them against a threshold built from the 'value' and 'sd_value' of the row labelled 'good'. The resulting logical vector ('i2') is then turned into 'good'/'bad', either by numeric indexing or with ifelse, and assigned back to the column at the NA positions ('i1').

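# rows whose label is missing
i1 <- is.na(df$label)
# TRUE where the value is below the threshold taken from the 'good' row (row 1)
i2 <- df$value[i1] < abs(df$value[1] + 2 * df$sd_value[1])
# FALSE/TRUE + 1 gives index 1/2, i.e. "bad"/"good"
df$label[i1] <- c("bad", "good")[(i2 + 1)]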
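The ifelse variant mentioned above could look like the sketch below. The input data frame is an assumption, reconstructed from the output printed later in this answer (only the first row is taken to be labelled), and `threshold` is a name introduced here for readability.

```r
# Assumed input: values as in the printed output, with only row 1 labelled
df <- data.frame(value    = c(0.5, 1.0, 0.6, 1.2),
                 sd_value = c(0.1, 0.5, 0.2, 0.8),
                 label    = c("good", NA, NA, NA),
                 stringsAsFactors = FALSE)

# Rows that still need a label
i1 <- is.na(df$label)

# Threshold built from the 'good' row (row 1 in this assumed data)
threshold <- abs(df$value[1] + 2 * df$sd_value[1])

# Label the NA rows directly instead of indexing into c("bad", "good")
df$label[i1] <- ifelse(df$value[i1] < threshold, "good", "bad")
df
```

Both forms give the same labels; ifelse reads more directly, while the indexing form avoids a function call.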

It can be wrapped in a function

f1 <- function(data, lblCol, valCol, sdCol){
     # use the 'data' argument rather than a global 'df'
     i1 <- is.na(data[[lblCol]])
     gd <- which(data[[lblCol]] == "good")
     i2 <- data[[valCol]][i1] < abs(data[[valCol]][gd] + 2 * data[[sdCol]][gd])
     data[[lblCol]][i1] <- c("bad", "good")[(i2 + 1)]
     data
  }

f1(df, "label", "value", "sd_value")
#  value sd_value label
#1   0.5      0.1  good
#2   1.0      0.5   bad
#3   0.6      0.2  good
#4   1.2      0.8   bad

Update

With the updated dataset, we extract the rows where 'label' is non-NA, compute the upper bound value + 2 * sd_value for each of them ('v1'), arrange the bounds in ascending order, and pass them as breaks to cut so that each 'value' gets the matching 'label'

library(dplyr) 
df1 <- df %>% 
      filter(!is.na(label)) %>% 
      transmute(label, v1 = value + 2 * sd_value) %>%
      arrange(v1)
df %>% 
    mutate(label = cut(value, breaks = c(-Inf, df1$v1), labels = df1$label)) 
#   value sd_value     label
#1    0.5      0.1      good
#2    1.0      0.1       bad
#3    8.0      1.0 beautiful
#4    1.2      0.2       bad
#5    2.4      0.2     dirty
#6    0.4      0.1      good
#7    6.0      0.4      ugly
#8    2.0      0.2     dirty
#9    5.7      0.1      ugly
#10   9.0      0.1 beautiful
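To see how cut pairs breaks with labels, here is a minimal sketch. The bounds and labels are hypothetical values chosen for illustration, not taken from the question's data.

```r
# Hypothetical upper bounds and their labels, already in ascending order
upper <- c(0.7, 1.2, 2.8)
labs  <- c("good", "bad", "dirty")

# Intervals are (-Inf, 0.7], (0.7, 1.2], (1.2, 2.8];
# anything above the largest bound falls outside every interval and becomes NA
res <- cut(c(0.4, 1.0, 2.0, 3.5), breaks = c(-Inf, upper), labels = labs)
as.character(res)
# "good" "bad" "dirty" NA
```

cut sorts the breaks internally but leaves the labels in the order given, so the bounds and labels should be sorted together before the call (which is what arrange(v1) does above).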

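Or the same logic in base R (ordering by 'v1' so the labels line up with the sorted breaks)

df1 <- transform(na.omit(df), v1 = value + 2 * sd_value)[3:4]
# sort by the upper bound, as arrange(v1) does in the dplyr version
df1 <- df1[order(df1$v1), ]
df$label <- cut(df$value,  breaks = c(-Inf, df1$v1), labels = df1$label)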