conditional labeling in rows

太阳男子 2021-01-21 17:25

I would like to label rows based on the condition in other rows.

Basically, what I'm looking for is this: if a row is NA, then look for a row that is non-NA and use its

1 Answer
  • 2021-01-21 17:50

    We can subset the 'value's of the rows whose 'label' is NA and compare them against a threshold built from the 'value' and 'sd_value' of the row labelled 'good'. The resulting logical vector ('i2') is then converted to 'good'/'bad', either with numeric indexing or with ifelse, and the output is assigned back to the column at the positions given by the index ('i1').

    i1 <- is.na(df$label)
    i2 <- df$value[i1] < abs(df$value[1] + 2 * df$sd_value[1])
    df$label[i1] <- c("bad", "good")[(i2 + 1)]
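
    The answer mentions ifelse as an alternative to numeric indexing but only shows the indexing form. Below is a minimal sketch of the ifelse variant. The input data frame is not shown in this post, so the `df` here is a hypothetical reconstruction chosen to be consistent with the output printed further down, and `thr` is just an illustrative name for the threshold.

    ```r
    # Hypothetical input (assumption): only the first row carries a label
    df <- data.frame(
      value    = c(0.5, 1.0, 0.6, 1.2),
      sd_value = c(0.1, 0.5, 0.2, 0.8),
      label    = c("good", NA, NA, NA),
      stringsAsFactors = FALSE
    )

    i1  <- is.na(df$label)                          # rows that still need a label
    thr <- abs(df$value[1] + 2 * df$sd_value[1])    # threshold from the 'good' row
    # ifelse maps TRUE/FALSE to the two labels directly
    df$label[i1] <- ifelse(df$value[i1] < thr, "good", "bad")
    df$label
    # [1] "good" "bad"  "good" "bad"
    ```

    Both forms give the same result; ifelse is arguably easier to read, while `c("bad", "good")[i2 + 1]` relies on FALSE/TRUE being coerced to 0/1.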
    

    It can be wrapped in a function

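    f1 <- function(data, lblCol, valCol, sdCol){
         # use the 'data' argument throughout rather than a global 'df'
         i1 <- is.na(data[[lblCol]])                  # rows without a label
         gd <- which(data[[lblCol]] == "good")        # position of the 'good' row
         i2 <- data[[valCol]][i1] < abs(data[[valCol]][gd] + 2 * data[[sdCol]][gd])
         data[[lblCol]][i1] <- c("bad", "good")[(i2 + 1)]
         data
      }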
    
    f1(df, "label", "value", "sd_value")
    #  value sd_value label
    #1   0.5      0.1  good
    #2   1.0      0.5   bad
    #3   0.6      0.2  good
    #4   1.2      0.8   bad
    

    Update

    With the updated dataset, we extract the rows where 'label' is non-NA, compute an upper bound for each one (value + 2 * sd_value), and arrange them in ascending order. Those bounds are then used as the breaks in cut, which bins 'value' and assigns the matching 'label'.

    library(dplyr) 
    df1 <- df %>% 
          filter(!is.na(label)) %>% 
          transmute(label, v1 = value + 2 * sd_value) %>%
          arrange(v1)
    df %>% 
        mutate(label = cut(value, breaks = c(-Inf, df1$v1), labels = df1$label)) 
    #   value sd_value     label
    #1    0.5      0.1      good
    #2    1.0      0.1       bad
    #3    8.0      1.0 beautiful
    #4    1.2      0.2       bad
    #5    2.4      0.2     dirty
    #6    0.4      0.1      good
    #7    6.0      0.4      ugly
    #8    2.0      0.2     dirty
    #9    5.7      0.1      ugly
    #10   9.0      0.1 beautiful
    

    Or the same logic in base R

    df1 <- transform(na.omit(df), v1 = value + 2 * sd_value)[c("label", "v1")]
    df1 <- df1[order(df1$v1), ]  # sort ascending so the labels line up with the breaks
    df$label <- cut(df$value,  breaks = c(-Inf, df1$v1), labels = df1$label)
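
    To see how the breaks and labels line up in cut, here is a small self-contained sketch in base R. The `ref` and `new_values` objects are hypothetical and are not the asker's data. It also shows one edge case worth knowing: a value above the largest bound falls outside every interval and comes back as NA.

    ```r
    # Hypothetical reference rows (assumption): one labelled row per class
    ref <- data.frame(
      label    = c("good", "bad", "dirty"),
      value    = c(0.5, 1.0, 2.4),
      sd_value = c(0.1, 0.1, 0.2),
      stringsAsFactors = FALSE
    )
    ref$v1 <- ref$value + 2 * ref$sd_value   # upper bound of each class
    ref <- ref[order(ref$v1), ]              # breaks must be ascending

    new_values <- c(0.4, 1.1, 2.0, 9.0)
    # intervals are (-Inf, 0.7], (0.7, 1.2], (1.2, 2.8]
    out <- as.character(cut(new_values, breaks = c(-Inf, ref$v1), labels = ref$label))
    out
    # [1] "good"  "bad"   "dirty" NA
    ```

    If values beyond the largest bound should still receive the last label, one option is to replace the final break with Inf, though whether that is appropriate depends on the data.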
    