Closest value to a specific column in R

自闭症患者 2021-02-13 06:26

For each row, I would like to find the value among the other columns (x1 and x2) that is closest to column x3.

data=data.frame(x1=c(24,12,76),x2=c(15,30,20),x3=c(45,27,15))
data
  x1 x2 x3
1 24 15 45
2 12 30 27
3 76 20 15

4 answers
  •  庸人自扰
    2021-02-13 06:35

    A tidyverse solution:

    library(tidyverse)

    data %>%
      rowid_to_column() %>%
      gather(var, val, -c(x3, rowid)) %>%
      mutate(temp = x3 - val) %>%
      group_by(rowid) %>%
      filter(abs(temp) == min(abs(temp))) %>%
      ungroup() %>%
      arrange(rowid) %>%   # gather() stacks by variable, so restore the row order
      select(val)
    
        val
      <dbl>
    1    24
    2    30
    3    20
    

    First, it adds a row ID. Second, it reshapes the data from wide to long format. Third, it computes the difference between "x3" and each of the other variables. Finally, it groups by the row ID and keeps, for each row, the value with the smallest absolute difference (if two values are equally close, both are kept).
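The same four steps can be sketched in base R without the tidyverse. This is a minimal sketch, not part of the original answer: the helper name closest_long() is hypothetical, and, like the filter() step above, it keeps every tied value when two columns are equally close to x3.

```r
# Hypothetical helper (not from the original answer): a base R version of the
# add-row-ID / wide-to-long / difference / keep-minimum pipeline.
closest_long <- function(df, target = "x3") {
  cand <- setdiff(names(df), target)  # every column except the target
  n <- nrow(df)

  # Steps 1 and 2: add a row ID and stack the candidate columns (long format)
  long <- data.frame(
    rowid = rep(seq_len(n), times = length(cand)),
    var   = rep(cand, each = n),
    val   = unlist(df[cand], use.names = FALSE)
  )

  # Step 3: difference between the target column and each candidate value
  long$temp <- df[[target]][long$rowid] - long$val

  # Step 4: within each row ID, keep the rows with the smallest |difference|
  min_abs <- ave(abs(long$temp), long$rowid, FUN = min)
  res <- long[abs(long$temp) == min_abs, ]
  res[order(res$rowid), c("rowid", "var", "val")]
}

data <- data.frame(x1 = c(24, 12, 76), x2 = c(15, 30, 20), x3 = c(45, 27, 15))
closest_long(data)$val  # [1] 24 30 20
```

Keeping the "var" column alongside "val" also shows which column won in each row.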

    Or, to get a plain vector instead of a tibble, replace select() with pull():

    data %>%
      rowid_to_column() %>%
      gather(var, val, -c(x3, rowid)) %>%
      mutate(temp = x3 - val) %>%
      group_by(rowid) %>%
      filter(abs(temp) == min(abs(temp))) %>%
      ungroup() %>%
      arrange(rowid) %>%   # gather() stacks by variable, so restore the row order
      pull(val)
    
    [1] 24 30 20
    

    Or, using an approach originally proposed by @markus (it assumes that the candidate columns are named "x" followed by their position, i.e. x1, x2, ...):

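    data %>%
     # index of the candidate column (all but column 3) closest to x3, prefixed with "x"
     mutate(temp = paste0("x", max.col(-abs(.[, -3] - .[, 3])))) %>%
     rowwise() %>%
     # evaluate the name stored in temp as a column of the current row
     summarise(val = eval(as.symbol(temp)))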
    
        val
      <dbl>
    1   24.
    2   30.
    3   20.
    

    First, for each row, it finds the column index of the variable whose absolute difference from "x3" is smallest (max.col() breaks ties at random by default) and pastes "x" in front of it. Then, it evaluates the resulting name (e.g. "x1") as a variable and returns the corresponding value.
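To see what the max.col() trick does without dplyr, here is a minimal base R sketch of the same two steps. It is an illustration, not part of the original answer. It keeps the answer's assumption that the candidate columns are named "x" plus their position, passes ties.method = "first" so ties are resolved deterministically, and swaps eval(as.symbol(temp)) inside rowwise() for a per-row mapply() lookup.

```r
data <- data.frame(x1 = c(24, 12, 76), x2 = c(15, 30, 20), x3 = c(45, 27, 15))

# Distance of every candidate column (all but column 3) from x3,
# negated so that max.col() picks the closest column.
neg_dist <- -abs(as.matrix(data[, -3]) - data[, 3])

# Column index of the closest candidate in each row; "first" avoids the
# default random tie-breaking.
idx <- max.col(neg_dist, ties.method = "first")

# Same naming assumption as the answer: candidates are x1, x2, ...
temp <- paste0("x", idx)

# Per-row lookup of the named column, the base R analogue of
# eval(as.symbol(temp)) under rowwise()
val <- mapply(function(row, col) data[[col]][row], seq_len(nrow(data)), temp)
val  # [1] 24 30 20
```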

    Also borrowing the idea from @markus, but without assuming that your columns are named "x":

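    data %>%
     # index of the closest candidate column (all columns except column 3, x3)
     mutate(temp = max.col(-abs(.[, -3] - .[, 3]))) %>%
     rowwise %>%
     # map the index back to a name among the candidate columns only
     mutate(temp = names(.)[-3][[temp]]) %>%
     summarise(val = eval(as.symbol(temp)))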
    

    First, it finds the column index of the variable whose absolute difference from "x3" is smallest. Second, it looks up the column name that corresponds to that index. Finally, it evaluates the name as a variable and returns the corresponding value.
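A base R sketch of the name lookup, not part of the original answer. The column names "first" and "second" are hypothetical, chosen only to show that the lookup no longer depends on an "x" prefix; x3 is still assumed to be column 3. A two-column (row, column) index matrix replaces the per-row eval().

```r
data <- data.frame(x1 = c(24, 12, 76), x2 = c(15, 30, 20), x3 = c(45, 27, 15))
# Hypothetical renaming: the candidate names no longer follow the "x" pattern
names(data) <- c("first", "second", "x3")

neg_dist <- -abs(as.matrix(data[, -3]) - data[, 3])
idx <- max.col(neg_dist, ties.method = "first")

# Look the name up among the candidate columns (column 3 excluded), so the
# index from max.col() lines up with the names it refers to
temp <- names(data)[-3][idx]

# Pull one value per row with a (row, column) index matrix
val <- as.matrix(data)[cbind(seq_len(nrow(data)), match(temp, names(data)))]
val  # [1] 24 30 20
```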

    Or a variant where you reference the "x3" variable by its name rather than by its column index (the basic idea is still from @markus):

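    data %>%
     # compare every other column with x3, matching x3 by its exact name
     mutate(temp = max.col(-abs(.[, !grepl("^x3$", colnames(.))] - .[, grepl("^x3$", colnames(.))]))) %>% 
     rowwise %>%
     # look the name up among the candidate columns, so the position of x3 does not matter
     mutate(temp = setdiff(names(.), c("x3", "temp"))[[temp]]) %>%
     summarise(val = eval(as.symbol(temp)))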
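A base R sketch of the by-name idea, not part of the original answer. The column order below is a hypothetical one with x3 in the middle, to show that nothing depends on where x3 sits. The point it illustrates: the index returned by max.col() refers to the candidate columns, so the name has to be looked up among those candidates rather than among all columns.

```r
# Hypothetical column order with the target in the middle
data <- data.frame(x1 = c(24, 12, 76), x3 = c(45, 27, 15), x2 = c(15, 30, 20))
target <- "x3"

cand <- setdiff(names(data), target)  # candidate columns, selected by name
neg_dist <- -abs(as.matrix(data[cand]) - data[[target]])
idx <- max.col(neg_dist, ties.method = "first")

# idx indexes into cand, so translate it to a name through cand
temp <- cand[idx]
val <- as.matrix(data[cand])[cbind(seq_len(nrow(data)), idx)]
val  # [1] 24 30 20
```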
    
