How to subset rows in specific columns based on minimum values in individual columns in a data frame using R

感情迁移 submitted on 2021-01-28 20:10:40

Question


We have a data frame with thousands of rows and multiple columns. A sample data frame is presented below.

df1 <- data.frame(X = c(7.48, 7.82, 8.15, 8.47, 8.80, 9.20, 9.51, 9.83, 10.13, 10.59, 7.59, 8.06, 8.39, 8.87, 9.26, 9.64, 10.09, 10.48, 10.88, 11.45), 
              Y = c(49.16, 48.78, 48.40, 48.03, 47.65, 47.24, 46.87, 46.51, 46.15, 45.73, 48.70, 48.18, 47.72, 47.20, 46.71, 46.23, 45.72, 45.24, 44.77, 44.23), 
              ID = c("B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2"), 
              TI = c(191.31, 191.35, 191.39, 191.44, 191.48, 191.52, 191.56, 191.60, 191.64, 191.69, 1349.93, 1349.97, 1350.01, 1350.05, 1350.09, 1350.14, 1350.18, 1350.22, 1350.26, 1350.30),
              X0 = c(0.172, 0.344,0.846,1.335,1.838,2.410,2.89,3.37,3.842,4.46,0.361,0.983,1.545,2.241,2.86,3.47,4.15,4.77,5.388,6.164),
              V2 = c(1.154,0.644,0.141,0.348,0.851,1.423,1.9059,2.3875,2.856,3.475,0.771,0.224,0.596,1.262,1.883,2.493,3.168,3.786,4.402,5.177))

In the data frame 'df1', we would like to subset rows ID-wise, keeping columns 1:4, based on the minimum values in the 5th and 6th columns respectively.

For instance, in 'df1', for ID "B_1" the minimum value in the 'X0' column is 0.172 and the minimum value in the 'V2' column is 0.141. So we would like to extract the 1st and 3rd rows of columns 1:4, along with their corresponding X0 and V2 values, as shown in the figure below, and likewise for ID "B_1_2". Besides 'X0' and 'V2', the actual dataset has more than 20 such variables.
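The per-ID minima and row positions described above can be checked with a short base R sketch. It copies only the ID, X0 and V2 columns of the sample data. The names `chk`, `mins` and `min_rows` are illustrative and not part of the original question.

```r
# Only the columns needed for the check, copied from the sample data.
chk <- data.frame(
  ID = rep(c("B_1", "B_1_2"), each = 10),
  X0 = c(0.172, 0.344, 0.846, 1.335, 1.838, 2.410, 2.89, 3.37, 3.842, 4.46,
         0.361, 0.983, 1.545, 2.241, 2.86, 3.47, 4.15, 4.77, 5.388, 6.164),
  V2 = c(1.154, 0.644, 0.141, 0.348, 0.851, 1.423, 1.9059, 2.3875, 2.856, 3.475,
         0.771, 0.224, 0.596, 1.262, 1.883, 2.493, 3.168, 3.786, 4.402, 5.177)
)

# Minimum of each value column within each ID.
mins <- aggregate(cbind(X0, V2) ~ ID, data = chk, FUN = min)
print(mins)

# Row number (within the full data frame) of each minimum, per ID.
# which.min() returns only the first position if there are ties.
min_rows <- sapply(c("X0", "V2"), function(col) {
  tapply(seq_len(nrow(chk)), chk$ID, function(i) i[which.min(chk[[col]][i])])
})
print(min_rows)
```

For ID "B_1" this points at rows 1 (X0) and 3 (V2), matching the rows named in the description.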

The expected output is shown in the figure below.

To get the desired output, I tried the code below.

library(data.table)
df1=as.data.table(df1)
a <- do.call(rbind,
    apply(df1,1,function(i){
     df1[df1[,.I[(X0)==min(X0)],by=ID]$V1]
    })
)

There are issues with the above code. I am looking for code that produces the desired output.


Answer 1:


We can use map to loop over the columns 'X0' and 'V2'. For each column, group by 'ID' and slice the rows where that column is at its minimum, bind the results together (the _dfr suffix), then create the 'd' column as the pmin of those columns.

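library(dplyr)
library(purrr)
# names of the columns whose per-ID minima we want ('X0', 'V2')
nm1 <- names(df1)[5:6]
# for each column, keep the row(s) holding its minimum within each ID,
# then row-bind the per-column results
map_dfr(nm1, ~ df1 %>%
         group_by(ID) %>%
         slice_min(!! rlang::sym(.x))) %>% 
     ungroup() %>%
     # d = row-wise minimum across the columns in nm1
     mutate(d = select(., all_of(nm1)) %>% reduce(pmin))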

-output

# A tibble: 4 x 7
#      X     Y ID       TI    X0    V2     d
#  <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#1  7.48  49.2 B_1    191. 0.172 1.15  0.172
#2  7.59  48.7 B_1_2 1350. 0.361 0.771 0.361
#3  8.15  48.4 B_1    191. 0.846 0.141 0.141
#4  8.06  48.2 B_1_2 1350. 0.983 0.224 0.224
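
Since the attempt in the question used data.table, the same idea can also be sketched there. This is a minimal sketch rather than a definitive implementation: it assumes the sample data from the question, and the names `dt`, `value_cols` and `res` are illustrative. The row-wise `apply()` from the attempt is not needed. Looping over the column names is enough, and with 20+ variables `value_cols` could be built with something like `setdiff(names(dt), c("X", "Y", "ID", "TI"))`.

```r
library(data.table)

# Sample data from the question.
dt <- data.table(
  X  = c(7.48, 7.82, 8.15, 8.47, 8.80, 9.20, 9.51, 9.83, 10.13, 10.59,
         7.59, 8.06, 8.39, 8.87, 9.26, 9.64, 10.09, 10.48, 10.88, 11.45),
  Y  = c(49.16, 48.78, 48.40, 48.03, 47.65, 47.24, 46.87, 46.51, 46.15, 45.73,
         48.70, 48.18, 47.72, 47.20, 46.71, 46.23, 45.72, 45.24, 44.77, 44.23),
  ID = rep(c("B_1", "B_1_2"), each = 10),
  TI = c(191.31, 191.35, 191.39, 191.44, 191.48, 191.52, 191.56, 191.60, 191.64, 191.69,
         1349.93, 1349.97, 1350.01, 1350.05, 1350.09, 1350.14, 1350.18, 1350.22, 1350.26, 1350.30),
  X0 = c(0.172, 0.344, 0.846, 1.335, 1.838, 2.410, 2.89, 3.37, 3.842, 4.46,
         0.361, 0.983, 1.545, 2.241, 2.86, 3.47, 4.15, 4.77, 5.388, 6.164),
  V2 = c(1.154, 0.644, 0.141, 0.348, 0.851, 1.423, 1.9059, 2.3875, 2.856, 3.475,
         0.771, 0.224, 0.596, 1.262, 1.883, 2.493, 3.168, 3.786, 4.402, 5.177)
)

# Columns whose per-ID minima select the rows.
value_cols <- c("X0", "V2")

res <- rbindlist(lapply(value_cols, function(col) {
  # .I holds row numbers in dt; keep those where `col` equals its per-ID minimum.
  idx <- dt[, .I[get(col) == min(get(col))], by = ID]$V1
  dt[idx]
}))

# d = row-wise minimum across the value columns, as in the dplyr answer.
res[, d := do.call(pmin, .SD), .SDcols = value_cols]
print(res)
```

Like `slice_min()` with its default settings, the `==` comparison keeps every tied row when a minimum occurs more than once within an ID.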
 


Source: https://stackoverflow.com/questions/65345208/how-to-subset-rows-in-specific-columns-based-on-minimum-values-in-individual-col
