Question
We have a data frame with thousands of rows and multiple columns. A sample data frame is shown below:
df1 <- data.frame(X = c(7.48, 7.82, 8.15, 8.47, 8.80, 9.20, 9.51, 9.83, 10.13, 10.59, 7.59, 8.06, 8.39, 8.87, 9.26, 9.64, 10.09, 10.48, 10.88, 11.45),
Y = c(49.16, 48.78, 48.40, 48.03, 47.65, 47.24, 46.87, 46.51, 46.15, 45.73, 48.70, 48.18, 47.72, 47.20, 46.71, 46.23, 45.72, 45.24, 44.77, 44.23),
ID = c("B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2", "B_1_2"),
TI = c(191.31, 191.35, 191.39, 191.44, 191.48, 191.52, 191.56, 191.60, 191.64, 191.69, 1349.93, 1349.97, 1350.01, 1350.05, 1350.09, 1350.14, 1350.18, 1350.22, 1350.26, 1350.30),
X0 = c(0.172, 0.344,0.846,1.335,1.838,2.410,2.89,3.37,3.842,4.46,0.361,0.983,1.545,2.241,2.86,3.47,4.15,4.77,5.388,6.164),
V2 = c(1.154,0.644,0.141,0.348,0.851,1.423,1.9059,2.3875,2.856,3.475,0.771,0.224,0.596,1.262,1.883,2.493,3.168,3.786,4.402,5.177))
In the data frame 'df1', we would like to subset rows per ID, keeping columns 1:4, based on the minimum values in the 5th and 6th columns respectively.
For instance, for ID "B_1", 0.172 is the minimum value in the 'X0' column and 0.141 is the minimum value in the 'V2' column. So we would like to select the 1st and 3rd rows (columns 1:4) together with their corresponding 'X0' and 'V2' values, and likewise for ID "B_1_2". Besides 'X0' and 'V2', my real dataset has more than 20 such variables.
The expected output is shown in the figure below.
To get the desired output, I tried the code below:
library(data.table)
df1=as.data.table(df1)
a <- do.call(rbind,
apply(df1,1,function(i){
df1[df1[,.I[(X0)==min(X0)],by=ID]$V1]
})
)
The code above does not work as intended. I am looking for code that produces the desired output.
Answer 1:
We can use map to loop over the columns 'X0' and 'V2': for each looped column, group by 'ID' and slice the rows where that column is at its minimum (slice_min), bind the results together by rows (the _dfr suffix), and then create the 'd' column as the pmin of those columns.
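library(dplyr)
library(purrr)
# names of the columns to minimise (here the 5th and 6th: 'X0', 'V2')
nm1 <- names(df1)[5:6]
# for each column, keep the per-ID row(s) holding its minimum, then row-bind
map_dfr(nm1, ~ df1 %>%
    group_by(ID) %>%
    slice_min(!! rlang::sym(.x))) %>%
  ungroup() %>%
  # d = row-wise minimum across the looped columns
  mutate(d = select(., all_of(nm1)) %>% reduce(pmin))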
Output:
# A tibble: 4 x 7
# X Y ID TI X0 V2 d
# <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#1 7.48 49.2 B_1 191. 0.172 1.15 0.172
#2 7.59 48.7 B_1_2 1350. 0.361 0.771 0.361
#3 8.15 48.4 B_1 191. 0.846 0.141 0.141
#4 8.06 48.2 B_1_2 1350. 0.983 0.224 0.224
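Since the question started from data.table, the same idea can also be sketched there. This is not part of the original answer; it is a minimal sketch that assumes the sample data from the question, and the names dt, min_cols, parts, res and the min_of column are made up for illustration. It uses the usual .I idiom to get the row numbers of the per-ID minimum for each column, keeping ties just as slice_min does by default.

```r
library(data.table)

# Sample data from the question (ID written with rep() for brevity)
df1 <- data.frame(
  X  = c(7.48, 7.82, 8.15, 8.47, 8.80, 9.20, 9.51, 9.83, 10.13, 10.59,
         7.59, 8.06, 8.39, 8.87, 9.26, 9.64, 10.09, 10.48, 10.88, 11.45),
  Y  = c(49.16, 48.78, 48.40, 48.03, 47.65, 47.24, 46.87, 46.51, 46.15, 45.73,
         48.70, 48.18, 47.72, 47.20, 46.71, 46.23, 45.72, 45.24, 44.77, 44.23),
  ID = rep(c("B_1", "B_1_2"), each = 10),
  TI = c(191.31, 191.35, 191.39, 191.44, 191.48, 191.52, 191.56, 191.60, 191.64, 191.69,
         1349.93, 1349.97, 1350.01, 1350.05, 1350.09, 1350.14, 1350.18, 1350.22, 1350.26, 1350.30),
  X0 = c(0.172, 0.344, 0.846, 1.335, 1.838, 2.410, 2.89, 3.37, 3.842, 4.46,
         0.361, 0.983, 1.545, 2.241, 2.86, 3.47, 4.15, 4.77, 5.388, 6.164),
  V2 = c(1.154, 0.644, 0.141, 0.348, 0.851, 1.423, 1.9059, 2.3875, 2.856, 3.475,
         0.771, 0.224, 0.596, 1.262, 1.883, 2.493, 3.168, 3.786, 4.402, 5.177)
)
dt <- as.data.table(df1)

# Columns whose per-ID minimum we want (hypothetical name; extend to all 20+ columns)
min_cols <- c("X0", "V2")

# For each column, .I gives global row numbers; keep those where the column
# equals its per-ID minimum (ties are kept)
parts <- lapply(min_cols, function(col) {
  idx <- dt[, .I[.SD[[1L]] == min(.SD[[1L]])], by = ID, .SDcols = col]$V1
  dt[idx]
})
names(parts) <- min_cols

# Stack the pieces; min_of records which column selected each row
res <- rbindlist(parts, idcol = "min_of")

# d = row-wise minimum across the looped columns, as in the answer above
res[, d := do.call(pmin, .SD), .SDcols = min_cols]
print(res)
```

With the sample data this selects rows 1 and 11 (minimum of 'X0') followed by rows 3 and 12 (minimum of 'V2'). The apply() over rows in the question's attempt is not needed, because the grouping is already handled by by = ID.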
Source: https://stackoverflow.com/questions/65345208/how-to-subset-rows-in-specific-columns-based-on-minimum-values-in-individual-col