Question
I had a similar question here but this one is slightly different.
I would like to return the values in one column that correspond to cut scores matched in another column. If a cut score is not present in the variable, I would like to take the closest larger value for the first and second cuts, and the closest smaller value for the third cut. Here is a snapshot of the dataset:
ids <- c(1,2,3,4,5,6,7,8,9,10)
scores.a <- c(512,531,541,555,562,565,570,572,573,588)
scores.b <- c(12,13,14,15,16,17,18,19,20,21)
data <- data.frame(ids, scores.a, scores.b)
> data
   ids scores.a scores.b
1    1      512       12
2    2      531       13
3    3      541       14
4    4      555       15
5    5      562       16
6    6      565       17
7    7      570       18
8    8      572       19
9    9      573       20
10  10      588       21
cuts <- c(531, 560, 571)
I would like to grab the scores.b value corresponding to the first cut score, which is 13. Then I would like to grab the scores.b value corresponding to the second cut score (560), but it is not in scores.a, so I would like to take the scores.a value 562 (the closest larger value to 560), whose corresponding value is 16. Lastly, for the third cut score (571), I would like to get 18, which is the value corresponding to 570, the closest smaller value to the third cut score.
Here is what I would like to get.
      scores.b
cut.1       13
cut.2       16
cut.3       18
Any thoughts? Thanks
Answer 1:
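library(dplyr)  # provides %>%, mutate, group_by, summarise, slice

data %>%
  # bin scores.a at the cut scores
  mutate(cts = Hmisc::cut2(scores.a, cuts = cuts)) %>%
  group_by(cts) %>%
  # smallest and largest scores.b within each bin
  summarise(mn = min(scores.b),
            mx = max(scores.b)) %>%
  # keep the two middle bins, flatten, then pick mn of both bins and mx of the second
  slice(-c(1, 4)) %>% unlist() %>% .[c(3, 4, 6)] %>%
  data.frame() %>%
  magrittr::set_colnames("scores.b") %>%
  magrittr::set_rownames(c("cut.1", "cut.2", "cut.3"))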
      scores.b
cut.1       13
cut.2       16
cut.3       18
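The positional indexing in that pipeline can be hard to follow, so here is a minimal base R sketch of the same binning idea. It is not the answerer's code: it swaps Hmisc::cut2 for base cut with -Inf and Inf as outer breaks, and the names bins, mn, mx and picked are only for illustration. The minimum of the bin that starts at a cut gives the exact match or the closest larger value, and the maximum of the bin that ends at a cut gives the closest smaller value.

```r
scores.a <- c(512, 531, 541, 555, 562, 565, 570, 572, 573, 588)
scores.b <- c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
cuts <- c(531, 560, 571)

# Left-closed bins: [-Inf,531), [531,560), [560,571), [571,Inf)
bins <- cut(scores.a, breaks = c(-Inf, cuts, Inf), right = FALSE)

# Smallest and largest scores.b inside each bin
mn <- tapply(scores.b, bins, min)
mx <- tapply(scores.b, bins, max)

# Cuts 1 and 2: minimum of the bin starting at the cut.
# Cut 3: maximum of the bin ending just below the cut.
picked <- unname(c(mn[2], mn[3], mx[3]))

result <- data.frame(scores.b = picked,
                     row.names = c("cut.1", "cut.2", "cut.3"))
print(result)
```

Because the bins are left-closed, a scores.a value exactly equal to the third cut would land in the next bin, so this sketch always returns a strictly smaller value for cut 3.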
Answer 2:
Using tidyverse:
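library(dplyr)

data %>%
  # the third break is 570 (the closest scores.a below the cut of 571), entered by hand
  mutate(cuts_new = cut(scores.a, breaks = c(531, 560, 570, 1000), right = FALSE)) %>%
  group_by(cuts_new) %>%
  summarise(first_sb = first(scores.b)) %>%
  ungroup()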
results in:
# A tibble: 4 x 2
  cuts_new    first_sb
  <fct>          <dbl>
1 [531,560)         13
2 [560,570)         16
3 [570,1e+03)       18
4 NA                12
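The breaks above are typed in by hand, and the NA row is the score of 512 that falls below the first break. As a possible alternative, here is a rough base R sketch that applies the rule from the question directly to the original cuts. The two helper functions are hypothetical names made up for this example, and the sketch assumes every cut has at least one scores.a value on the required side.

```r
ids <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scores.a <- c(512, 531, 541, 555, 562, 565, 570, 572, 573, 588)
scores.b <- c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
data <- data.frame(ids, scores.a, scores.b)
cuts <- c(531, 560, 571)

# Hypothetical helper: scores.b for the smallest scores.a that is >= cut
closest_at_or_above <- function(df, cut) {
  candidates <- df[df$scores.a >= cut, ]
  candidates$scores.b[which.min(candidates$scores.a)]
}

# Hypothetical helper: scores.b for the largest scores.a that is <= cut
closest_at_or_below <- function(df, cut) {
  candidates <- df[df$scores.a <= cut, ]
  candidates$scores.b[which.max(candidates$scores.a)]
}

# Cuts 1 and 2 look upward, cut 3 looks downward, as the question describes
result <- data.frame(
  scores.b = c(closest_at_or_above(data, cuts[1]),
               closest_at_or_above(data, cuts[2]),
               closest_at_or_below(data, cuts[3])),
  row.names = c("cut.1", "cut.2", "cut.3")
)
print(result)
```

An exact match is returned when the cut is present in scores.a, since both helpers use inclusive comparisons, and the data do not need to be sorted.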
Source: https://stackoverflow.com/questions/59882916/subset-values-with-matching-criteria-in-r