Median of selected rows dependent on other columns' values [closed]

Question

I have the following data frame (just a small part of a much larger one):

ID= c(1,1,1,2,2,2,2,3,3)
week = c(1,1,2,1,1,2,2,1,2)
X = c(3.3,4.23,5.6,12,3.1,4.3,5.9,6.1,5.3)
Y = c(1.3,2.4,6.8,5.5,4.3,3,6.6,2.6,5.7)
TS_DF = data.frame(ID,week,X,Y)

I would like to calculate the median of X and Y separately for each combination of ID and week, so that the result reads like this:

ID    week  X     Y     weekMedX    weekMedY
1     1     3.3   1.3   3.765       1.85
1     1     4.23  2.4   3.765       1.85
1     2     5.6   6.8   5.6         6.8
2     1     12    5.5   7.55        4.9
2     1     3.1   4.3   7.55        4.9
2     2     4.3   3     5.1         4.8
2     2     5.9   6.6   5.1         4.8
3     1     6.1   2.6   6.1         2.6
3     2     5.3   5.7   5.3         5.7

Based on this discussion I came up with the following code:

b = TS_DF %>%
  group_by(ID) %>%
  group_by(week) %>%
  summarise(median = median(X))

but I get the wrong results:

# A tibble: 2 x 2
week median
<dbl>  <dbl>
1     1   4.23
2     2   5.45

Any ideas would be much appreciated. M

Answer 1:

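As the commenters suggested, group by both columns in a single group_by() call and use mutate() so that every original row is kept. This should work: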

b = TS_DF %>%
  group_by(ID, week)  %>%
  mutate(median_X = median(X), median_Y = median(Y))
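For reference, here is a self-contained sketch of the same approach run against the sample data from the question. The column names weekMedX and weekMedY are taken from the desired output in the question; the trailing ungroup() is an optional addition of mine (not part of the answer) so that later operations are not silently performed per group.

```r
library(dplyr)

# Sample data from the question
TS_DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  week = c(1, 1, 2, 1, 1, 2, 2, 1, 2),
  X    = c(3.3, 4.23, 5.6, 12, 3.1, 4.3, 5.9, 6.1, 5.3),
  Y    = c(1.3, 2.4, 6.8, 5.5, 4.3, 3, 6.6, 2.6, 5.7)
)

# mutate() keeps all 9 rows and repeats each group's median on every row
result <- TS_DF %>%
  group_by(ID, week) %>%
  mutate(weekMedX = median(X), weekMedY = median(Y)) %>%
  ungroup()  # assumption: grouping is no longer needed afterwards

print(result)
```

With this data, the weekMedX and weekMedY columns should match the expected table in the question.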

Answer 2:

If you went the summarise route, you can use a join to bring all the data together.

median_df = TS_DF %>%
  group_by(ID, week) %>%
  summarise(median = median(X))

final_df <- left_join(TS_DF, median_df, by = c('ID', 'week'))

This should give you the original data frame plus the calculated median of X for each ID and week. Add median(Y) to the summarise() call to get both medians.
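A minimal sketch of the summarise-then-join route that covers both X and Y might look like the following. The column names weekMedX and weekMedY come from the question's desired output, and the .groups = "drop" argument is an assumption that dplyr 1.0.0 or later is installed; on older versions, omit it and call ungroup() instead.

```r
library(dplyr)

# Sample data from the question
TS_DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  week = c(1, 1, 2, 1, 1, 2, 2, 1, 2),
  X    = c(3.3, 4.23, 5.6, 12, 3.1, 4.3, 5.9, 6.1, 5.3),
  Y    = c(1.3, 2.4, 6.8, 5.5, 4.3, 3, 6.6, 2.6, 5.7)
)

# One row per (ID, week) pair, holding both medians
median_df <- TS_DF %>%
  group_by(ID, week) %>%
  summarise(weekMedX = median(X),
            weekMedY = median(Y),
            .groups = "drop")  # assumes dplyr >= 1.0.0

# Attach the per-group medians back onto every original row
final_df <- left_join(TS_DF, median_df, by = c("ID", "week"))

print(final_df)
```

Compared with mutate(), this takes an extra step, but it leaves you with a compact per-group table (median_df) that can be reused elsewhere.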

Answer 3:

As some commenters have already mentioned:

Use only one group_by() expression:

library(dplyr)
TS_DF %>% 
  group_by(ID, week) %>% 
  summarise(median_X = median(X),
            median_Y = median(Y))

Otherwise only the last group_by() takes effect, because by default a new group_by() call replaces the existing grouping. Compare the output of

TS_DF %>%
  group_by(ID, week)

# A tibble: 9 x 4
# Groups:   ID, week [6]

versus the output of:

TS_DF %>%
  group_by(ID) %>%
  group_by(week)

# A tibble: 9 x 4
# Groups:   week [2]
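The difference can also be checked programmatically. The sketch below uses dplyr's group_vars() and n_groups() helpers to inspect the grouping. It also shows the .add = TRUE argument as one way to chain two group_by() calls without losing the first grouping, which assumes dplyr 1.0.0 or later (earlier versions named the argument add).

```r
library(dplyr)

# Sample data from the question
TS_DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  week = c(1, 1, 2, 1, 1, 2, 2, 1, 2),
  X    = c(3.3, 4.23, 5.6, 12, 3.1, 4.3, 5.9, 6.1, 5.3),
  Y    = c(1.3, 2.4, 6.8, 5.5, 4.3, 3, 6.6, 2.6, 5.7)
)

# The second group_by() replaces the first: only "week" remains
overridden <- TS_DF %>%
  group_by(ID) %>%
  group_by(week)
print(group_vars(overridden))  # "week"

# .add = TRUE appends to the existing grouping (assumes dplyr >= 1.0.0)
added <- TS_DF %>%
  group_by(ID) %>%
  group_by(week, .add = TRUE)
print(group_vars(added))  # "ID" "week"
print(n_groups(added))    # 6
```

In practice, a single group_by(ID, week) is simpler and reads more clearly; .add = TRUE is mainly useful when the first grouping is applied somewhere else in a pipeline.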

Source: https://stackoverflow.com/questions/60059246/median-of-selected-rows-dependent-on-other-columns-values

Tags

median