Calculate differences based on categorical column with tidyverse

Submitted by 狂风中的少年 on 2021-01-28 08:25:16

Question


I have the following data frame:

library(tidyverse)

df <- data.frame(
  vars = rep(letters[1:2], 3),
  value = c(10,12,15,19,22,23),
  phase = rep(factor(c("pre","post1","post2"), levels = c("pre","post1","post2")),2)
) %>% 
  arrange(vars,phase)

I would like to calculate the following differences in value:

  • post1 - pre
  • post2 - post1
  • post2 - pre

for each value of vars (i.e., a and b).

What would be the most efficient way of achieving this using the tidyverse?

Expected outcome:

 vars             x diffs
    a   post1 - pre    12
    a post2 - post1    -7
    a   post2 - pre     5
    b   post1 - pre    -7
    b post2 - post1    11
    b   post2 - pre     4

Answer 1:


You can use spread and gather from tidyr: first spread phase into one column per level, compute the differences as new columns, and then gather the result back into long format:

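library(dplyr)
library(tidyr)

df %>%
    spread(phase, value) %>%                  # one column per phase: pre, post1, post2
    mutate(`post1 - pre`   = post1 - pre,     # compute each difference as a new column
           `post2 - post1` = post2 - post1,
           `post2 - pre`   = post2 - pre) %>%
    select(-pre, -post1, -post2) %>%          # drop the original phase columns
    gather("x", "diffs", 2:4)                 # back to long format: vars, x, diffs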
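
In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer. As a minimal sketch (not part of the original answer), the same reshape, compute, reshape steps could be written as below. The name result is an arbitrary choice, and the example data frame is rebuilt so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question (phase kept as plain character here)
df <- data.frame(
  vars  = rep(letters[1:2], 3),
  value = c(10, 12, 15, 19, 22, 23),
  phase = rep(c("pre", "post1", "post2"), 2)
)

result <- df %>%
  # one column per phase: pre, post1, post2
  pivot_wider(names_from = phase, values_from = value) %>%
  # compute each difference as a new column
  mutate(`post1 - pre`   = post1 - pre,
         `post2 - post1` = post2 - post1,
         `post2 - pre`   = post2 - pre) %>%
  # drop the original phase columns
  select(-pre, -post1, -post2) %>%
  # back to long format with the column names asked for in the question
  pivot_longer(-vars, names_to = "x", values_to = "diffs")

print(result)
```

pivot_longer works through each input row across its columns, so the rows come out grouped by vars, in the same order as the expected outcome in the question.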



Answer 2:


Here's a more automated approach that generates every pairwise combination for you, once you specify the order in which the differences should be taken:

library(tidyverse)

# example dataset
df <- data.frame(
  vars = rep(letters[1:2], 3),
  value = c(10,12,15,19,22,23),
  phase = rep(factor(c("pre","post1","post2"), levels = c("pre","post1","post2")),2)
) %>% 
  arrange(vars,phase)

# set the levels in the order in which the differences should be taken (later phase first)
df$phase <- factor(df$phase, levels = c("post2","post1","pre"))

data.frame(t(combn(as.character(sort(unique(df$phase))), 2)), stringsAsFactors = FALSE) %>%  # one row (X1, X2) per pair of phases to compare
  mutate(vars = list(unique(df$vars))) %>%          # add unique vars as a list column
  unnest(cols = vars) %>%                           # one row per (pair, vars) combination; tidyr >= 1.0.0 expects cols
  group_by(id = row_number()) %>%                   # for each row
  nest() %>%                                        # nest data
  mutate(diffs = map(data, ~df$value[df$vars==.$vars & df$phase==.$X1] - 
                            df$value[df$vars==.$vars & df$phase==.$X2]),   # value at X1 minus value at X2
         x = map(data, ~paste0(c(.$X1, .$X2), collapse = " - "))) %>%      # create your x column
  unnest(cols = c(data, diffs, x)) %>%              # unnest data
  ungroup() %>%                                     # drop the row-wise grouping so id is not carried along
  select(vars, x, diffs)                            # keep relevant columns

# # A tibble: 6 x 3
#   vars  x             diffs
#   <fct> <chr>         <dbl>
# 1 a     post2 - post1    -7
# 2 b     post2 - post1    11
# 3 a     post2 - pre       5
# 4 b     post2 - pre       4
# 5 a     post1 - pre      12
# 6 b     post1 - pre      -7
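
The same idea (build every pair of phases with combn, then subtract) can also be applied once per group. The following is only a sketch under stated assumptions, not the answer author's method: phase_diffs is a hypothetical helper introduced here for illustration, phase is assumed to be a factor whose levels are in chronological order, and each phase is assumed to occur exactly once per value of vars.

```r
library(dplyr)

# Example data from the question; levels of phase are in chronological order
df <- data.frame(
  vars  = rep(letters[1:2], 3),
  value = c(10, 12, 15, 19, 22, 23),
  phase = rep(factor(c("pre", "post1", "post2"),
                     levels = c("pre", "post1", "post2")), 2)
)

# Hypothetical helper: all "later - earlier" differences within one group.
# Assumes every level of `phase` appears exactly once in the group.
phase_diffs <- function(value, phase) {
  v <- setNames(value, as.character(phase))   # look up values by phase name
  pairs <- combn(levels(phase), 2)            # each column is c(earlier, later)
  data.frame(
    x     = paste(pairs[2, ], "-", pairs[1, ]),
    diffs = unname(v[pairs[2, ]] - v[pairs[1, ]]),
    stringsAsFactors = FALSE
  )
}

result <- df %>%
  group_by(vars) %>%
  group_modify(~ phase_diffs(.x$value, .x$phase)) %>%  # apply the helper per vars
  ungroup()

print(result)
```

Because the pairs come from the factor levels, adding another phase only means adding another level. The differences appear in the order combn generates them (post1 - pre, post2 - pre, post2 - post1), which is not the order shown in the question.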


Source: https://stackoverflow.com/questions/51890466/calculate-differences-based-on-categorical-column-with-tidyverse
