Reshape/gather function to create dataset ready for multilevel analysis

I have a big dataset, with 240 cases representing 240 patients. They all have undergone neuropsychological tests and filled in questionnaires. Additionally, their significan

2 Answers

忘了有多久

2021-01-14 22:12

If I understand what you want correctly, you can gather everything to a very long form and then reshape back to a slightly wider form:

library(tidyverse)
set.seed(47)    # for reproducibility

mydf <- data.frame(id = c(1:5),
                   p1 = c(sample(1:10, 5)),
                   p2 = c(sample(10:20, 5)),
                   p3 = c(sample(20:30, 5)),
                   pr1 = c(sample(1:10, 5)),
                   pr2 = c(sample(10:20, 5)),
                   pr3 = c(sample(20:30, 5)))

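mydf_long <- mydf %>% 
    gather(var, val, -id) %>%                        # one row per id/variable pair
    separate(var, c('couple', 'q'), sep = -1) %>%    # split off the trailing digit (tidyr < 0.8.0 needed -2 here)
    mutate(q = paste0('q', q)) %>%                   # '1' -> 'q1', etc.
    spread(q, val)                                   # one column per question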

mydf_long
#>    id couple q1 q2 q3
#> 1   1      p 10 17 21
#> 2   1     pr 10 11 24
#> 3   2      p  4 13 27
#> 4   2     pr  4 15 20
#> 5   3      p  7 14 30
#> 6   3     pr  1 14 29
#> 7   4      p  6 18 24
#> 8   4     pr  8 20 30
#> 9   5      p  9 16 23
#> 10  5     pr  3 18 25
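
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), so the same reshape can probably be written in one pivoting step. The following is only a sketch under assumptions: the small `toy` data frame and the regular expression are made up for illustration, and it presumes every column other than id is named "p" or "pr" followed by a question number.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data (not the asker's real dataset):
# p* = patient answers, pr* = partner answers
toy <- data.frame(id  = 1:2,
                  p1  = c(1L, 2L),
                  p2  = c(11L, 12L),
                  pr1 = c(3L, 4L),
                  pr2 = c(13L, 14L))

toy_long <- toy %>%
  pivot_longer(cols = -id,
               # first capture group -> 'couple'; second -> names of the output columns
               names_to = c("couple", ".value"),
               names_pattern = "^(pr?)([0-9]+)$") %>%
  # the value columns come out named '1', '2'; prefix them with 'q'
  rename_with(~ paste0("q", .x), .cols = -c(id, couple))

toy_long
```

This should give one row per id and couple member, with the questions as columns q1, q2, and so on, which is the layout multilevel modelling functions usually expect.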

萌比男神i

2021-01-14 22:13

One approach is to use unite and separate from tidyr, together with gather.

I'm using your mydf data frame since it was provided, but it should be pretty straightforward to make any changes:

mydf %>% 
  unite(p1:p3, col = `1`, sep = ";") %>% # Combine responses of 'p1' through 'p3'
  unite(pr1:pr3, col = `2`, sep = ";") %>% # Combine responses of 'pr1' through 'pr3'
  gather(couple, value, `1`:`2`) %>% # Form into long data
  separate(value, sep = ";", into = c("q1", "q2", "q3"), convert = TRUE) %>% # Separate and retrieve original answers
  arrange(id)

Which gives you:

   id couple q1 q2 q3
1   1      1  9 18 25
2   1      2 10 18 30
3   2      1  1 11 29
4   2      2  2 15 29
5   3      1 10 19 26
6   3      2  3 19 25
7   4      1  7 10 23
8   4      2  1 20 28
9   5      1  6 16 21
10  5      2  5 12 26

Our numbers are different since they were all randomly generated with sample.

Edited per @alistaire comment: add convert = TRUE to the separate call to make sure the responses are still of class integer.
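
To see what convert = TRUE changes, here is a minimal sketch on a made-up two-row data frame (the `demo` object and its values are illustrative only). Without convert, separate() leaves the split pieces as character; with it, they are converted to integer where possible.

```r
library(tidyr)

# Hypothetical data: answers already pasted together with ';'
demo <- data.frame(id = 1:2,
                   value = c("9;18;25", "10;18;30"))

as_chr <- separate(demo, value, into = c("q1", "q2", "q3"), sep = ";")
as_int <- separate(demo, value, into = c("q1", "q2", "q3"), sep = ";",
                   convert = TRUE)

class(as_chr$q1)   # "character"
class(as_int$q1)   # "integer"
```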
