How to create relational matrix in R?

问题

Original data:

df <- structure(list(ID_client = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1_", "2_", "3_", "4_"), class = "factor"), Connected = c(1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L), Year = c(2010L, 2010L, 2010L, 2010L, 2015L, 2015L, 2015L, 2015L)), class = "data.frame", row.names = c(NA, -8L))

Original data:

`ID_client Connected  Year
1_            1      2010
2_            1      2010
3_            1      2010
4_            0      2010
1_            1      2015
2_            0      2015
3_            1      2015
4_            0      2015`

My intention is to create the following data:

`Year ID_client    1_   2_   3_   4_
2010     1_       0    1    1    0
2010     2_       1    0    1    0
2010     3_       1    1    0    0
2010     4_       0    0    0    0
2015     1_       0    0    1    0
2015     2_       0    0    0    0
2015     3_       1    0    0    0
2015     4_       0    0    0    0`

In other words, a matrix that express that in, for instance, 2010 clients 1_, 2_, and 3_ were connected, while the other one was not. Importantly, I do not consider someone to be connected with herself.

I have tried the following code:

df %>%
  group_by(Year, Connected) %>%
  mutate(temp = rev(ID_client)) %>%
  pivot_wider(names_from = ID_client, 
          values_from = Connected, 
          values_fill = list(Connected = 0)) %>%
  arrange(Year, temp)

This code does not reproduce what I need. Instead, this is the result:

`Year ID_client    1_   2_   3_   4_
2010     1_       0    0    1    0
2010     2_       0    1    0    0
2010     3_       1    0    0    0
2010     4_       0    0    0    0
2015     1_       0    0    1    0
2015     2_       0    0    0    0
2015     3_       1    0    0    0
2015     4_       0    0    0    0`

回答1:

We can group_by Year and create a new column with ID_client values which has Connected == 1 in each group except for the current value. We complete the missing levels and then cast the data to wide format.

library(tidyverse)

df %>%
  group_by(Year) %>%
  mutate(temp = map(ID_client, ~setdiff(ID_client[Connected == 1], .x))) %>%
  unnest(cols = temp) %>%
  complete(temp = unique(ID_client), fill = list(Connected = 0)) %>%
  mutate(ID_client  = coalesce(as.character(ID_client), temp)) %>%
  pivot_wider(names_from = temp, 
              values_from = Connected, 
              values_fill = list(Connected = 0)) %>%
  arrange(Year, ID_client)

#   Year ID_client  `1_`  `2_`  `3_`  `4_`
#  <int> <chr>     <dbl> <dbl> <dbl> <dbl>
#1  2010 1_            0     1     1     0
#2  2010 2_            1     0     1     0
#3  2010 3_            1     1     0     0
#4  2010 4_            0     0     0     0
#5  2015 1_            0     0     1     0
#6  2015 2_            0     0     0     0
#7  2015 3_            1     0     0     0
#8  2015 4_            0     0     0     0

回答2:

You can use a self-join, i.e. an inner join of the data to itself. Join by the pieces of information that mark off a combination of clients: this would be the values in Year and Connected. Since your desired output has zeroes on its diagonal, filter to remove cases where the two IDs are the same.

As you can see, I haven't yet transitioned to the pivot_wider version of tidyr, but this should be adaptable. In spread, specify that unused factor levels shouldn't be dropped so you don't lose the ID 4.

library(dplyr)
library(tidyr)

inner_join(df, df, by = c("Year", "Connected")) %>%
  filter(Connected == 1, ID_client.x != ID_client.y) %>%
  spread(key = ID_client.y, value = Connected, fill = 0, drop = F) %>%
  arrange(Year) 
#>   ID_client.x Year 1_ 2_ 3_ 4_
#> 1          1_ 2010  0  1  1  0
#> 2          2_ 2010  1  0  1  0
#> 3          3_ 2010  1  1  0  0
#> 4          4_ 2010  0  0  0  0
#> 5          1_ 2015  0  0  1  0
#> 6          2_ 2015  0  0  0  0
#> 7          3_ 2015  1  0  0  0
#> 8          4_ 2015  0  0  0  0

来源：https://stackoverflow.com/questions/58160385/how-to-create-relational-matrix-in-r

标签

matrix

reshape