R Reshape data by combining common value of two variables

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-11 01:47:31

Question


I want to reshape a data frame by combining two variables, so that values linked through a common value end up in the same row. For example, given this data:

    dat = data.frame(
      var1 = c("a", "a", "a", "Emily", "b", "Bob", "c"),
      var2 = c("Jhon", "Emily", "Julie", "Angela", "Bob", "Paul", "Paul"),
      stringsAsFactors = FALSE
    )

Expected output:

  #   var1   var2   var3   var4   var5 
  # 1    a   Jhon  Emily  Julie Angela
  # 2    b    Bob   Paul      c   <NA>

Answer 1:


Using base R you can do:

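relation = function(dat) {

  # Recursively expand x with every value that shares a row with one of its
  # members, until the set stops growing (i.e. the group is complete).
  .relation = function(x) {
    k = unique(sort(c(dat[dat[, 1] %in% x, 2], x, dat[dat[, 2] %in% x, 1])))
    if (setequal(x, k)) toString(k) else .relation(k)
  }

  # One comma-separated group per starting value; duplicates are dropped
  # before the strings are parsed into columns.
  grp = sapply(unique(dat[, 1]), .relation)
  read.table(text = unique(grp), fill = TRUE, sep = ",")
}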

relation(dat)
  V1      V2     V3    V4     V5
1  a  Angela  Emily  Jhon  Julie
2  b     Bob      c  Paul   
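
The function above sorts each group alphabetically, so the column order differs from the expected output in the question. As a minimal sketch, assuming the desired order is the order in which members are discovered starting from each `var1` value, the same grouping can be done in base R with a breadth-first expansion. The helper name `group_members` and the variables `groups`, `seen` and `wide` are hypothetical and not part of the original answer.

```r
# Sample data from the question.
dat <- data.frame(
  var1 = c("a", "a", "a", "Emily", "b", "Bob", "c"),
  var2 = c("Jhon", "Emily", "Julie", "Angela", "Bob", "Paul", "Paul"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: starting from `seed`, keep appending every value that
# shares a row with a value already in the group, in order of discovery.
group_members <- function(dat, seed) {
  members <- seed
  i <- 1
  while (i <= length(members)) {
    current <- members[i]
    linked <- c(dat$var2[dat$var1 == current], dat$var1[dat$var2 == current])
    members <- c(members, setdiff(linked, members))
    i <- i + 1
  }
  members
}

# Build one group per var1 value that is not already part of an earlier group.
groups <- list()
seen <- character(0)
for (seed in unique(dat$var1)) {
  if (!(seed %in% seen)) {
    g <- group_members(dat, seed)
    groups[[length(groups) + 1]] <- g
    seen <- c(seen, g)
  }
}

# Pad shorter groups with NA and bind the rows into a data frame.
width <- max(lengths(groups))
wide <- as.data.frame(
  do.call(rbind, lapply(groups, function(g) c(g, rep(NA, width - length(g))))),
  stringsAsFactors = FALSE
)
names(wide) <- paste0("var", seq_len(width))
wide
```

For the question's data this gives two rows, `a, Jhon, Emily, Julie, Angela` and `b, Bob, Paul, c, NA`, which is the layout asked for. The loop scans the data frame once per member, which is fine for small inputs but may be slow for large ones.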



Answer 2:


dat = data.frame(var1 = c("a", "a", "a", "Emily", "b", "Bob"), 
                 var2 = c("Jhon", "Emily", "Julie", "Angela", "Bob", "Paul"))

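library(igraph)

# Build a directed graph with one edge per row (var1 -> var2)
g <- graph_from_data_frame(dat)
plot(g)

# Start vertices have no incoming edges; final vertices have no outgoing edges
starts <- V(g)[degree(g, mode = "in") == 0]
finals <- V(g)[degree(g, mode = "out") == 0]

# For each start vertex, collect every vertex on a path to a final vertex
res <- lapply(starts, function(x) {
  unique(names(unlist(all_simple_paths(g, from = x, to = finals, mode = "out"))))
})
res

# Pad each group with NA to a common length and bind the rows into a data frame
max_len <- max(sapply(res, length))
data.frame(do.call(rbind, lapply(res, function(x) c(x, rep(NA, max_len - length(x))))))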
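
The code above follows directed paths, so on the question's full data (which also contains the row `c`, `Paul`) the vertex `c` would be a separate start vertex rather than a member of the `b` group. As a hedged alternative, assuming that each row should simply link its two values regardless of direction, the groups can be read off as the connected components of an undirected graph. The variable names `memb` and `groups` are illustrative only.

```r
library(igraph)

# Sample data from the question, including the row ("c", "Paul").
dat <- data.frame(
  var1 = c("a", "a", "a", "Emily", "b", "Bob", "c"),
  var2 = c("Jhon", "Emily", "Julie", "Angela", "Bob", "Paul", "Paul"),
  stringsAsFactors = FALSE
)

# Treat every row as an undirected edge between var1 and var2.
g <- graph_from_data_frame(dat, directed = FALSE)

# components() assigns each vertex the id of its connected component.
memb <- components(g)$membership

# One character vector of vertex names per component.
groups <- unname(split(V(g)$name, memb))
groups
```

The order of names inside each component follows the graph's vertex order, not the order in the expected output, so a reordering step may still be needed before padding the groups into a data frame.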



Answer 3:


I have produced a solution which first "cleans" the data structure and then reshapes it using dcast.

library(data.table)

dt.dat <- data.table(dat)

# Self-join var2 on var1 to find indirect links (e.g. a -> Emily -> Angela)
dt.linked <- merge(dt.dat, dt.dat, by.x = "var2", by.y = "var1")

# Clean the data: drop rows whose var1 is itself a member of another group,
# then add the indirectly linked persons to the group they belong to
dt.dat.complete <- rbindlist(list(
  dt.dat[!(var1 %in% dt.linked[, var2]), ],
  dt.linked[, .(var1, var2.y)]
))

# Number the members within each group; the sequence becomes the column suffix
dt.dat.complete[, seq := seq_len(.N), by = var1]

dcast(dt.dat.complete, var1 ~ paste0("col", seq) + seq, value.var = "var2")

     var1 col1_1 col2_2 col3_3 col4_4
1:    a   Jhon  Emily  Julie Angela
2:    b    Bob   Paul   <NA>   <NA>


Source: https://stackoverflow.com/questions/51357270/r-reshape-data-by-combining-common-value-of-two-variables
