Reshape data frame from wide to long with re-occurring column names in R

♀尐吖头ヾ submitted on 2019-12-05 22:07:24

Here is a solution using tidyr instead of reshape2. One advantage is the gather_ function, which takes character vectors as input. First, we replace the duplicated ("problematic") variable names with unique ones by appending a number to each name; then we gather (tidyr's equivalent of melt) just those variables. The temporary unique names end up in a key column called "prob_var_name", which is dropped at the end.
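The asker's data is not reproduced on this page, so the snippets that follow need a stand-in for df. Here is a minimal sketch, assuming a layout consistent with the outputs shown in the answers (two columns both named interaction.num, in positions 2 and 3). The values are illustrative assumptions, not the original data.

```r
# Assumed stand-in for the asker's data frame. check.names = FALSE keeps
# the duplicated column name instead of letting data.frame() make it unique.
df <- data.frame(
  conversion.id   = 1:3,
  interaction.num = 1,
  interaction.num = 2,
  conversion      = 1,
  check.names     = FALSE
)

names(df)
# "conversion.id" "interaction.num" "interaction.num" "conversion"
```

These columns are numeric; the last answer notes that in the original sample data they were factors.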

library(tidyr)
library(dplyr)
library(magrittr)  # provides equals(), which dplyr does not re-export

var_name <- "interaction.num"

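# Positions of the columns that share the duplicated name
problem_var <- df %>% 
  names %>% 
  equals(var_name) %>%
  which

# Unique temporary names: interaction.num1, interaction.num2, ...
replaced_names <- paste0(names(df)[problem_var], seq_along(problem_var))

names(df)[problem_var] <- replaced_names

# Stack the renamed columns into one column, then drop the key column
df %>%
  gather_("prob_var_name", var_name, replaced_names) %>%
  select(-prob_var_name)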

  conversion.id conversion interaction.num
1             1          1               1
2             2          1               1
3             3          1               1
4             1          1               2
5             2          1               2
6             3          1               2

Because gather_ accepts character vectors, you could wrap all of this in a function that takes var_name as an argument, and then perhaps apply it to each of your duplicated variables.
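One way such a function might look. This is a sketch, not the answerer's code: the name gather_duplicates is hypothetical, and it swaps gather_ (deprecated in current tidyr releases) for pivot_longer, which can also take a character vector of column names through all_of. The sample data frame is an assumed stand-in for the asker's data.

```r
library(tidyr)

# Hypothetical helper: stack every column called `var_name` into one column.
gather_duplicates <- function(data, var_name) {
  idx <- which(names(data) == var_name)
  tmp_names <- paste0(var_name, seq_along(idx))  # unique temporary names
  names(data)[idx] <- tmp_names

  long <- pivot_longer(
    data,
    cols      = all_of(tmp_names),
    names_to  = "prob_var_name",
    values_to = var_name
  )
  # Drop the temporary key column
  long[names(long) != "prob_var_name"]
}

# Assumed sample data with a duplicated column name
df <- data.frame(conversion.id = 1:3, interaction.num = 1,
                 interaction.num = 2, conversion = 1,
                 check.names = FALSE)

result <- gather_duplicates(df, "interaction.num")
```

Note that pivot_longer keeps the values from each original row together, so the rows come out interleaved (1, 2, 1, 2, ...) rather than stacked column by column as with gather_. Sort afterwards if the order matters.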

Here's a solution using data.table. You just have to provide the column indices instead of the names.

library(data.table)  # data.table 1.9.6 and later export their own melt(); older versions also needed reshape2
ans <- melt(setDT(df), measure.vars = 2:3,
            value.name = "interaction.num")[, variable := NULL]

#    conversion.id conversion interaction.num
# 1:             1          1               1
# 2:             2          1               1
# 3:             3          1               1
# 4:             1          1               2
# 5:             2          1               2
# 6:             3          1               2

You can get the indices 2:3 by doing grep("interaction.num", names(df)).
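One caveat with that grep call: the pattern is a regular expression, so the "." matches any character. A small sketch of the difference, using a made-up vector of names (the entry "interactionXnum" is hypothetical and only there to show the over-match):

```r
# Hypothetical column names; the last one is not a true duplicate.
nms <- c("conversion.id", "interaction.num", "interaction.num",
         "conversion", "interactionXnum")

regex_idx <- grep("interaction.num", nms)                # "." is a wildcard here
fixed_idx <- grep("interaction.num", nms, fixed = TRUE)  # literal substring match
exact_idx <- which(nms == "interaction.num")             # exact name match

regex_idx  # 2 3 5
fixed_idx  # 2 3
exact_idx  # 2 3
```

fixed = TRUE still matches substrings, so a name such as "interaction.num.total" would be picked up too; the == comparison is the strictest of the three.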

Here's an approach in base R that should work for you:

x <- grep("interaction.num", names(df)) ## as suggested by Arun

## Make more friendly names for reshape
names(df)[x] <- paste(names(df)[x], seq_along(x), sep = "_")

## Reshape
reshape(df, direction = "long", 
        idvar=c("conversion.id", "conversion"), 
        varying = x, sep = "_")
#       conversion.id conversion time interaction.num
# 1.1.1             1          1    1               1
# 2.1.1             2          1    1               1
# 3.1.1             3          1    1               1
# 1.1.2             1          1    2               2
# 2.1.2             2          1    2               2
# 3.1.2             3          1    2               2

Another possibility is stack instead of reshape:

x <- grep("interaction.num", names(df)) ## as suggested by Arun
cbind(df[-x], stack(lapply(df[x], as.character)))

The lapply(df[x], as.character) step may not be necessary, depending on whether your values are actually numeric. In the sample data as you created it, they were factors.
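To illustrate why the conversion matters: stack only works on vector columns, and a factor does not count as one, so factor columns have to be converted first. A minimal sketch, assuming factor-valued sample data (an illustrative stand-in, not the asker's data):

```r
# Assumed sample data in which the duplicated columns are factors
df <- data.frame(
  conversion.id   = 1:3,
  interaction.num = factor(c(1, 1, 1)),
  interaction.num = factor(c(2, 2, 2)),
  conversion      = 1,
  check.names     = FALSE
)
x <- which(names(df) == "interaction.num")

is.vector(df[[x[1]]])  # FALSE: stack() would skip this column as-is

# Convert to character before stacking, then back to numeric.
# as.character() on the stacked values guards against them being factors.
stacked <- stack(lapply(df[x], as.character))
long <- cbind(df[-x],
              interaction.num = as.numeric(as.character(stacked$values)))
```

Going through as.character before as.numeric recovers the labels (1 and 2) rather than the internal factor codes.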
