Split column name and convert data from wide to long format in R

百般思念 提交于 2021-01-27 05:14:32

问题


I have a large dataset that I need to convert to long format from wide format. That should be simple enough and there are lots of examples of how to do that on this forum. However, in this case, I also need to split the column headers used in the wide format and create a column for each of them in the long format.

Example dataset

 data <- data.frame("East2010"=1:3, "West2010"=4:6, "East2011"=7:9, "West2011"=5:7)
 data
 East.2010 West.2010 East.2011 West.2011
 1         1         4         7         5
 2         2         5         8         6
 3         3         6         9         7

What I want is something like this

 Site   Year   Response
 East   2010   1
 East   2010   2
 East   2010   3
 West   2010   4
 West   2010   5
 West   2010   6
 East   2011   7
 East   2011   8
 East   2011   9
 West   2011   5
 West   2011   6
 West   2011   7

I have looked a lot of examples on this forum that will melt data to convert to long format and others that do a column split at a delimiter, but I have not been able to make the two work together.


回答1:


Here's the "modern day" :-) approach for this:

library(dplyr)
library(tidyr)
data %>%
  gather(var, Response, East2010:West2011) %>%  ## Makes wide data long
  separate(var, c("Site", "Year"), sep = -5)    ## Splits up a column
#    Site Year Response
# 1  East 2010        1
# 2  East 2010        2
# 3  East 2010        3
# 4  West 2010        4
# 5  West 2010        5
# 6  West 2010        6
# 7  East 2011        7
# 8  East 2011        8
# 9  East 2011        9
# 10 West 2011        5
# 11 West 2011        6
# 12 West 2011        7

The sep = -5 from above says to go from the end of the string backwards five characters and split there. Thus, if you had "North2010" as a possible name, this would still work.

That said, it is more reliable to use a regular expression like @David's, which is also possible within separate:

data %>%
  gather(var, Response, East2010:West2011) %>%
  separate(var, c("Site", "Year"), 
           sep = "(?<=[[:alpha:]])(?=[[:digit:]])", 
           perl = TRUE)



回答2:


Or (in case the column width is not always constant). Here I use a "lookahead" and "lookbehind" in order to separate characters from digits.

library(reshape2)
data <- melt(data)
temp <- strsplit(as.character(data$variable), "(?<=[[:alpha:]])(?=[[:digit:]])", perl = TRUE)
transform(data, Site = sapply(temp, "[", 1), Year = sapply(temp, "[", 2))

#   variable value Site Year
#1  East2010     1 East 2010
#2  East2010     2 East 2010
#3  East2010     3 East 2010
#4  West2010     4 West 2010
#5  West2010     5 West 2010
#6  West2010     6 West 2010
#7  East2011     7 East 2011
#8  East2011     8 East 2011
#9  East2011     9 East 2011
#10 West2011     5 West 2011
#11 West2011     6 West 2011
#12 West2011     7 West 2011



回答3:


Something along these lines would work:

library("plyr")
library("reshape2")
m.data <- melt(data)
m.data <- mutate(m.data, Site=substr(variable, 1,4), 
    Year=substr(variable, 5,8))

Which would result in:

> m.data
   variable value Site Year
1  East2010     1 East 2010
2  East2010     2 East 2010
3  East2010     3 East 2010
4  West2010     4 West 2010
5  West2010     5 West 2010
6  West2010     6 West 2010
7  East2011     7 East 2011
8  East2011     8 East 2011
9  East2011     9 East 2011
10 West2011     5 West 2011
11 West2011     6 West 2011
12 West2011     7 West 2011


来源:https://stackoverflow.com/questions/25272018/split-column-name-and-convert-data-from-wide-to-long-format-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!