问题
I have a large dataset that I need to convert to long format from wide format. That should be simple enough and there are lots of examples of how to do that on this forum. However, in this case, I also need to split the column headers used in the wide format and create a column for each of them in the long format.
Example dataset
data <- data.frame("East2010"=1:3, "West2010"=4:6, "East2011"=7:9, "West2011"=5:7)
data
East.2010 West.2010 East.2011 West.2011
1 1 4 7 5
2 2 5 8 6
3 3 6 9 7
What I want is something like this
Site Year Response
East 2010 1
East 2010 2
East 2010 3
West 2010 4
West 2010 5
West 2010 6
East 2011 7
East 2011 8
East 2011 9
West 2011 5
West 2011 6
West 2011 7
I have looked a lot of examples on this forum that will melt data to convert to long format and others that do a column split at a delimiter, but I have not been able to make the two work together.
回答1:
Here's the "modern day" :-) approach for this:
library(dplyr)
library(tidyr)
data %>%
gather(var, Response, East2010:West2011) %>% ## Makes wide data long
separate(var, c("Site", "Year"), sep = -5) ## Splits up a column
# Site Year Response
# 1 East 2010 1
# 2 East 2010 2
# 3 East 2010 3
# 4 West 2010 4
# 5 West 2010 5
# 6 West 2010 6
# 7 East 2011 7
# 8 East 2011 8
# 9 East 2011 9
# 10 West 2011 5
# 11 West 2011 6
# 12 West 2011 7
The sep = -5
from above says to go from the end of the string backwards five characters and split there. Thus, if you had "North2010" as a possible name, this would still work.
That said, it is more reliable to use a regular expression like @David's, which is also possible within separate
:
data %>%
gather(var, Response, East2010:West2011) %>%
separate(var, c("Site", "Year"),
sep = "(?<=[[:alpha:]])(?=[[:digit:]])",
perl = TRUE)
回答2:
Or (in case the column width is not always constant). Here I use a "lookahead" and "lookbehind" in order to separate characters from digits.
library(reshape2)
data <- melt(data)
temp <- strsplit(as.character(data$variable), "(?<=[[:alpha:]])(?=[[:digit:]])", perl = TRUE)
transform(data, Site = sapply(temp, "[", 1), Year = sapply(temp, "[", 2))
# variable value Site Year
#1 East2010 1 East 2010
#2 East2010 2 East 2010
#3 East2010 3 East 2010
#4 West2010 4 West 2010
#5 West2010 5 West 2010
#6 West2010 6 West 2010
#7 East2011 7 East 2011
#8 East2011 8 East 2011
#9 East2011 9 East 2011
#10 West2011 5 West 2011
#11 West2011 6 West 2011
#12 West2011 7 West 2011
回答3:
Something along these lines would work:
library("plyr")
library("reshape2")
m.data <- melt(data)
m.data <- mutate(m.data, Site=substr(variable, 1,4),
Year=substr(variable, 5,8))
Which would result in:
> m.data
variable value Site Year
1 East2010 1 East 2010
2 East2010 2 East 2010
3 East2010 3 East 2010
4 West2010 4 West 2010
5 West2010 5 West 2010
6 West2010 6 West 2010
7 East2011 7 East 2011
8 East2011 8 East 2011
9 East2011 9 East 2011
10 West2011 5 West 2011
11 West2011 6 West 2011
12 West2011 7 West 2011
来源:https://stackoverflow.com/questions/25272018/split-column-name-and-convert-data-from-wide-to-long-format-in-r