Question
I am working with air-quality data and tried to reshape a data frame from wide to long format using the melt function. Here is the data: Elev stands for elevation, Obs for observation, and US3, DK1 and DE1 are models. The lm and ul columns after each of them hold the first and third quartiles of the preceding column.
Elev Obs lm ul US3 lm ul DK1 lm ul
1 0 37.74289 34.33422 41.27840 38.82037 35.35241 42.30042 49.31111 45.00134 53.90968
2 100 38.14076 34.71842 41.36560 39.82727 36.49086 43.22209 50.46545 45.79068 55.44664
3 250 39.31056 35.98180 42.50011 40.94909 37.70768 44.40232 50.79818 45.76405 55.54795
4 500 41.03098 37.78005 44.02544 42.54909 39.25627 45.72927 51.24182 46.76091 55.88568
5 750 43.57307 40.52575 46.92804 43.48000 40.55918 46.62914 51.90364 47.40586 56.37514
DE1 lm ul
1 41.15185 37.81824 44.62509
2 40.89455 37.38491 44.34759
3 40.93455 37.33400 44.32573
4 41.26727 37.90150 44.68568
5 43.04545 40.04541 46.12386
I used
melt(f, id.vars = c("Elev", "lm", "ul"), measure.vars = c("US3", "DK1", "DE1", "Obs"))
and I got
Elev lm ul variable value
0 34.33422 41.27840 US3 38.82037
100 34.71842 41.36560 US3 39.82727
250 35.98180 42.50011 US3 40.94909
500 37.78005 44.02544 US3 42.54909
750 40.52575 46.92804 US3 43.48000
0 34.33422 41.27840 DK1 49.31111
100 34.71842 41.36560 DK1 50.46545
250 35.98180 42.50011 DK1 50.79818
500 37.78005 44.02544 DK1 51.24182
750 40.52575 46.92804 DK1 51.90364
0 34.33422 41.27840 DE1 41.15185
100 34.71842 41.36560 DE1 40.89455
250 35.98180 42.50011 DE1 40.93455
500 37.78005 44.02544 DE1 41.26727
750 40.52575 46.92804 DE1 43.04545
0 34.33422 41.27840 Obs 37.74289
100 34.71842 41.36560 Obs 38.14076
250 35.98180 42.50011 Obs 39.31056
500 37.78005 44.02544 Obs 41.03098
750 40.52575 46.92804 Obs 43.57307
As can clearly be seen, the same lm and ul values (those belonging to Obs) are repeated for every variable. How can I get a long format without repeating those values, so that each variable keeps its own lm and ul?
My expected result is:
Elev lm ul variable value
0 35.35241 42.30042 US3 38.82037
100 36.49086 43.22209 US3 39.82727
250 37.70768 44.40232 US3 40.94909
500 39.25627 45.72927 US3 42.54909
750 40.55918 46.62914 US3 43.48000
0 45.00134 53.90968 DK1 49.31111
100 45.79068 55.44664 DK1 50.46545
250 45.76405 55.54795 DK1 50.79818
500 46.76091 55.88568 DK1 51.24182
750 47.40586 56.37514 DK1 51.90364
0 37.81824 44.62509 DE1 41.15185
100 37.38491 44.34759 DE1 40.89455
250 37.33400 44.32573 DE1 40.93455
500 37.90150 44.68568 DE1 41.26727
750 40.04541 46.12386 DE1 43.04545
0 34.33422 41.27840 Obs 37.74289
100 34.71842 41.36560 Obs 38.14076
250 35.98180 42.50011 Obs 39.31056
500 37.78005 44.02544 Obs 41.03098
750 40.52575 46.92804 Obs 43.57307
Answer 1:
If you use a data.table and name your columns Elev, Obs_va, Obs_lm, Obs_ul, US3_va, US3_lm, US3_ul, DK1_va, DK1_lm, DK1_ul, DE1_va, DE1_lm, DE1_ul, then this code produces the lm and ul columns of the expected result in a very generic way.
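# melt everything except Elev into (variable, value) pairs
temp <- melt(temp, id.vars = "Elev")
# split each name such as "US3_lm" into its group ("US3") and measure ("lm")
temp[, `:=`(var = sub("_..$", "", variable),
            measure = sub(".*_", "", variable),
            variable = NULL)]
# spread lm and ul back into columns; the "va" rows are left out here
dcast(temp[measure != "va"], ... ~ measure, value.var = "value")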
You could also pass the arguments manually instead, or simply split the data.table or data.frame into chunks by hand and stack them.
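The manual "chunks" idea can be sketched roughly as follows. This is only an illustration, assuming the column layout from the question (each group's value column is immediately followed by its lm and ul columns). The names DF, starts, chunks and long are made up for this sketch, and only two rows and two groups of the question's data are used to keep it short.

```r
library(data.table)

# Two rows and two groups of the question's data, duplicated names kept as-is
DF <- data.frame(
  Elev = c(0L, 100L),
  Obs = c(37.74289, 38.14076), lm = c(34.33422, 34.71842), ul = c(41.27840, 41.36560),
  US3 = c(38.82037, 39.82727), lm = c(35.35241, 36.49086), ul = c(42.30042, 43.22209),
  check.names = FALSE
)

# Position of each group's value column; its lm and ul follow directly after it
starts <- c(Obs = 2L, US3 = 5L)

# Build one long-format chunk per group, addressing columns by position
chunks <- lapply(names(starts), function(g) {
  i <- starts[[g]]
  data.table(Elev = DF$Elev, lm = DF[[i + 1L]], ul = DF[[i + 2L]],
             variable = g, value = DF[[i]])
})

# Stack the chunks on top of each other
long <- rbindlist(chunks)
long
```

Addressing the columns by position sidesteps the duplicated lm and ul names, at the cost of hard-coding the layout.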
Here is another, simpler solution:
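temp2 <- melt(temp, measure.vars = patterns("lm$", "ul$"),
              value.name = c("lm", "ul"))[, c("Elev", "variable", "lm", "ul")]
# map the group index in `variable` back to the group name taken from the *_va columns
temp2[, "variable"] <- sub("_va", "", grep("_va", names(temp), value = TRUE))[temp2$variable]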
Here temp is your original (wide) data.table.
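Both snippets assume the columns have already been renamed to the Group_va / Group_lm / Group_ul scheme. A minimal sketch of how that renaming might be done, and of a variant of the second snippet that also keeps the value column, is given below. It assumes every group occupies exactly three consecutive columns in the order value, lm, ul. The names DF, DT, groups, new_names and long are made up for this sketch, and only two rows and two groups of the question's data are used.

```r
library(data.table)

# Two rows and two groups of the question's data, duplicated names kept as-is
DF <- data.frame(
  Elev = c(0L, 100L),
  Obs = c(37.74289, 38.14076), lm = c(34.33422, 34.71842), ul = c(41.27840, 41.36560),
  US3 = c(38.82037, 39.82727), lm = c(35.35241, 36.49086), ul = c(42.30042, 43.22209),
  check.names = FALSE
)
DT <- as.data.table(DF)

# Build unique names: Obs_va, Obs_lm, Obs_ul, US3_va, US3_lm, US3_ul
groups <- c("Obs", "US3")
new_names <- c("Elev", paste(rep(groups, each = 3), c("va", "lm", "ul"), sep = "_"))
setnames(DT, new_names)

# Melt the three column families at once so that value, lm and ul stay aligned
long <- melt(DT, id.vars = "Elev",
             measure.vars = patterns("_va$", "_lm$", "_ul$"),
             value.name = c("value", "lm", "ul"))

# `variable` holds the group index (1, 2, ...); map it back to the group name
long[, variable := groups[as.integer(variable)]]
long
```

Adding the "_va$" pattern is a change from the snippets above, made here only so that the value column survives the reshape.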
Answer 2:
Recent versions of data.table can melt multiple groups of columns simultaneously. An additional difficulty is that the data frame contains several columns with the same name. Thanks to the patterns() function, the columns do not have to be renamed beforehand.
library(data.table)   # version 1.10.4 used here
# create a vector of the data group names, in the order they appear in DF!
dg_names <- c("Obs", "US3", "DK1", "DE1")
# coerce DF to data.table and melt, using the patterns() function to identify columns
molten <- melt(setDT(DF),
               measure.vars = patterns(paste(dg_names, collapse = "|"), "lm", "ul"),
               value.name = c("value", "lm", "ul"))
# rename variable column to something meaningful
molten[, variable := factor(variable, labels = dg_names)]
Despite the different order of columns and rows, the result is as expected by the OP:
molten
# Elev variable value lm ul
# 1: 0 Obs 37.74289 34.33422 41.27840
# 2: 100 Obs 38.14076 34.71842 41.36560
# 3: 250 Obs 39.31056 35.98180 42.50011
# 4: 500 Obs 41.03098 37.78005 44.02544
# 5: 750 Obs 43.57307 40.52575 46.92804
# 6: 0 US3 38.82037 35.35241 42.30042
# 7: 100 US3 39.82727 36.49086 43.22209
# 8: 250 US3 40.94909 37.70768 44.40232
# 9: 500 US3 42.54909 39.25627 45.72927
#10: 750 US3 43.48000 40.55918 46.62914
#11: 0 DK1 49.31111 45.00134 53.90968
#12: 100 DK1 50.46545 45.79068 55.44664
#13: 250 DK1 50.79818 45.76405 55.54795
#14: 500 DK1 51.24182 46.76091 55.88568
#15: 750 DK1 51.90364 47.40586 56.37514
#16: 0 DE1 41.15185 37.81824 44.62509
#17: 100 DE1 40.89455 37.38491 44.34759
#18: 250 DE1 40.93455 37.33400 44.32573
#19: 500 DE1 41.26727 37.90150 44.68568
#20: 750 DE1 43.04545 40.04541 46.12386
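If the exact row and column order from the question matters, the result can be rearranged afterwards. The following is a minimal sketch using data.table's setorder() and setcolorder(). To stay self-contained it works on a small two-group stand-in for molten (values copied from the output above) rather than on the full result.

```r
library(data.table)

# Small stand-in for `molten`: two elevations, two groups
molten <- data.table(
  Elev     = c(0L, 100L, 0L, 100L),
  variable = factor(c("Obs", "Obs", "US3", "US3"), levels = c("Obs", "US3")),
  value    = c(37.74289, 38.14076, 38.82037, 39.82727),
  lm       = c(34.33422, 34.71842, 35.35241, 36.49086),
  ul       = c(41.27840, 41.36560, 42.30042, 43.22209)
)

# Put the models first and Obs last, as in the question's expected output
molten[, variable := factor(variable, levels = c("US3", "Obs"))]

# Sort rows by group (in factor level order), then by elevation
setorder(molten, variable, Elev)

# Arrange the columns as Elev, lm, ul, variable, value
setcolorder(molten, c("Elev", "lm", "ul", "variable", "value"))
molten
```

For the full data, the levels vector would be c("US3", "DK1", "DE1", "Obs").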
Data
DF <- structure(list(Elev = c(0L, 100L, 250L, 500L, 750L), Obs = c(37.74289,
38.14076, 39.31056, 41.03098, 43.57307), lm = c(34.33422, 34.71842,
35.9818, 37.78005, 40.52575), ul = c(41.2784, 41.3656, 42.50011,
44.02544, 46.92804), US3 = c(38.82037, 39.82727, 40.94909, 42.54909,
43.48), lm = c(35.35241, 36.49086, 37.70768, 39.25627, 40.55918
), ul = c(42.30042, 43.22209, 44.40232, 45.72927, 46.62914),
DK1 = c(49.31111, 50.46545, 50.79818, 51.24182, 51.90364),
lm = c(45.00134, 45.79068, 45.76405, 46.76091, 47.40586),
ul = c(53.90968, 55.44664, 55.54795, 55.88568, 56.37514),
DE1 = c(41.15185, 40.89455, 40.93455, 41.26727, 43.04545),
lm = c(37.81824, 37.38491, 37.334, 37.9015, 40.04541), ul = c(44.62509,
44.34759, 44.32573, 44.68568, 46.12386)), .Names = c("Elev",
"Obs", "lm", "ul", "US3", "lm", "ul", "DK1", "lm", "ul", "DE1",
"lm", "ul"), row.names = c(NA, -5L), class = "data.frame")
Source: https://stackoverflow.com/questions/42845436/reshaping-multiple-groups-of-columns-in-a-data-frame-from-wide-to-long