Reshaping multiple groups of columns in a data frame from wide to long

Question

I am working with air-quality data and tried to reshape the data frame from wide to long using the melt function. Here is the data: Elev stands for elevation, Obs for observations, and US3, DK1 and DE1 are models; the lm and ul columns after each of them hold its first and third quartiles.

  Elev      Obs       lm       ul      US3       lm       ul      DK1       lm       ul      DE1       lm       ul
1    0 37.74289 34.33422 41.27840 38.82037 35.35241 42.30042 49.31111 45.00134 53.90968 41.15185 37.81824 44.62509
2  100 38.14076 34.71842 41.36560 39.82727 36.49086 43.22209 50.46545 45.79068 55.44664 40.89455 37.38491 44.34759
3  250 39.31056 35.98180 42.50011 40.94909 37.70768 44.40232 50.79818 45.76405 55.54795 40.93455 37.33400 44.32573
4  500 41.03098 37.78005 44.02544 42.54909 39.25627 45.72927 51.24182 46.76091 55.88568 41.26727 37.90150 44.68568
5  750 43.57307 40.52575 46.92804 43.48000 40.55918 46.62914 51.90364 47.40586 56.37514 43.04545 40.04541 46.12386

I used

 melt(f, id.vars = c("Elev", "lm", "ul"), measure.vars = c("US3", "DK1", "DE1", "Obs"))

and I got

Elev       lm       ul      variable    value
   0 34.33422 41.27840           US3 38.82037
 100 34.71842 41.36560           US3 39.82727
 250 35.98180 42.50011           US3 40.94909
 500 37.78005 44.02544           US3 42.54909
 750 40.52575 46.92804           US3 43.48000
   0 34.33422 41.27840           DK1 49.31111
 100 34.71842 41.36560           DK1 50.46545
 250 35.98180 42.50011           DK1 50.79818
 500 37.78005 44.02544           DK1 51.24182
 750 40.52575 46.92804           DK1 51.90364
   0 34.33422 41.27840           DE1 41.15185
 100 34.71842 41.36560           DE1 40.89455
 250 35.98180 42.50011           DE1 40.93455
 500 37.78005 44.02544           DE1 41.26727
 750 40.52575 46.92804           DE1 43.04545
   0 34.33422 41.27840           Obs 37.74289
 100 34.71842 41.36560           Obs 38.14076
 250 35.98180 42.50011           Obs 39.31056
 500 37.78005 44.02544           Obs 41.03098
 750 40.52575 46.92804           Obs 43.57307

As can be seen, the lm and ul values belonging to Obs are repeated for every model instead of each model keeping its own lm and ul. How can I get a long format in which every variable keeps its own lm and ul values? My expected result is:

Elev    lm      ul      variable  value
  0 35.35241 42.30042      US3 38.82037
100 36.49086 43.22209      US3 39.82727
250 37.70768 44.40232      US3 40.94909
500 39.25627 45.72927      US3 42.54909
750 40.55918 46.62914      US3 43.48000
  0 45.00134 53.90968      DK1 49.31111
100 45.79068 55.44664      DK1 50.46545
250 45.76405 55.54795      DK1 50.79818
500 46.76091 55.88568      DK1 51.24182
750 47.40586 56.37514      DK1 51.90364
  0 37.81824 44.62509      DE1 41.15185
100 37.38491 44.34759      DE1 40.89455
250 37.33400 44.32573      DE1 40.93455
500 37.90150 44.68568      DE1 41.26727
750 40.04541 46.12386      DE1 43.04545
  0 34.33422 41.27840      Obs 37.74289
100 34.71842 41.36560      Obs 38.14076
250 35.98180 42.50011      Obs 39.31056
500 37.78005 44.02544      Obs 41.03098
750 40.52575 46.92804      Obs 43.57307
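Why are the same lm and ul values repeated? The data frame contains several columns named lm and ul, and a lookup by name in R resolves a duplicated name to its first occurrence. reshape2's melt() most likely matches id.vars against the column names the same way, so it always picked the lm/ul pair that belongs to Obs. A minimal base R sketch of that behaviour, using a hypothetical two-row toy data frame with the same layout:

```r
# Hypothetical toy data frame with the same duplicated-name layout as the question;
# check.names = FALSE keeps both "lm" columns under the same name
df <- data.frame(Elev = c(0, 100),
                 Obs = c(37.74289, 38.14076), lm = c(34.33422, 34.71842),
                 US3 = c(38.82037, 39.82727), lm = c(35.35241, 36.49086),
                 check.names = FALSE)

# Both "lm" columns exist, at positions 3 and 5 ...
lm_positions <- which(names(df) == "lm")

# ... but a lookup by name only ever finds the first one (the one after Obs)
first_lm <- df[["lm"]]
first_pos <- match("lm", names(df))

print(lm_positions)
print(first_lm)
```

Selecting columns by position instead of by name avoids the problem, which is what the answers below do in different ways.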

Answer 1:

If your data is a data.table and the columns are named Elev, Obs_va, Obs_lm, Obs_ul, US3_va, US3_lm, US3_ul, DK1_va, DK1_lm, DK1_ul, DE1_va, DE1_lm, DE1_ul, then the following code produces the expected result in a very generic way:

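library(data.table)
# temp: data.table with columns Elev, Obs_va, Obs_lm, Obs_ul, US3_va, ...
long <- melt(temp, id.vars = "Elev")
# split e.g. "US3_lm" into var = "US3" and measure = "lm"
long[, `:=`(var = sub("_..$", "", variable),
            measure = sub(".*_", "", variable), variable = NULL)]
# keep all measures (va holds the values) and spread them into columns
dcast(long, ... ~ measure, value.var = "value")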

You could also pass the arguments manually instead, or simply split the data.table or data.frame into chunks by hand and bind them back together.
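The manual "split into chunks" route needs no packages at all. A minimal base R sketch (an alternative to the data.table code, not part of the original answer), assuming the question's layout in which every group is a value column followed by its own lm and ul columns; DF below is a hypothetical two-row copy of the question's data:

```r
# Hypothetical two-row copy of the question's data; check.names = FALSE keeps
# the duplicated lm/ul names exactly as in the original data frame
DF <- data.frame(
  Elev = c(0, 100),
  Obs = c(37.74289, 38.14076), lm = c(34.33422, 34.71842), ul = c(41.27840, 41.36560),
  US3 = c(38.82037, 39.82727), lm = c(35.35241, 36.49086), ul = c(42.30042, 43.22209),
  DK1 = c(49.31111, 50.46545), lm = c(45.00134, 45.79068), ul = c(53.90968, 55.44664),
  DE1 = c(41.15185, 40.89455), lm = c(37.81824, 37.38491), ul = c(44.62509, 44.34759),
  check.names = FALSE
)

# Assumption: each group is value, lm, ul in that order,
# so the value columns sit at positions 2, 5, 8, 11
starts <- seq(2, ncol(DF), by = 3)

# One chunk per group, selecting columns by position rather than by name
chunks <- lapply(starts, function(j) {
  data.frame(Elev = DF$Elev,
             lm = DF[[j + 1]], ul = DF[[j + 2]],
             variable = names(DF)[j],
             value = DF[[j]])
})

# Stack the chunks: rows come out grouped as Obs, US3, DK1, DE1
long <- do.call(rbind, chunks)
print(long)
```

Because the columns are addressed by position, the duplicated names never matter; the price is that the code relies on the fixed three-column pattern.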

Here is another, simpler solution:

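# melt the _va, _lm and _ul columns of each group side by side
temp2 <- melt(temp, measure.vars = patterns("_va$", "_lm$", "_ul$"),
              value.name = c("value", "lm", "ul"))
# melt numbers the groups 1, 2, ...; map the numbers back to the group names
groups <- sub("_va$", "", grep("_va$", names(temp), value = TRUE))
temp2[, variable := groups[variable]]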

Here temp is the original data.table, with the columns renamed as described above.
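Both solutions assume the columns have already been renamed to the Group_suffix scheme. One way to build those names from the original duplicated ones, sketched in base R under the assumption that every group is exactly a value column followed by its lm and ul columns (the va suffix for the value column is this answer's own convention):

```r
# Column names exactly as in the question (lm/ul duplicated)
old_names <- c("Elev", "Obs", "lm", "ul", "US3", "lm", "ul",
               "DK1", "lm", "ul", "DE1", "lm", "ul")

# Every third column starting at position 2 holds a group name: Obs, US3, DK1, DE1
groups <- old_names[seq(2, length(old_names), by = 3)]

# Build Elev, Obs_va, Obs_lm, Obs_ul, US3_va, ... (the suffixes are recycled per group)
new_names <- c("Elev", paste(rep(groups, each = 3), c("va", "lm", "ul"), sep = "_"))
print(new_names)
```

On the real data, names(temp) <- new_names (or setnames(temp, new_names) for a data.table) would apply them.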

Answer 2:

Recent versions of data.table can melt multiple groups of columns simultaneously.

An additional difficulty is that the data frame contains columns with the same name. Thanks to the patterns() function, there is no need to rename the columns beforehand.
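Duplicate names are harmless here because patterns() turns each regular expression into the positions of the matching columns (it essentially calls grep() on the column names), so melt() never has to look a column up by name. The same lookup in base R, assuming the column names of DF shown under Data:

```r
# Column names of DF (see Data below); lm and ul are duplicated
nms <- c("Elev", "Obs", "lm", "ul", "US3", "lm", "ul",
         "DK1", "lm", "ul", "DE1", "lm", "ul")
dg_names <- c("Obs", "US3", "DK1", "DE1")

# One regular expression per output column: value, lm, ul
regexes <- c(paste(dg_names, collapse = "|"), "lm", "ul")

# grep() returns the positions of all matching names, duplicates included
positions <- lapply(regexes, grep, x = nms)
print(positions)
```

The i-th position in each vector belongs to the same group, which is why the group names must be listed in the order in which they appear in DF.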

library(data.table) # version 1.10.4 used here

# names of the data groups, in the order in which they appear in DF
dg_names <- c("Obs", "US3", "DK1", "DE1")

# coerce DF to data.table and melt using the patterns() function to identify columns
molten <- melt(setDT(DF), 
               measure.vars = patterns(paste(dg_names, collapse = "|"), "lm", "ul"), 
               value.name = c("value", "lm", "ul"))

# rename variable column to something meaningful
molten[, variable := factor(variable, labels = dg_names)]

Apart from the order of the columns and rows, the result matches what the OP expected:

molten
#    Elev variable    value       lm       ul
# 1:    0      Obs 37.74289 34.33422 41.27840
# 2:  100      Obs 38.14076 34.71842 41.36560
# 3:  250      Obs 39.31056 35.98180 42.50011
# 4:  500      Obs 41.03098 37.78005 44.02544
# 5:  750      Obs 43.57307 40.52575 46.92804
# 6:    0      US3 38.82037 35.35241 42.30042
# 7:  100      US3 39.82727 36.49086 43.22209
# 8:  250      US3 40.94909 37.70768 44.40232
# 9:  500      US3 42.54909 39.25627 45.72927
#10:  750      US3 43.48000 40.55918 46.62914
#11:    0      DK1 49.31111 45.00134 53.90968
#12:  100      DK1 50.46545 45.79068 55.44664
#13:  250      DK1 50.79818 45.76405 55.54795
#14:  500      DK1 51.24182 46.76091 55.88568
#15:  750      DK1 51.90364 47.40586 56.37514
#16:    0      DE1 41.15185 37.81824 44.62509
#17:  100      DE1 40.89455 37.38491 44.34759
#18:  250      DE1 40.93455 37.33400 44.32573
#19:  500      DE1 41.26727 37.90150 44.68568
#20:  750      DE1 43.04545 40.04541 46.12386
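If the OP's exact row order (models first, Obs last) matters, a stable sort on a logical key is enough, because order() leaves ties in their original order. A base R sketch on a hypothetical six-row excerpt of the result; the same indexing expression should also work on the molten data.table:

```r
# Hypothetical excerpt of the long result in the order melt() produced (Obs first)
molten <- data.frame(Elev = c(0, 100, 0, 100, 0, 100),
                     variable = c("Obs", "Obs", "US3", "US3", "DK1", "DK1"),
                     value = c(37.74289, 38.14076, 38.82037, 39.82727, 49.31111, 50.46545))

# FALSE sorts before TRUE, and order() is stable, so the Obs rows move to the end
# while all other rows keep their relative order
reordered <- molten[order(molten$variable == "Obs"), ]
print(reordered)
```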

Data

DF <- structure(list(Elev = c(0L, 100L, 250L, 500L, 750L), Obs = c(37.74289, 
38.14076, 39.31056, 41.03098, 43.57307), lm = c(34.33422, 34.71842, 
35.9818, 37.78005, 40.52575), ul = c(41.2784, 41.3656, 42.50011, 
44.02544, 46.92804), US3 = c(38.82037, 39.82727, 40.94909, 42.54909, 
43.48), lm = c(35.35241, 36.49086, 37.70768, 39.25627, 40.55918
), ul = c(42.30042, 43.22209, 44.40232, 45.72927, 46.62914), 
    DK1 = c(49.31111, 50.46545, 50.79818, 51.24182, 51.90364), 
    lm = c(45.00134, 45.79068, 45.76405, 46.76091, 47.40586), 
    ul = c(53.90968, 55.44664, 55.54795, 55.88568, 56.37514), 
    DE1 = c(41.15185, 40.89455, 40.93455, 41.26727, 43.04545), 
    lm = c(37.81824, 37.38491, 37.334, 37.9015, 40.04541), ul = c(44.62509, 
    44.34759, 44.32573, 44.68568, 46.12386)), .Names = c("Elev", 
"Obs", "lm", "ul", "US3", "lm", "ul", "DK1", "lm", "ul", "DE1", 
"lm", "ul"), row.names = c(NA, -5L), class = "data.frame")

Source: https://stackoverflow.com/questions/42845436/reshaping-multiple-groups-of-columns-in-a-data-frame-from-wide-to-long

Tags

dataframe

reshape

reshape2

melt