Convert numeric representation of 'variable' column to original string following melt using patterns

Submitted by 我的梦境 on 2019-11-26 07:44:24

Question


I am using the patterns() argument in data.table::melt() to melt data whose columns follow several easily defined patterns. It works, but I can't see how to get a character index variable instead of the default numeric one.

For example, in A the dog and cat columns are numbered. Take a look at the "variable" column:

library(data.table)

A = data.table(idcol = c(1:5),
               dog_1 = c(1:5),   cat_1 = c(101:105),
               dog_2 = c(6:10),  cat_2 = c(106:110),
               dog_3 = c(11:15), cat_3 = c(111:115))
head(melt(A, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat")))

   idcol variable dog cat
1:     1        1   1 101
2:     2        1   2 102
3:     3        1   3 103
4:     4        1   4 104
5:     5        1   5 105
6:     1        2   6 106

However, in B the dog and cat columns are numbered with text, but the "variable" column is still numeric.

B = data.table(idcol = c(1:5),
                dog_one = c(1:5),     cat_one = c(101:105),
                dog_two = c(6:10),    cat_two = c(106:110),
                dog_three = c(11:15), cat_three = c(111:115))
head(melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat")))

   idcol variable dog cat
1:     1        1   1 101
2:     2        1   2 102
3:     3        1   3 103
4:     4        1   4 104
5:     5        1   5 105
6:     1        2   6 106

How can I fill the "variable" column with one/two/three instead of 1/2/3?


Answer 1:


There might be easier ways, but this seems to work:

# grab suffixes of 'variable' names
suff <- unique(sub('^.*_', '', names(B[ , -1])))
# suff <- unique(tstrsplit(names(B[, -1]), "_")[[2]])

# melt
B2 <- melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat"))

# replace factor levels in 'variable' with the suffixes
setattr(B2$variable, "levels", suff)

B2
#     idcol variable dog cat
# 1:      1      one   1 101
# 2:      2      one   2 102
# 3:      3      one   3 103
# 4:      4      one   4 104
# 5:      5      one   5 105
# 6:      1      two   6 106
# 7:      2      two   7 107
# 8:      3      two   8 108
# 9:      4      two   9 109
# 10:     5      two  10 110
# 11:     1    three  11 111
# 12:     2    three  12 112
# 13:     3    three  13 113
# 14:     4    three  14 114
# 15:     5    three  15 115
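
Since the question asks for a character column rather than a relabelled factor, the same idea can be sketched by indexing the suffix vector with the factor's integer codes. This is only a minimal sketch: it assumes the dog and cat columns appear in the same suffix order in the table, so that code 1 corresponds to "one", code 2 to "two", and so on.

```r
library(data.table)

B <- data.table(idcol = 1:5,
                dog_one = 1:5,     cat_one = 101:105,
                dog_two = 6:10,    cat_two = 106:110,
                dog_three = 11:15, cat_three = 111:115)

# suffixes in column order (assumed to be the same for dog and cat)
suff <- unique(sub("^.*_", "", names(B)[-1]))

B2 <- melt(B, measure.vars = patterns("^dog", "^cat"),
           value.name = c("dog", "cat"))

# 'variable' is a factor; its integer codes index the matched column groups,
# so they can be used to look up the suffix and store it as character
B2[, variable := suff[as.integer(variable)]]

str(B2$variable)
```

If the suffix order differs between the dog and cat columns, this lookup (like the setattr approach) would mislabel rows, so it is worth checking the column order first.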

Two related data.table issues:

melt.data.table should offer variable to match on the name, rather than the number

FR: expansion of melt functionality for handling names of output.


This is one of the (rare) instances where I believe good ol' base::reshape is cleaner. Its sep argument comes in handy here: both the names of the 'value' columns and the levels of the 'variable' column are generated in one go:

reshape(data = B,
        varying = names(B[ , -1]),
        sep = "_",
        direction = "long")
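
To see what the call produces, here is a small self-contained sketch. It uses a plain data.frame stand-in for B (the name B_df is hypothetical, chosen only to keep the example free of data.table). By default reshape stores the suffixes in a column called time and adds an id column.

```r
# Plain data.frame stand-in for B (hypothetical name, illustration only)
B_df <- data.frame(idcol = 1:5,
                   dog_one = 1:5,     cat_one = 101:105,
                   dog_two = 6:10,    cat_two = 106:110,
                   dog_three = 11:15, cat_three = 111:115)

long <- reshape(data = B_df,
                varying = names(B_df)[-1],
                sep = "_",
                direction = "long")

# the part before "_" becomes the value column name,
# the part after "_" becomes the entry in the 'time' column
names(long)
unique(long$time)
```

If a different column name is preferred, reshape's timevar argument can be set, for example timevar = "variable".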


Source: https://stackoverflow.com/questions/41883573/convert-numeric-representation-of-variable-column-to-original-string-following
