I have the following R data.table (though this should scale with a data.frame too). The goal is to reshape this data.table to plot as a scatterplot in ggplot2. I therefore need to reshape this data.table to have one "factor" column to color the points:
> library(data.table)
> dt
ID x_A y_A x_B y_B
1: 05AC 0.81 3 0.92 2.05
2: 01BA 0.41 5 0.63 1.8
3: Z1AC 0.41 5 0.58 1.8
4: B2BA 0.21 6.5 1.00 1.8
....
I believe the correct output needs to be of the form:
ID type x y
05AC A 0.81 3
05AC B 0.92 2.05
01BA A 0.41 5
01BA B 0.63 1.8
Z1AC A 0.41 5
Z1AC B 0.58 1.8
B2BA A 0.21 6.5
B2BA B 1.00 1.8
Is there a standard way to "unfold" data.tables in this fashion? I'm happy to see how to do this with dplyr, but I suspect there should be a data.table method.
melt() would work, if I could figure out how to create the column type, e.g. melt(dt, id.vars=c("ID")) will only melt based on the one column ID.
I'm especially confused how one "scrapes" the A and B type from columns 2-3 and columns 4-5 respectively...
Staying within data.table, after your suggested approach of using melt, you can use tstrsplit to split the variable column on the "_" character.
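## melt to long form first; this creates the variable and value columns
dt <- melt(dt, id.vars = "ID")
## use tstrsplit to split a column on a regular expression
dt[, c("xy", "type") := tstrsplit(variable, "_")]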
dt
# ID variable value xy type
# 1: 05AC x_A 0.81 x A
# 2: 01BA x_A 0.41 x A
# 3: Z1AC x_A 0.41 x A
# 4: B2BA x_A 0.21 x A
# 5: 05AC y_A 3.00 y A
# 6: 01BA y_A 5.00 y A
# 7: Z1AC y_A 5.00 y A
# 8: B2BA y_A 6.50 y A
# 9: 05AC x_B 0.92 x B
# 10: 01BA x_B 0.63 x B
# 11: Z1AC x_B 0.58 x B
# 12: B2BA x_B 1.00 x B
# 13: 05AC y_B 2.05 y B
# 14: 01BA y_B 1.80 y B
# 15: Z1AC y_B 1.80 y B
# 16: B2BA y_B 1.80 y B
This gives you the long-form of your required solution. You can then use dcast to widen it:
dcast(dt, formula = ID + type ~ xy)
# ID type x y
# 1: 01BA A 0.41 5.00
# 2: 01BA B 0.63 1.80
# 3: 05AC A 0.81 3.00
# 4: 05AC B 0.92 2.05
# 5: B2BA A 0.21 6.50
# 6: B2BA B 1.00 1.80
# 7: Z1AC A 0.41 5.00
# 8: Z1AC B 0.58 1.80
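To connect this back to the plotting goal in the question, here is a minimal sketch of the scatterplot colored by type. The object names reshaped and p are hypothetical, and the table is typed in by hand from the output above so the snippet runs on its own.

```r
library(data.table)
library(ggplot2)

# The reshaped table from the dcast output above, entered by hand
# (the name 'reshaped' is an assumption, not from the original post)
reshaped <- data.table(
  ID   = rep(c("01BA", "05AC", "B2BA", "Z1AC"), each = 2),
  type = rep(c("A", "B"), times = 4),
  x    = c(0.41, 0.63, 0.81, 0.92, 0.21, 1.00, 0.41, 0.58),
  y    = c(5.00, 1.80, 3.00, 2.05, 6.50, 1.80, 5.00, 1.80)
)

# One point per (ID, type) pair, colored by the new type column
p <- ggplot(reshaped, aes(x = x, y = y, colour = type)) +
  geom_point()
print(p)
```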
The logic of this answer is the same as the suggested dplyr approach of gather %>% separate %>% spread, but using data.table.
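As a possible shortcut, and a different technique from the melt, tstrsplit, dcast sequence above, melt for data.tables also accepts measure.vars = patterns(...), which should produce the x and y columns in one pass. This is only a sketch: the table is rebuilt from the four rows shown in the question, the name long is hypothetical, and the relabelling step assumes the A columns appear before the B columns.

```r
library(data.table)

# Rebuild the four rows shown in the question
dt <- data.table(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8)
)

# Melt the x_* columns and the y_* columns into two value columns at once
long <- melt(
  dt,
  id.vars       = "ID",
  measure.vars  = patterns("^x_", "^y_"),
  variable.name = "type",
  value.name    = c("x", "y")
)

# 'type' holds the position within each pattern (1, 2), not the suffix,
# so map it back to the labels (assumes column order A, then B)
long[, type := c("A", "B")[as.integer(type)]]
long
```

The relabelling is the fragile part: if the columns were ordered differently, the positions 1 and 2 would no longer correspond to A and B, and the tstrsplit approach would be the safer choice.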
A combination of dplyr and tidyr can produce your desired result. This is untested, due to the lack of a reproducible example.
library(tidyr)
library(dplyr)
dt %>%
  gather(variable, value, -ID, na.rm = TRUE) %>%
  separate(variable, c("group", "type"), sep = "_") %>%
  spread(group, value)
What this does:
- gathers all columns except the ID column into key-value rows, variable and value, dropping any rows whose value is NA.
- separates the variable column into group and type, using _ as a separator.
- spreads the contents of the group column into columns and populates them with the value column.
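For readers on tidyr 1.0.0 or later, the same three steps can probably be collapsed into a single pivot_longer call. This is a sketch under assumptions rather than part of the original answer: the data frame is rebuilt from the rows in the question, and the names df and long are hypothetical.

```r
library(tidyr)

# Rebuild the four rows shown in the question as a plain data.frame
df <- data.frame(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8)
)

# Split each column name on "_": the first piece (".value") names the
# output column (x or y), the second piece goes into a new 'type' column
long <- pivot_longer(
  df,
  cols      = -ID,
  names_to  = c(".value", "type"),
  names_sep = "_"
)
long
```

The special ".value" entry in names_to is what keeps x and y as separate columns instead of stacking them, so no later spread step is needed.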
Source: https://stackoverflow.com/questions/46800197/melting-an-r-data-table-with-a-factor-column