I have a dataset like this:
CASE_ID = c("C1","C1", "C2","C2", "C2", "C3", "C4")
PERSON_ID = c(1,0,7,8,1,20,7)
PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1")
df <- data.frame(CASE_ID, PERSON_ID, PERSON_DIVISION)
df
That results in:
CASE_ID PERSON_ID PERSON_DIVISION
1 C1 1 Zone 1
2 C1 0 NA
3 C2 7 Zone 1
4 C2 8 Zone 3
5 C2 1 Zone 1
6 C3 20 Zone 5
7 C4 7 Zone 1
And I want to transform it into:
CASE_ID P1_ID P2_ID P3_ID P1_Division P2_Division P3_Division
1 1 0 NA Zone 1 NA NA
2 7 8 1 Zone 1 Zone 3 Zone 1
3 20 NA NA Zone 5 NA NA
4 7 NA NA Zone 1 NA NA
My approach so far has been to melt the data and then dcast it:
e <- melt(df)
dcast(e, CASE_ID ~ PERSON_DIVISION + variable)
But I am not getting the desired output. Instead, I am getting:
CASE_ID NA_PERSON_ID Zone 1_PERSON_ID Zone 3_PERSON_ID Zone 5_PERSON_ID
1 C1 1 1 0 0
2 C2 0 2 1 0
3 C3 0 0 0 1
4 C4 0 1 0 0
There are two issues here:
- Your data is already in long format, but you have two value columns. Recent versions of data.table support multiple value vars in dcast().
- You need unique row ids within each group. Otherwise, dcast() will try to aggregate duplicates (using length() by default, which explains the output you've got).
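Both points can be checked on the data from the question. The following is only a minimal sketch: the names dt and counts are made up for illustration, and fun.aggregate = length is passed explicitly to mimic the default aggregation rather than to suggest it as a fix.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  CASE_ID = c("C1", "C1", "C2", "C2", "C2", "C3", "C4"),
  PERSON_ID = c(1, 0, 7, 8, 1, 20, 7),
  PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1")
)

# rowid() numbers the rows within each run of identical CASE_ID values,
# which gives every row a unique id inside its group
rowid(dt$CASE_ID)
# [1] 1 2 1 2 3 1 1

# Without such an id, several rows fall into the same cell and have to be
# aggregated; counting them with length() yields counts instead of values
counts <- dcast(dt, CASE_ID ~ PERSON_DIVISION,
                value.var = "PERSON_ID", fun.aggregate = length)
counts
```

For C2 the "Zone 1" cell holds 2 because two persons of that case belong to Zone 1, which matches the counts in the output shown in the question.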
Please try:
library(data.table) # version 1.10.4 used here
# coerce to data.table, add unique row numbers for each group
setDT(df)[, rn := rowid(CASE_ID)]
# dcast with multiple value vars
dcast(df, CASE_ID ~ rn, value.var = list("PERSON_ID", "PERSON_DIVISION"))
# CASE_ID PERSON_ID_1 PERSON_ID_2 PERSON_ID_3 PERSON_DIVISION_1 PERSON_DIVISION_2 PERSON_DIVISION_3
#1: C1 1 0 NA Zone 1 NA NA
#2: C2 7 8 1 Zone 1 Zone 3 Zone 1
#3: C3 20 NA NA Zone 5 NA NA
#4: C4 7 NA NA Zone 1 NA NA
This can be written more concisely as a one-liner:
dcast(setDT(df), CASE_ID ~ rowid(CASE_ID), value.var = list("PERSON_ID", "PERSON_DIVISION"))
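The resulting column names (PERSON_ID_1, PERSON_DIVISION_1, ...) differ from the P1_ID / P1_Division pattern asked for in the question. If the exact names matter, they could be rewritten afterwards. This is a sketch of one possible way, assuming the columns follow the naming shown above; wide, old and new are names introduced here for illustration only.

```r
library(data.table)

# Data from the question
df <- data.frame(
  CASE_ID = c("C1", "C1", "C2", "C2", "C2", "C3", "C4"),
  PERSON_ID = c(1, 0, 7, 8, 1, 20, 7),
  PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1")
)

# Reshape to wide format as in the one-liner above
wide <- dcast(setDT(df), CASE_ID ~ rowid(CASE_ID),
              value.var = list("PERSON_ID", "PERSON_DIVISION"))

# Move the running number into the prefix: PERSON_ID_1 -> P1_ID, etc.
old <- names(wide)
new <- sub("^PERSON_ID_([0-9]+)$", "P\\1_ID", old)
new <- sub("^PERSON_DIVISION_([0-9]+)$", "P\\1_Division", new)

# setnames() renames by reference, so no copy of the table is made
setnames(wide, old, new)
names(wide)
```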
Source: https://stackoverflow.com/questions/42166025/r-melt-and-dcast