Question
I have a sample of a dataset that needs to be cast into a wide format, but I have a particular issue that I haven't seen addressed on Stack Overflow yet.
The column that I'd like to spread into a wide dataset has a unique value in every single row, but I want to create a new dataset so that there are n variables for the n attribute values of each idvar.
I need to convert this:
state sector attribute_value
alabama 1 a
alabama 1 b
alabama 1 c
alabama 1 d
alabama 1 e
alabama 1 f
alabama 1 g
alabama 1 h
alaska 1 i
alaska 1 j
alaska 1 k
alaska 1 l
alaska 1 m
alaska 1 n
alaska 1 o
arizona 1 p
arizona 1 q
arizona 1 r
arizona 1 s
arizona 1 t
arizona 1 u
arizona 1 v
into:
state sector attribute_value_1 attribute_value_2 attribute_value_3 attribute_value_4 attribute_value_5 attribute_value_6 attribute_value_7 attribute_value_8
alabama 1 a b c d e f g h
alaska 1 i j k l m n o n/a
arizona 1 p q r s t u v n/a
So far, I haven't been able to use dcast or reshape to create this particular transformation.
Answer 1:
With:
library(data.table)
dcast(setDT(df),
      state + sector ~ rowid(state, prefix = 'attr_val_'),
      value.var = 'attribute_value')
you get:
state sector attr_val_1 attr_val_2 attr_val_3 attr_val_4 attr_val_5 attr_val_6 attr_val_7 attr_val_8
1: alabama 1 a b c d e f g h
2: alaska 1 i j k l m n o NA
3: arizona 1 p q r s t u v NA
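The formula works because rowid() numbers the rows within each state, and those numbers become the new column names. A minimal sketch of that intermediate step follows; the construction of df and the helper column idx are assumptions made for illustration, not part of the original answer.

```r
library(data.table)

# Rebuild the question's sample data (assumed layout: 8, 7 and 7 rows per state)
df <- data.table(
  state = rep(c("alabama", "alaska", "arizona"), times = c(8, 7, 7)),
  sector = 1L,
  attribute_value = letters[1:22]
)

# rowid() restarts its counter at 1 for each distinct state;
# the prefix turns the counter into a column-name-like label
df[, idx := rowid(state, prefix = "attr_val_")]

# Cast using the precomputed label; states with fewer rows are padded with NA
wide <- dcast(df, state + sector ~ idx, value.var = "attribute_value")
wide
```

Storing the counter in a column first makes it easy to inspect before casting. With more than nine values per group the labels are character strings, so their sort order may need attention.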
Answer 2:
Try this, using dplyr and splitstackshape:
library(dplyr)
library(splitstackshape)
df <- df %>%
  group_by(state, sector) %>%
  dplyr::summarise(attribute_value = paste(attribute_value, collapse = ","))
concat.split(df, 3, drop = TRUE)
state sector attribute_value_1 attribute_value_2 attribute_value_3 attribute_value_4 attribute_value_5 attribute_value_6 attribute_value_7 attribute_value_8
1: alabama 1 a b c d e f g h
2: alaska 1 i j k l m n o NA
3: arizona 1 p q r s t u v NA
Answer 3:
With dplyr and tidyr. The trick is to set up a dummy variable (here called ind) and use this to convert to wide format.
library(dplyr)
library(tidyr)
df2 <- df %>% group_by(state, sector) %>%
  mutate(ind = paste0("Attribute_", seq_along(attribute_value))) %>%
  ungroup() %>%
  spread(key = ind, value = attribute_value)
df2
# A tibble: 3 x 10
state sector Attribute_1 Attribute_2 Attribute_3 Attribute_4 Attribute_5 Attribute_6 Attribute_7 Attribute_8
* <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 alabama 1 a b c d e f g h
2 alaska 1 i j k l m n o <NA>
3 arizona 1 p q r s t u v <NA>
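The same dummy-variable idea can also be sketched with pivot_wider(), which tidyr (1.0.0 and later) provides as the successor to spread(). This is an alternative, not the answer's own code; the construction of df is an assumption that rebuilds the question's sample data.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's sample data (assumed layout: 8, 7 and 7 rows per state)
df <- data.frame(
  state = rep(c("alabama", "alaska", "arizona"), times = c(8, 7, 7)),
  sector = 1L,
  attribute_value = letters[1:22],
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(state, sector) %>%
  mutate(ind = row_number()) %>%      # numeric position within each group
  ungroup() %>%
  pivot_wider(
    names_from   = ind,
    values_from  = attribute_value,
    names_prefix = "attribute_value_" # applied when the column names are built
  )
wide
```

Keeping ind numeric and adding the prefix inside pivot_wider() means the columns appear in the order the positions are first seen, so a group with ten or more values should not end up with attribute_value_10 sorted ahead of attribute_value_2.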
Source: https://stackoverflow.com/questions/44958261/want-to-cast-unique-values-into-first-second-third-variables