Want to cast unique values into first/second/third variables

Submitted by 扶醉桌前 on 2020-01-01 19:18:11

Question


I have a sample of a dataset that needs to be cast into a wide format, but I have a particular issue that I haven't seen addressed on Stack Overflow yet.

The column that I'd like to spread into wide format has a unique value in every row, but I want to create a new dataset so that there are n variables for the n attributes of each idvar.

I need to convert this:

state   sector  attribute_value
alabama 1   a
alabama 1   b
alabama 1   c
alabama 1   d
alabama 1   e
alabama 1   f
alabama 1   g
alabama 1   h
alaska  1   i
alaska  1   j
alaska  1   k
alaska  1   l
alaska  1   m
alaska  1   n
alaska  1   o
arizona 1   p
arizona 1   q
arizona 1   r
arizona 1   s
arizona 1   t
arizona 1   u
arizona 1   v

into:

state   sector  attribute_value_1   attribute_value_2   attribute_value_3   attribute_value_4   attribute_value_5   attribute_value_6   attribute_value_7   attribute_value_8
alabama 1   a   b   c   d   e   f   g   h
alaska  1   i   j   k   l   m   n   o   n/a
arizona 1   p   q   r   s   t   u   v   n/a

So far, I haven't been able to use dcast or reshape to create this particular transformation.


Answer 1:


With:

library(data.table)
dcast(setDT(df),
      state + sector ~ rowid(state, prefix = 'attr_val_'),
      value.var = 'attribute_value')

you get:

     state sector attr_val_1 attr_val_2 attr_val_3 attr_val_4 attr_val_5 attr_val_6 attr_val_7 attr_val_8
1: alabama      1          a          b          c          d          e          f          g          h
2:  alaska      1          i          j          k          l          m          n          o         NA
3: arizona      1          p          q          r          s          t          u          v         NA
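
The right-hand side of the formula, rowid(state, prefix = 'attr_val_'), numbers the rows within each state, and that running counter supplies the new column names. As a rough sketch of the same idea without data.table, the counter can be built with ave() and handed to base reshape(). The rebuilt sample data frame and the helper column idx are assumptions introduced here for illustration, not part of the original answer.

```r
# Rebuild the sample data from the question (assumed to be called df)
df <- data.frame(
  state = rep(c("alabama", "alaska", "arizona"), times = c(8, 7, 7)),
  sector = 1L,
  attribute_value = letters[1:22],
  stringsAsFactors = FALSE
)

# idx is a hypothetical helper column: 1, 2, 3, ... within each state/sector
df$idx <- ave(seq_along(df$state), df$state, df$sector, FUN = seq_along)

# Spread attribute_value across columns named attribute_value_<idx>;
# groups with fewer values are filled with NA
wide <- reshape(df,
                idvar = c("state", "sector"),
                timevar = "idx",
                v.names = "attribute_value",
                direction = "wide",
                sep = "_")
rownames(wide) <- NULL
wide
```

This is likely the step that was missing when reshape was tried directly: without a within-group counter there is nothing to use as the timevar.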



Answer 2:


Try this, using dplyr and splitstackshape:

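library(dplyr)
library(splitstackshape)
# Collapse each group's values into one comma-separated string
df <- df %>%
  group_by(state, sector) %>%
  summarise(attribute_value = paste(attribute_value, collapse = ","))
# Split column 3 (attribute_value) back into one column per value
concat.split(df, 3, drop = TRUE)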

     state sector attribute_value_1 attribute_value_2 attribute_value_3 attribute_value_4 attribute_value_5 attribute_value_6 attribute_value_7 attribute_value_8
1: alabama      1                 a                 b                 c                 d                 e                 f                 g                 h
2:  alaska      1                 i                 j                 k                 l                 m                 n                 o                NA
3: arizona      1                 p                 q                 r                 s                 t                 u                 v                NA
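
This answer works in two steps: collapse each group's values into one delimited string, then split that string into columns. The sketch below walks through both steps in base R, using aggregate() and strsplit() instead of the packages above, so the intermediate string is visible. It assumes that no value contains a comma, since an embedded comma would be split into an extra column; the rebuilt df and the names collapsed, parts, width and padded are illustrative only.

```r
# Rebuild the sample data from the question (assumed to be called df)
df <- data.frame(
  state = rep(c("alabama", "alaska", "arizona"), times = c(8, 7, 7)),
  sector = 1L,
  attribute_value = letters[1:22],
  stringsAsFactors = FALSE
)

# Step 1: one comma-separated string per state/sector group
collapsed <- aggregate(attribute_value ~ state + sector, data = df,
                       FUN = paste, collapse = ",")
collapsed$attribute_value

# Step 2: split the strings again and pad shorter groups with NA
parts <- strsplit(collapsed$attribute_value, ",", fixed = TRUE)
width <- max(lengths(parts))
padded <- t(sapply(parts, function(p) c(p, rep(NA_character_, width - length(p)))))
colnames(padded) <- paste0("attribute_value_", seq_len(width))

wide <- cbind(collapsed[c("state", "sector")], padded)
wide
```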



Answer 3:


With dplyr and tidyr. The trick is to set up a dummy variable (here called ind) and use this to convert to wide format.

library(dplyr)
library(tidyr)
df2 <- df %>% group_by(state, sector) %>% 
              mutate(ind=paste0("Attribute_", seq_along(attribute_value))) %>% 
              ungroup() %>% 
              spread(key=ind, value=attribute_value)

df2
# A tibble: 3 x 10
    state sector Attribute_1 Attribute_2 Attribute_3 Attribute_4 Attribute_5 Attribute_6 Attribute_7 Attribute_8
*   <chr>  <int>       <chr>       <chr>       <chr>       <chr>       <chr>       <chr>       <chr>       <chr>
1 alabama      1           a           b           c           d           e           f           g           h
2  alaska      1           i           j           k           l           m           n           o        <NA>
3 arizona      1           p           q           r           s           t           u           v        <NA>
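
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A sketch of the same dummy-variable approach with pivot_wider() follows; the rebuilt sample df is an assumption for illustration. Keeping ind numeric and adding the prefix through names_prefix should also keep the columns in numeric order once a group has ten or more values, whereas spread() on a character key sorts the columns alphabetically (Attribute_10 before Attribute_2).

```r
library(dplyr)
library(tidyr)

# Rebuild the sample data from the question (assumed to be called df)
df <- data.frame(
  state = rep(c("alabama", "alaska", "arizona"), times = c(8, 7, 7)),
  sector = 1L,
  attribute_value = letters[1:22],
  stringsAsFactors = FALSE
)

df2 <- df %>%
  group_by(state, sector) %>%
  mutate(ind = row_number()) %>%      # 1, 2, 3, ... within each group
  ungroup() %>%
  pivot_wider(names_from = ind,
              values_from = attribute_value,
              names_prefix = "Attribute_")

df2
```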


Source: https://stackoverflow.com/questions/44958261/want-to-cast-unique-values-into-first-second-third-variables
