I've got a data.frame with a key/value string column containing information about features and their values for a set of users. Something like this:
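The example data itself is missing at this point. A minimal stand-in, reconstructed from the output shown in the answers, might look like the sketch below. The column names id, statid and str come from the answers' code; the exact strings, and the order of the pairs inside them, are assumptions.

```r
# Hypothetical sample data (not the asker's original), chosen so that it
# reproduces the wide table shown in the answers.
data <- data.frame(
  id     = 1:3,
  statid = c("s003e", "s093u", "s085t"),
  str    = c("a:1,7:2", "a:1,c:4", "a:3,b:5,c:33"),  # comma-separated key:value pairs
  stringsAsFactors = FALSE
)
```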
You can use dplyr and tidyr:
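library(dplyr); library(tidyr)
# split each string at ",", put every key:value pair on its own row,
# then split the pairs at ":" and spread the keys into columns
data %>% mutate(str = strsplit(str, ",")) %>% unnest(str) %>%
  separate(str, into = c('var', 'val'), sep = ":") %>% spread(var, val, fill = 0)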
#   id statid 7 a b  c
# 1  1  s003e 2 1 0  0
# 2  2  s093u 0 1 0  4
# 3  3  s085t 0 3 5 33
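In more recent tidyr releases (1.0.0 and later, as far as I know), spread() is superseded by pivot_wider(), and separate_rows() covers the strsplit-plus-unnest step in one call. A sketch of the same reshaping under those assumptions follows; the data frame is a hypothetical stand-in reconstructed from the output above, not the original input.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the data (assumed, not from the question)
data <- data.frame(
  id     = 1:3,
  statid = c("s003e", "s093u", "s085t"),
  str    = c("a:1,7:2", "a:1,c:4", "a:3,b:5,c:33"),
  stringsAsFactors = FALSE
)

wide <- data %>%
  # one row per key:value pair
  separate_rows(str, sep = ",") %>%
  # split each pair into key and value; convert = TRUE makes the values integer
  separate(str, into = c("var", "val"), sep = ":", convert = TRUE) %>%
  # one column per key, filling keys a user does not have with 0
  pivot_wider(names_from = var, values_from = val, values_fill = 0)
```

The key columns appear in order of first appearance rather than sorted, so the column order may differ from the spread() result.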
We can use cSplit to do this in a cleaner way. Convert the data to 'long' format by splitting at ",", then split at ":" and dcast from 'long' to 'wide'.
library(splitstackshape)
library(data.table)
dcast(cSplit(cSplit(data, "str", ",", "long"), "str", ":"),
      id + statid ~ str_1, value.var = "str_2", fill = 0)
#    id statid 7 a b  c
#1:  1  s003e 2 1 0  0
#2:  2  s093u 0 1 0  4
#3:  3  s085t 0 3 5 33
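To make the three steps easier to follow, the nested call can be unrolled as in the sketch below. The intermediate names long and pairs are only illustrative, and the data frame is a hypothetical stand-in reconstructed from the output above.

```r
library(splitstackshape)
library(data.table)

# Hypothetical stand-in for the data (assumed, not from the question)
data <- data.frame(
  id     = 1:3,
  statid = c("s003e", "s093u", "s085t"),
  str    = c("a:1,7:2", "a:1,c:4", "a:3,b:5,c:33"),
  stringsAsFactors = FALSE
)

# Step 1: split at "," into 'long' format, one row per key:value pair
long <- cSplit(data, "str", ",", "long")

# Step 2: split each pair at ":" into two new columns, str_1 (key) and str_2 (value)
pairs <- cSplit(long, "str", ":")

# Step 3: cast from 'long' to 'wide', one column per key, missing keys filled with 0
wide <- dcast(pairs, id + statid ~ str_1, value.var = "str_2", fill = 0)
```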