I have an R data frame with key-value strings that looks like this:
quest<-data.frame(city=c("Atlanta","New York","Atlanta","Tampa"), key_value=c("rev=
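We can use the tidyverse. With separate_rows, split 'key_value' at ; and expand the rows, then separate the column into two columns ('key' and 'value') at =, expand the rows again at | (separate_rows), group by 'city' and 'key', get the sequence number within each group (row_number()) so that the n-th values of every key land in the same row, and spread to 'wide' format. (In tidyr 1.0.0 and later, spread is superseded by pivot_wider.)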
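library(tidyverse)
separate_rows(quest, key_value, sep=";") %>%                    # one row per "key=values" pair
  separate(key_value, into = c("key", "value"), sep="=") %>%    # split each pair at "="
  separate_rows(value, sep="[|]", convert = TRUE) %>%           # one row per "|"-separated value, parsed as numbers
  group_by(city, key) %>%
  mutate(rn = row_number()) %>%                                 # position of each value within its city/key group
  spread(key, value) %>%                                        # widen: one column per key
  select(-rn)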
# A tibble: 7 x 4
# Groups:   city [3]
#       city   qty   rev   zip
#*    <fctr> <dbl> <dbl> <dbl>
#1   Atlanta     1  63.0 45987
#2   Atlanta     1  12.0 74268
#3  New York     1  10.6 12686
#4  New York     2  34.0 12694
#5     Tampa     1   3.0 33684
#6     Tampa     6  24.0 36842
#7     Tampa     3   8.0 30254
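For comparison, the same long-then-wide idea can be sketched in base R, with ave() standing in for row_number() and stats::reshape() standing in for spread. This is a minimal sketch, not part of the answer above; quest_demo is a hypothetical two-row input in the same format, because the question's data is truncated.

```r
# Hypothetical sample input in the same "key=v1|v2;key=v1|v2" format
# (illustrative values only; the question's actual data is truncated).
quest_demo <- data.frame(
  city = c("Atlanta", "New York"),
  key_value = c("rev=63;qty=1;zip=45987",
                "rev=10.60|34;qty=1|2;zip=12686|12694"),
  stringsAsFactors = FALSE
)

# Step 1: one row per "key=values" pair (like the first separate_rows).
pairs <- strsplit(quest_demo$key_value, ";", fixed = TRUE)
long <- data.frame(
  city = rep(quest_demo$city, lengths(pairs)),
  pair = unlist(pairs),
  stringsAsFactors = FALSE
)

# Step 2: split each pair at "=" into key and value (like separate).
long$key   <- sub("=.*$", "", long$pair)
long$value <- sub("^[^=]*=", "", long$pair)

# Step 3: one row per "|"-separated value (like the second separate_rows).
vals <- strsplit(long$value, "|", fixed = TRUE)
long <- data.frame(
  city  = rep(long$city, lengths(vals)),
  key   = rep(long$key, lengths(vals)),
  value = as.numeric(unlist(vals)),
  stringsAsFactors = FALSE
)

# Step 4: sequence number within each city/key group (like row_number()).
long$rn <- ave(seq_along(long$value), long$city, long$key, FUN = seq_along)

# Step 5: widen to one column per key (like spread), then drop the helper.
wide <- reshape(long, idvar = c("city", "rn"), timevar = "key",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
wide$rn <- NULL
rownames(wide) <- NULL
wide
```

The per-group counter is what makes the widening unambiguous: without it, the two New York values for each key would share the same (city, key) identifier and reshape() could not tell which rev belongs with which zip.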
Split by ;, then by = and |, and combine the pieces into a matrix, using the first element of each piece as the column name. Then repeat the rows of the original data frame once for each value row found, and combine. I don't convert any columns to numeric here; they're left as character.
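# split each string into its "key=values" pairs
a <- strsplit(as.character(quest$key_value), ";")
a <- lapply(a, function(x) {
  # split each pair at "=" and "|" and bind the pieces as columns, one per key
  # (assumes every key in a string has the same number of values)
  x <- do.call(cbind, strsplit(x, "[=|]"))
  colnames(x) <- x[1,]    # first row holds the key names
  x[-1,,drop=FALSE]       # remaining rows hold the values
})
# repeat each original row once per value row, dropping key_value
b <- quest[rep(seq_along(a), sapply(a, nrow)), colnames(quest) != "key_value", drop=FALSE]
out <- cbind(b, do.call(rbind, a), stringsAsFactors=FALSE)
rownames(out) <- NULL
out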
##       city   rev qty   zip
## 1  Atlanta    63   1 45987
## 2 New York 10.60   1 12686
## 3 New York    34   2 12694
## 4  Atlanta    12   1 74268
## 5    Tampa     3   1 33684
## 6    Tampa    24   6 36842
## 7    Tampa     8   3 30254
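Since this approach leaves the key columns as character, one way to turn them into numbers afterwards is utils::type.convert() applied column by column. A minimal sketch, assuming a hypothetical two-row quest_demo input (the question's data is truncated); the splitting steps are the ones from this answer, and only the last two lines are new.

```r
# Hypothetical sample input; the question's actual data is truncated.
quest_demo <- data.frame(
  city = c("Atlanta", "Tampa"),
  key_value = c("rev=63;qty=1;zip=45987",
                "rev=3|24;qty=1|6;zip=33684|36842"),
  stringsAsFactors = FALSE
)

# Same steps as the answer: split on ";" then on "=" or "|".
a <- strsplit(quest_demo$key_value, ";")
a <- lapply(a, function(x) {
  x <- do.call(cbind, strsplit(x, "[=|]"))
  colnames(x) <- x[1, ]
  x[-1, , drop = FALSE]
})
b <- quest_demo[rep(seq_along(a), sapply(a, nrow)), "city", drop = FALSE]
out <- cbind(b, do.call(rbind, a), stringsAsFactors = FALSE)
rownames(out) <- NULL

# Convert every key column (everything except "city") from character;
# type.convert(as.is = TRUE) picks integer or double per column and
# leaves non-numeric strings as character instead of making factors.
key_cols <- setdiff(names(out), "city")
out[key_cols] <- lapply(out[key_cols], type.convert, as.is = TRUE)
str(out)
```

Converting per column rather than with as.numeric() on everything keeps any genuinely non-numeric key as character, which matters if some keys hold text.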