Question
I would like to combine multiple columns that I have in a data frame into one column in that data frame that is a list. For example, I have the following data frame ingredients:
name1  name2    imgID  attr1  attr2      attr3  ...
Item1  ItemID1  Img1   water  chocolate  soy    ...
Item2  ItemID2  Img2   cocoa  spice      milk   ...
I would like to combine the attr columns into one column that is a comma-separated list of those items and if possible have them appear in the following format:
name1  name2    imgID  attrs
Item1  ItemID1  Img1   c("water", "chocolate", "soy", ...)
Item2  ItemID2  Img2   c("cocoa", "spice", "milk", ...)
Is there a succinct way to write the code using a paste or join that allows me to refer to the columns of the data frame as ingredients[4:50] rather than each one by name? Is there also a way to exclude NA or NULL values from that list?
Answer 1:
You could use tidyr::nest, though you'll probably want to simplify the nested data frames to character vectors afterwards, e.g.
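library(tidyverse)

# Sample data mirroring the question's layout
items <- tibble(name1 = c("Item1", "Item2"),
                name2 = c("ItemID1", "ItemID2"),
                imgID = c("Img1", "Img2"),
                attr1 = c("water", "cocoa"),
                attr2 = c("chocolate", "spice"),
                attr3 = c("soy", "milk"))

items_nested <- items %>%
    # Nest the attr* columns into a list column named 'attr' (pre-1.0 tidyr syntax)
    nest(contains('attr'), .key = 'attr') %>%
    # Collapse each nested one-row data frame to a character vector
    mutate(attr = map(attr, simplify))

items_nested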
#> # A tibble: 2 x 4
#>   name1 name2   imgID attr
#>   <chr> <chr>   <chr> <list>
#> 1 Item1 ItemID1 Img1  <chr [3]>
#> 2 Item2 ItemID2 Img2  <chr [3]>
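In tidyr 1.0.0 and later the .key argument is deprecated, and in purrr 1.0.0 and later so is simplify, so the call above runs with warnings on current installs. A minimal sketch of the same idea in the newer spelling follows; it assumes tidyr >= 1.0.0 and swaps simplify for base R's unlist, which is a substitution made here rather than part of the original answer.

```r
library(dplyr)
library(tidyr)

items <- tibble(
  name1 = c("Item1", "Item2"),
  name2 = c("ItemID1", "ItemID2"),
  imgID = c("Img1", "Img2"),
  attr1 = c("water", "cocoa"),
  attr2 = c("chocolate", "spice"),
  attr3 = c("soy", "milk")
)

items_nested <- items %>%
  # New list column name on the left, columns to nest on the right.
  # Positions also work here, e.g. nest(attr = 4:6).
  nest(attr = contains("attr")) %>%
  # Each element is a one-row tibble; flatten it to an unnamed character vector.
  mutate(attr = lapply(attr, unlist, use.names = FALSE))

items_nested
```

Selecting by position is what makes the question's ingredients[4:50] case short to write: the selection would become nest(attr = 4:50).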
Other options include reshaping to long with tidyr::gather, grouping by all but the new columns, and aggregating the value column into a list in a more dplyr-focused style:
items %>%
    gather(attr_num, attr, contains('attr')) %>%
    group_by_at(vars(-attr_num, -attr)) %>%
    summarise(attr = list(attr)) %>%
    ungroup()
or unite-ing the attr* columns and then splitting them back apart into a list column with strsplit, in a more string-focused style (this assumes no attribute value contains unite's default separator, '_'):
items %>%
    unite(attr, contains('attr')) %>%
    mutate(attr = strsplit(attr, '_'))
or using purrr::transpose and tidyselect in a list-focused style:
items %>%
    mutate(attr = transpose(select(., contains('attr')))) %>%
    select(-matches('attr.'))
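Note that transpose hands back one named list per row rather than a character vector. If plain character vectors are wanted, one possible follow-up, sketched here with the same sample data and an added unlist step that is not in the original answer, is:

```r
library(dplyr)
library(purrr)

items <- tibble(
  name1 = c("Item1", "Item2"),
  name2 = c("ItemID1", "ItemID2"),
  imgID = c("Img1", "Img2"),
  attr1 = c("water", "cocoa"),
  attr2 = c("chocolate", "spice"),
  attr3 = c("soy", "milk")
)

# Pull out the attr* columns once so the pipeline below stays readable.
attr_cols <- select(items, contains("attr"))

items_list <- items %>%
  # transpose() turns the columns into one list per row;
  # unlist() then flattens each row's list to an unnamed character vector.
  mutate(attr = map(transpose(attr_cols), ~ unlist(.x, use.names = FALSE))) %>%
  # 'attr.' needs a character after "attr", so the new 'attr' column is kept.
  select(-matches("attr."))

items_list
```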
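All options return the same values, at least on the sample data, though the exact type of each list element can differ (transpose, for instance, yields a list per row rather than a character vector). Further cleanup, e.g. dropping NAs, can be done by iterating over the new column with lapply/purrr::map.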
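As a rough illustration of that cleanup step, the sketch below uses only base R. The data frame, its NA placements, and the names attr_cols, attrs and result are made up for the example; columns are picked by position, as the question's ingredients[4:50] would be.

```r
# Hypothetical sample data with some missing attributes.
items <- data.frame(
  name1 = c("Item1", "Item2"),
  attr1 = c("water", "cocoa"),
  attr2 = c("chocolate", NA),
  attr3 = c(NA, "milk"),
  stringsAsFactors = FALSE
)

# Attribute columns selected by position (4:50 in the question's data).
attr_cols <- 2:4

# Build one character vector per row from the selected columns.
attrs <- lapply(seq_len(nrow(items)), function(i) {
  unlist(items[i, attr_cols], use.names = FALSE)
})

# Iterate over the new list and drop the NAs from each element.
attrs <- lapply(attrs, function(v) v[!is.na(v)])

# Keep the non-attribute columns and attach the cleaned list column.
result <- items[, -attr_cols, drop = FALSE]
result$attrs <- attrs

result
```

A row whose attributes are all NA ends up with an empty character vector rather than being dropped, which keeps the row count unchanged.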
Source: https://stackoverflow.com/questions/48837024/how-to-combine-multiple-columns-of-an-r-data-frame-into-a-single-column-that-is