Separate contents of field

Question

I'm sure this is very simple, and I think it's a case of using separate and gather.

I have a single field in a data frame, authorlist, which is an edited export of a PubMed search. The field contains the authors of each publication, so it can hold either a single author or several collaborating authors.

For example, this is one of the entries:

Author
Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.

What I'd like to do is create a single list of ALL authors so that I'd have something like

Author
Drijgers RL
Verhey FR
Leentjens AF
Kahler S
Aalten P

How do I do that? I thought it would be something like

authSpread <- authorlist %>% separate(Author, sep = ",", extra = "drop")

But it's not working. If I add into = "NA", I get just the first author of each entry, listed in a single column. What I'd like to do is replicate the Text to Columns function in Excel, where you specify the character to split at and the contents of the cell are spread across new cells, and then regather them into one column. I don't know the maximum number of authors, so I don't know how many columns to split into (or how to label them) programmatically.

Edit (clarification): I don't know whether I want to make the data frame wide and then gather it, because I don't know how many fields would be generated. Is this a sensible approach? I would think I could write the output of the split on "," to a list and then write the contents of that list out as a single data frame. Does that sound more efficient?

Answer 1:

You're looking for separate_rows.

Input:

df <- data.frame(authors = c("Drijgers RL, Verhey FR, Leentjens AF, Köhler S, Aalten P."))

                                                    authors
1 Drijgers RL, Verhey FR, Leentjens AF, Köhler S, Aalten P.

Function:

library(tidyverse)

df %>% separate_rows(authors, sep = ", ")

Output:

       authors
1  Drijgers RL
2    Verhey FR
3 Leentjens AF
4     Köhler S
5    Aalten P.
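
Note that the last entry keeps its trailing period ("Aalten P."), unlike the output asked for in the question. Here is a minimal sketch of one way to deal with that, assuming every record ends in a single period. The two-row authorlist below is made-up sample data using the data frame and column names from the question:

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data; the real export will have many more rows
authorlist <- data.frame(
  Author = c("Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.",
             "Verhey FR, Aalten P."),
  stringsAsFactors = FALSE
)

authors_long <- authorlist %>%
  # Drop the period (and any trailing whitespace) at the end of each record
  mutate(Author = sub("\\.\\s*$", "", Author)) %>%
  # sep is a regular expression: a comma followed by optional whitespace
  separate_rows(Author, sep = ",\\s*")
```

Using ",\\s*" rather than ", " as the separator also copes with records where the space after a comma is missing or doubled.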

You can save them as a character vector like this:

authors_list <- df %>% separate_rows(authors, sep = ", ") %>% pull(authors)

Output:

[1] "Drijgers RL"  "Verhey FR"    "Leentjens AF" "Köhler S"     "Aalten P."

If you have authors of multiple articles in your data and you want only unique occurrences, just add unique() at the end:

authors_list <- df %>% separate_rows(authors, sep = ", ") %>% pull(authors) %>% unique()
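
The question's edit asks whether splitting into a list and then building a single data frame would work. It does, and it can be done in base R without knowing the number of authors in advance. This is a rough sketch rather than a tuned solution. The `authors` vector is made-up sample data, and the other variable names are illustrative:

```r
# Hypothetical sample records, one string per publication
authors <- c("Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.",
             "Verhey FR, Aalten P.")

# Remove the trailing period so "Aalten P." and "Aalten P" compare equal
cleaned <- sub("\\.\\s*$", "", authors)

# strsplit() returns a list with one character vector per record
pieces <- strsplit(cleaned, ",\\s*")

# Flatten the list, trim stray whitespace, and keep each author once
all_authors <- unique(trimws(unlist(pieces)))

# A one-column data frame with one author per row
author_df <- data.frame(Author = all_authors, stringsAsFactors = FALSE)
```

Because the list holds vectors of different lengths, no intermediate wide table (and no column labels) is needed. Whether this is faster than separate_rows() depends on the data and has not been measured here.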

Source: https://stackoverflow.com/questions/53309849/separate-contents-of-field

Tags

lapply

tidyverse

tidyr

sapply