I have a bunch of strings that contain lists of names in last name, first name format, separated by commas, like so:
names <- c('Beaufoy
If you can be certain that a comma isn't going to be in a person's name, this might work:
mynames <- c('Beaufoy, Simon, Boyle, Danny',
'Nolan, Christopher',
'Blumberg, Stuart, Cholodenko, Lisa',
'Seidler, David',
'Sorkin, Aaron',
'Hoover, J. Edgar')
mynames2 <- strsplit(mynames, ", ")
unlist(lapply(mynames2,
function(x) paste(x[1:length(x) %% 2 == 0],
x[1:length(x) %% 2 != 0])))
# [1] "Simon Beaufoy" "Danny Boyle" "Christopher Nolan"
# [4] "Stuart Blumberg" "Lisa Cholodenko" "David Seidler"
# [7] "Aaron Sorkin" "J. Edgar Hoover"
I've added J. Edgar Hoover in there for good measure.
If you want the names that were quoted together to stay together, add collapse = ", " to your paste() call:
unlist(lapply(mynames2,
function(x) paste(x[1:length(x) %% 2 == 0],
x[1:length(x) %% 2 != 0],
collapse = ", ")))
# [1] "Simon Beaufoy, Danny Boyle" "Christopher Nolan"
# [3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"
# [5] "Aaron Sorkin" "J. Edgar Hoover"
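For reuse, the two variants above can be folded into one function. This is only a sketch: flip_names is a hypothetical helper name (not part of the answer above), and it assumes, as before, that no name contains a comma. It also stops with an error when a string has an odd number of comma-separated pieces, since the pairing would otherwise silently go wrong.

```r
# Hypothetical helper, not from the original answer: flips "Last, First" pairs.
# Assumes names never contain commas themselves.
flip_names <- function(x, collapse = NULL) {
  pieces <- strsplit(x, ",\\s*")            # split on commas plus any spaces
  out <- lapply(pieces, function(p) {
    if (length(p) %% 2 != 0) {
      stop("expected an even number of comma-separated pieces")
    }
    last  <- p[c(TRUE, FALSE)]               # odd positions: last names
    first <- p[c(FALSE, TRUE)]               # even positions: first names
    paste(first, last, collapse = collapse)  # collapse = NULL keeps one name per element
  })
  unlist(out)
}

demo <- c("Beaufoy, Simon, Boyle, Danny", "Hoover, J. Edgar")
separate <- flip_names(demo)
# [1] "Simon Beaufoy"   "Danny Boyle"     "J. Edgar Hoover"
together <- flip_names(demo, collapse = ", ")
# [1] "Simon Beaufoy, Danny Boyle" "J. Edgar Hoover"
```

The logical index c(TRUE, FALSE) is recycled along the vector, which picks every other element without computing positions by hand.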
I'm in favor of @AnandaMahto's answer, but just for fun, this illustrates another method using scan, split, and rapply.
names <- c(names, 'Chambers, John, Ihaka, Ross, Gentleman, Robert')
# extract names
snames <-
  lapply(names, function(x) scan(text = x, what = '', sep = ',', strip.white = TRUE, quiet = TRUE))
# break up names into (last, first) pairs
snames <- lapply(snames, function(x) split(x, rep(seq(length(x) %/% 2), each = 2)))
# collapse together, reversed
rapply(snames, function(x) paste(x[2:1], collapse=' '))
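To make the grouping step easier to follow, here is the same idea traced on a single string, with the intermediate values shown as comments. It is a sketch only: it uses vapply in place of rapply because there is just one level of nesting here, and the variable names (parts, groups, pairs, flipped) are mine.

```r
x <- "Chambers, John, Ihaka, Ross, Gentleman, Robert"

# 1. Split on commas and trim the surrounding white space
parts <- scan(text = x, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)
# "Chambers" "John" "Ihaka" "Ross" "Gentleman" "Robert"

# 2. Label consecutive pairs with the same group number
groups <- rep(seq_len(length(parts) %/% 2), each = 2)
# 1 1 2 2 3 3

# 3. Split into (last, first) pairs and paste each pair in reverse order
pairs <- split(parts, groups)
flipped <- vapply(pairs, function(p) paste(p[2:1], collapse = " "), character(1))

unname(flipped)
# [1] "John Chambers"    "Ross Ihaka"       "Robert Gentleman"
```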
(1) Maintain same names in each element. This can be done with a single gsub (assuming there are no commas within names):
> gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", names)
[1] "Simon Beaufoy, Danny Boyle" "Christopher Nolan"
[3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"
[5] "Aaron Sorkin"
> gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", "Hoover, J. Edgar")
[1] "J. Edgar Hoover"
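In case the pattern looks cryptic, here is my reading of it piece by piece, run on a small self-contained vector (x and swapped are illustrative names, and the sample strings are taken from the examples in this thread):

```r
# ([^, ][^,]*)  group 1: the last name; its first character is neither a
#               comma nor a space, the rest is anything up to the next comma
# ", "          the literal separator between last and first name
# ([^,]+)       group 2: the first name, i.e. everything up to the next comma
pattern <- "([^, ][^,]*), ([^,]+)"

x <- c("Beaufoy, Simon, Boyle, Danny", "Nolan, Christopher", "Hoover, J. Edgar")

# "\\2 \\1" writes the groups back in swapped order; gsub repeats this for
# every "Last, First" pair it finds in each string
swapped <- gsub(pattern, "\\2 \\1", x)
swapped
# [1] "Simon Beaufoy, Danny Boyle" "Christopher Nolan"
# [3] "J. Edgar Hoover"
```

Because group 1 cannot start with a comma or a space, the ", " left over between two people is skipped rather than swallowed into the next match.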
(2) Separate into one name per element. If you want each "first name last name" in a separate element, then use (a) scan:
scan(text = out, sep = ",", what = "", strip.white = TRUE)
where out is the result of the gsub above (strip.white = TRUE drops the space that follows each comma), or, to get it directly, try (b) strapply from the gsubfn package:
> library(gsubfn)
> strapply(names, "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x), simplify = c)
[1] "Simon Beaufoy" "Danny Boyle" "Christopher Nolan"
[4] "Stuart Blumberg" "Lisa Cholodenko" "David Seidler"
[7] "Aaron Sorkin"
> strapply("Hoover, Edgar J.", "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x),
+ simplify = c)
[1] "Edgar J. Hoover"
Note that all examples above used the same regular expression for matching.
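If you would rather avoid the extra package, a base R sketch of (2) is to run the same gsub and then split on the remaining ", " separators. This is a swapped-in alternative rather than the strapply approach itself, and x, out and singles are illustrative names:

```r
pattern <- "([^, ][^,]*), ([^,]+)"
x <- c("Beaufoy, Simon, Boyle, Danny", "Nolan, Christopher", "Hoover, J. Edgar")

# Step 1: swap each "Last, First" pair in place (same regular expression as above)
out <- gsub(pattern, "\\2 \\1", x)

# Step 2: the only commas left separate different people, so split on them
singles <- unlist(strsplit(out, ", ", fixed = TRUE))
singles
# [1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
# [4] "J. Edgar Hoover"
```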
UPDATE: removed comma separating first and last name.
UPDATE: added code to separate out each first name last name into a separate element in case that is the preferred output format.