Question
I've looked around and can't seem to find a decent way to solve this issue.
I have a column in which each row holds a comma-separated list of names. I'd like to sort the names within each row alphabetically so that I can later identify rows that contain the same names in a different order.
The data looks like this:
names <- c("John D., Josh C., Karl H.",
"John D., Bob S., Tim H.",
"Amy A., Art U., Wes T.",
"Josh C., John D., Karl H.")
var1 <- rnorm(n = length(names), mean = 0, sd = 2)
var2 <- rnorm(n = length(names), mean = 20, sd = 5)
df <- data.frame(names, var1, var2)
df
names var1 var2
1 John D., Josh C., Karl H. -0.3570142 15.58512
2 John D., Bob S., Tim H. -3.0022367 12.32608
3 Amy A., Art U., Wes T. -0.6900956 18.01553
4 Josh C., John D., Karl H. -2.0162847 16.04281
For example, row 4 would get sorted to look like row 1. Row 2 would get sorted as Bob, John, and Tim.
I've tried sort(df$names), but that just reorders the rows alphabetically; it doesn't sort the names within each row.
Answer 1:
With dplyr, you can try:
library(dplyr)

# Process one row at a time: split the string, sort the pieces, re-join them
df %>%
  rowwise() %>%
  mutate(names = paste(sort(unlist(strsplit(names, ", ", fixed = TRUE))), collapse = ", "))
names var1 var2
<chr> <dbl> <dbl>
1 John D., Josh C., Karl H. -0.226 19.9
2 Bob S., John D., Tim H. 0.424 24.8
3 Amy A., Art U., Wes T. 1.42 25.0
4 John D., Josh C., Karl H. 5.42 20.4
Sample data (stringsAsFactors = FALSE keeps names as character, which strsplit() requires):
df <- data.frame(names, var1, var2,
                 stringsAsFactors = FALSE)
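The stringsAsFactors = FALSE setting matters on R versions before 4.0.0, where data.frame() turned strings into factors by default, and strsplit() rejects anything that is not character. Here is a minimal sketch of the failure and the fix. The toy data frame df_fac and the helper names res, parts, and sorted are made up for illustration and are not part of the answer above.

```r
# Hypothetical toy data; the factor column mimics the pre-4.0.0 default.
df_fac <- data.frame(names = factor(c("Josh C., John D.", "Bob S., Amy A.")))

# strsplit() only accepts character vectors, so a factor raises an error.
res <- tryCatch(strsplit(df_fac$names, ", ", fixed = TRUE),
                error = function(e) "error")
res

# Converting to character first makes the split (and the sort) work.
parts <- strsplit(as.character(df_fac$names), ", ", fixed = TRUE)
sorted <- vapply(parts, function(p) paste(sort(p), collapse = ", "), character(1))
sorted
```

If your column is already character, the as.character() call is harmless, so it may be simplest to always include it.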
Answer 2:
In base R you could do this:
# Converting factor to character
df$names <- as.character(df$names)
# Splitting string on comma+space(s), sorting them in list,
# and pasting them back together with a comma and a space
df$names <- sapply(lapply(strsplit(df$names, split = ",\\s*"), sort), paste, collapse = ", ")
df
names var1 var2
1 John D., Josh C., Karl H. -2.285181 15.82278
2 Bob S., John D., Tim H. 2.797259 21.42946
3 Amy A., Art U., Wes T. 1.001353 17.30004
4 John D., Josh C., Karl H. 4.034996 24.86374
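Once each row is sorted, the stated goal of finding rows that share the same names reduces to finding duplicated strings. The following is a rough sketch of that follow-up step using the names from the question. The objects nm and df_demo and the columns key and shared are illustrative assumptions, not something the question or answers define.

```r
# Names taken from the question; nm and df_demo are illustrative objects.
nm <- c("John D., Josh C., Karl H.",
        "John D., Bob S., Tim H.",
        "Amy A., Art U., Wes T.",
        "Josh C., John D., Karl H.")
df_demo <- data.frame(names = nm, stringsAsFactors = FALSE)

# Build an order-independent key by sorting the names within each row.
df_demo$key <- sapply(lapply(strsplit(df_demo$names, split = ",\\s*"), sort),
                      paste, collapse = ", ")

# Flag every row whose key appears more than once (first and later occurrences).
df_demo$shared <- duplicated(df_demo$key) | duplicated(df_demo$key, fromLast = TRUE)

which(df_demo$shared)  # rows 1 and 4 hold the same three names
```

Keeping the sorted string in a separate key column, rather than overwriting names, leaves the original ordering available in case it carries meaning.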
Answer 3:
Define a function Sort that uses scan to split a string into its individual names, sorts them, and puts them back together. Then sapply it over the names column. No packages are used.
Sort <- function(x) {
  s <- scan(text = as.character(x), what = "", sep = ",",
            strip.white = TRUE, quiet = TRUE)
  toString(sort(s))
}
transform(df, names = sapply(names, Sort))
giving:
names var1 var2
1 John D., Josh C., Karl H. -0.324619 28.02955
2 Bob S., John D., Tim H. 1.126112 14.21096
3 Amy A., Art U., Wes T. 3.295635 23.28294
4 John D., Josh C., Karl H. -1.546707 32.74496
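To see what the two steps inside Sort do, it can help to run them on a single string. This is only an illustrative sketch; the input x (with deliberately uneven spacing) and the variable names fields and rejoined are made up for the example.

```r
# Hypothetical input with irregular spacing after the commas.
x <- "Josh C.,  John D., Karl H."

# scan() splits on the comma; strip.white = TRUE trims the blanks around each field.
fields <- scan(text = x, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)
fields

# toString() joins the sorted fields with ", ".
rejoined <- toString(sort(fields))
rejoined
```

Because the whitespace is stripped before sorting, strings that differ only in spacing end up with the same sorted form.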
Source: https://stackoverflow.com/questions/57258712/sort-each-row-of-character-strings-alphabetically-in-r