Question
I want to re-order my columns in my data frame, but what I found so far is not satisfactory.
My dataframe looks like:
cnt <-as.factor(c("Country 1", "Country 2", "Country 3", "Country 1", "Country 2", "Country 3" ))
bnk <-as.factor(c("bank 1", "bank 2", "bank 3", "bank 1", "bank 2", "bank 3" ))
mayData <-data.frame(age=c(10,12,13,10,11,15), Country=cnt, Bank=bnk, q10=c(1,1,1,2,2,2),q11=c(1,1,1,2,2,2), q1=c(1,1,1,2,2,2), q9=c(1,1,1,2,2,2), q6=c(1,1,1,2,2,2), year=c(1950,1960,1970,1980,1990,2000) )
age Country Bank q10 q11 q1 q9 q6 year
1 10 Country 1 bank 1 1 1 1 1 1 1950
2 12 Country 2 bank 2 1 1 1 1 1 1960
3 13 Country 3 bank 3 1 1 1 1 1 1970
4 10 Country 1 bank 1 2 2 2 2 2 1980
5 11 Country 2 bank 2 2 2 2 2 2 1990
6 15 Country 3 bank 3 2 2 2 2 2 2000
but I want to re-arrange the columns to look like this:
Country Bank year age q1 q6 q9 q10 q11
1 Country 1 bank 1 1950 10 1 1 1 1 1
2 Country 2 bank 2 1960 12 1 1 1 1 1
3 Country 3 bank 3 1970 13 1 1 1 1 1
4 Country 1 bank 1 1980 10 2 2 2 2 2
5 Country 2 bank 2 1990 11 2 2 2 2 2
6 Country 3 bank 3 2000 15 2 2 2 2 2
My real dataframe has a lot of columns, so rearranging the column orders "manually" using the index or the names of each column is not optimal.
Note also that I want the columns whose names begin with q in ascending order, that is, from q1 to q11. The problem is that R fails to understand that q6 (which stands for "question 6") should precede q10. To see this deficiency, look at the following example:
mayData<-mayData[,order(colnames(mayData),decreasing=F)]
age Bank Country q1 q10 q11 q6 q9 year
1 10 bank 1 Country 1 1 1 1 1 1 1950
2 12 bank 2 Country 2 1 1 1 1 1 1960
3 13 bank 3 Country 3 1 1 1 1 1 1970
4 10 bank 1 Country 1 2 2 2 2 2 1980
5 11 bank 2 Country 2 2 2 2 2 2 1990
6 15 bank 3 Country 3 2 2 2 2 2 2000
So, essentially, I want to first arrange a few columns in a flexible way according to my preference and then sort the rest with an ascending ordering criterion, but a "logical" one that lets R sort the q columns properly.
Answer 1:
We can use mixedsort from gtools to arrange the 'q' columns.
library(gtools)
i1 <- grep("q\\d+", names(mayData))
nm1 <- mixedsort(names(mayData)[i1])
mayData[c(setdiff(names(mayData), nm1), nm1)]
# age Country Bank year q1 q6 q9 q10 q11
#1 10 Country 1 bank 1 1950 1 1 1 1 1
#2 12 Country 2 bank 2 1960 1 1 1 1 1
#3 13 Country 3 bank 3 1970 1 1 1 1 1
#4 10 Country 1 bank 1 1980 2 2 2 2 2
#5 11 Country 2 bank 2 1990 2 2 2 2 2
#6 15 Country 3 bank 3 2000 2 2 2 2 2
NOTE: Using only base R functions and a single package.
Or, as @Cath mentioned, removing the "q" prefix with sub and converting the remainder to numeric can be used for ordering as well:
sort(as.numeric(sub("^q", "", names(mayData)[i1])))
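The line above only returns the sorted numbers. A minimal base R sketch of how they might be turned into an actual column order follows; the names q_cols, q_sorted, lead_cols and reordered are illustrative, and the choice of leading columns is an assumption taken from the desired output in the question.

```r
# Rebuild the example data from the question.
mayData <- data.frame(
  age     = c(10, 12, 13, 10, 11, 15),
  Country = factor(rep(c("Country 1", "Country 2", "Country 3"), 2)),
  Bank    = factor(rep(c("bank 1", "bank 2", "bank 3"), 2)),
  q10 = c(1, 1, 1, 2, 2, 2), q11 = c(1, 1, 1, 2, 2, 2),
  q1  = c(1, 1, 1, 2, 2, 2), q9  = c(1, 1, 1, 2, 2, 2),
  q6  = c(1, 1, 1, 2, 2, 2),
  year = c(1950, 1960, 1970, 1980, 1990, 2000)
)

# Question columns: a "q" followed only by digits.
q_cols <- grep("^q[0-9]+$", names(mayData), value = TRUE)

# Order them by their numeric suffix instead of alphabetically.
q_sorted <- q_cols[order(as.numeric(sub("^q", "", q_cols)))]

# Hand-picked leading columns (assumed from the desired output).
lead_cols <- c("Country", "Bank", "year", "age")

reordered <- mayData[c(lead_cols, q_sorted)]
names(reordered)
# "Country" "Bank" "year" "age" "q1" "q6" "q9" "q10" "q11"
```

Using order() rather than sort() keeps the original column names, so no renaming is needed afterwards.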
Answer 2:
You can rename the column names with a single digit to add a leading zero:
cn <- names(mayData)
q_digit <- cn[grep("^q[0-9]$", cn)]
names(mayData)[names(mayData) %in% q_digit] <- gsub("q", "q0", q_digit)
mayData <- mayData[, order(colnames(mayData), decreasing = FALSE)]
mayData
age Bank Country q01 q06 q09 q10 q11 year
1 10 bank 1 Country 1 1 1 1 1 1 1950
2 12 bank 2 Country 2 1 1 1 1 1 1960
3 13 bank 3 Country 3 1 1 1 1 1 1970
4 10 bank 1 Country 1 2 2 2 2 2 1980
5 11 bank 2 Country 2 2 2 2 2 2 1990
6 15 bank 3 Country 3 2 2 2 2 2 2000
This assumes you have fewer than 100 questions. If you have more, you can adapt it to add another leading zero to the single- and double-digit column names as well.
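For an arbitrary number of questions, the padding width could be derived from the largest question number instead of being hard-coded. The following is only a sketch on a hypothetical vector of column names (cn is made up for illustration and includes a three-digit question); it uses base R's formatC for the zero padding.

```r
# Hypothetical column names, including a three-digit question number.
cn <- c("age", "q10", "q1", "q100", "q6", "year")

# Locate the question columns and extract their numbers.
is_q <- grepl("^q[0-9]+$", cn)
num  <- as.integer(sub("^q", "", cn[is_q]))

# Pad every number to the width of the largest one.
width  <- max(nchar(as.character(num)))
padded <- cn
padded[is_q] <- paste0("q", formatC(num, width = width, flag = "0"))

padded
# "age" "q010" "q001" "q100" "q006" "year"
```

After padding, a plain alphabetical sort of the names puts the questions in numeric order, at the cost of changing the column names.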
Answer 3:
Leveraging dplyr's select and num_range, and tidyr::extract_numeric:
library(dplyr)
library(tidyr)
mayData %>% select(Country, Bank, year, age,
num_range('q', sort(extract_numeric(names(mayData)))))
# Country Bank year age q1 q6 q9 q10 q11
# 1 Country 1 bank 1 1950 10 1 1 1 1 1
# 2 Country 2 bank 2 1960 12 1 1 1 1 1
# 3 Country 3 bank 3 1970 13 1 1 1 1 1
# 4 Country 1 bank 1 1980 10 2 2 2 2 2
# 5 Country 2 bank 2 1990 11 2 2 2 2 2
# 6 Country 3 bank 3 2000 15 2 2 2 2 2
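Since tidyr::extract_numeric has been deprecated in later tidyr releases, a variant that does not depend on it may be useful. The sketch below swaps it for base R's grep and sub to obtain the question numbers and keeps dplyr's select and num_range as in the answer; q_num and result are illustrative names, and it assumes dplyr is installed.

```r
library(dplyr)

# Rebuild the example data from the question.
mayData <- data.frame(
  age     = c(10, 12, 13, 10, 11, 15),
  Country = factor(rep(c("Country 1", "Country 2", "Country 3"), 2)),
  Bank    = factor(rep(c("bank 1", "bank 2", "bank 3"), 2)),
  q10 = c(1, 1, 1, 2, 2, 2), q11 = c(1, 1, 1, 2, 2, 2),
  q1  = c(1, 1, 1, 2, 2, 2), q9  = c(1, 1, 1, 2, 2, 2),
  q6  = c(1, 1, 1, 2, 2, 2),
  year = c(1950, 1960, 1970, 1980, 1990, 2000)
)

# Sorted question numbers, taken only from names of the form q<digits>.
q_cols <- grep("^q[0-9]+$", names(mayData), value = TRUE)
q_num  <- sort(as.numeric(sub("^q", "", q_cols)))

# Leading columns are listed by hand; num_range() supplies q1, q6, q9, ...
result <- mayData %>%
  select(Country, Bank, year, age, num_range("q", q_num))

names(result)
```

Restricting the pattern to names of the form q followed by digits also avoids picking up digits from unrelated column names.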
Source: https://stackoverflow.com/questions/37296162/reordering-columns-in-data-frame-once-again