Reordering columns in data frame once again

Question

I want to reorder the columns in my data frame, but what I have found so far is not satisfactory.

My dataframe looks like:

cnt  <-as.factor(c("Country 1", "Country 2", "Country 3", "Country 1", "Country 2", "Country 3" ))
bnk  <-as.factor(c("bank 1", "bank 2", "bank 3", "bank 1", "bank 2", "bank 3" ))
mayData <-data.frame(age=c(10,12,13,10,11,15), Country=cnt, Bank=bnk, q10=c(1,1,1,2,2,2),q11=c(1,1,1,2,2,2), q1=c(1,1,1,2,2,2), q9=c(1,1,1,2,2,2), q6=c(1,1,1,2,2,2), year=c(1950,1960,1970,1980,1990,2000) )

   age    Country     Bank  q10 q11 q1  q9  q6  year
1   10  Country 1   bank 1  1   1   1   1   1   1950
2   12  Country 2   bank 2  1   1   1   1   1   1960
3   13  Country 3   bank 3  1   1   1   1   1   1970
4   10  Country 1   bank 1  2   2   2   2   2   1980
5   11  Country 2   bank 2  2   2   2   2   2   1990
6   15  Country 3   bank 3  2   2   2   2   2   2000

but I want to re-arrange the columns to look like this:

      Country     Bank  year    age q1  q6  q9  q10 q11
1   Country 1   bank 1  1950    10  1   1   1   1   1
2   Country 2   bank 2  1960    12  1   1   1   1   1
3   Country 3   bank 3  1970    13  1   1   1   1   1
4   Country 1   bank 1  1980    10  2   2   2   2   2
5   Country 2   bank 2  1990    11  2   2   2   2   2
6   Country 3   bank 3  2000    15  2   2   2   2   2

My real dataframe has a lot of columns, so rearranging the column orders "manually" using the index or the names of each column is not optimal.

Notice also that I want the columns whose names begin with q to be in ascending order, that is, from q1 to q11. The problem is that R sorts names alphabetically and so fails to understand that q6, which stands for "question 6", should precede q10. To see this deficiency, look at the following example:

mayData <- mayData[, order(colnames(mayData), decreasing = FALSE)]

    age   Bank    Country   q1  q10 q11 q6  q9  year
1   10  bank 1  Country 1   1   1   1   1   1   1950
2   12  bank 2  Country 2   1   1   1   1   1   1960
3   13  bank 3  Country 3   1   1   1   1   1   1970
4   10  bank 1  Country 1   2   2   2   2   2   1980
5   11  bank 2  Country 2   2   2   2   2   2   1990
6   15  bank 3  Country 3   2   2   2   2   2   2000

So, essentially, I want to first place a few columns in a flexible order of my own choosing and then sort the remaining columns in ascending order. But the ordering has to be the "logical" one, in which R sorts the q columns by their number.

Answer 1:

We can use mixedsort from gtools to arrange the 'q' columns.

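library(gtools)
# Positions of the question columns (names of the form q<number>)
i1 <- grep("^q\\d+$", names(mayData))
# mixedsort() compares embedded numbers numerically, so q6 comes before q10
nm1 <- mixedsort(names(mayData)[i1])
# Keep the other columns in their current order, then append the sorted q columns
mayData[c(setdiff(names(mayData), nm1), nm1)]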
#  age   Country   Bank year q1 q6 q9 q10 q11
#1  10 Country 1 bank 1 1950  1  1  1   1   1
#2  12 Country 2 bank 2 1960  1  1  1   1   1
#3  13 Country 3 bank 3 1970  1  1  1   1   1
#4  10 Country 1 bank 1 1980  2  2  2   2   2
#5  11 Country 2 bank 2 1990  2  2  2   2   2
#6  15 Country 3 bank 3 2000  2  2  2   2   2

NOTE: Using only base R functions and a single package.

Or, as @Cath mentioned, stripping the "q" prefix with sub and converting the remainder to numeric also gives the numbers to order by:

sort(as.numeric(sub("^q", "", names(mayData)[i1])))
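The sort() call above returns only the sorted numbers, not the reordered data frame. A minimal base R sketch of the full reordering might look like the following. It assumes every question column is named q followed by digits; the names q_cols, q_sorted, front and result are illustrative and not part of the original answer, and the example data is rebuilt so the block runs on its own.

```r
# Rebuild the example data from the question (same columns, same order).
mayData <- data.frame(
  age     = c(10, 12, 13, 10, 11, 15),
  Country = factor(rep(c("Country 1", "Country 2", "Country 3"), 2)),
  Bank    = factor(rep(c("bank 1", "bank 2", "bank 3"), 2)),
  q10     = rep(c(1, 2), each = 3),
  q11     = rep(c(1, 2), each = 3),
  q1      = rep(c(1, 2), each = 3),
  q9      = rep(c(1, 2), each = 3),
  q6      = rep(c(1, 2), each = 3),
  year    = seq(1950, 2000, by = 10)
)

# Names of the question columns (assumption: "q" followed only by digits).
q_cols <- grep("^q[0-9]+$", names(mayData), value = TRUE)

# Order those names by their numeric suffix rather than alphabetically.
q_sorted <- q_cols[order(as.numeric(sub("^q", "", q_cols)))]

# Columns to place first, in the order requested in the question.
front <- c("Country", "Bank", "year", "age")

# Index by name to build the reordered data frame.
result <- mayData[c(front, q_sorted)]
names(result)
```

Using order() instead of sort() keeps the link between each number and its column name, which is what allows the names to be rearranged.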

Answer 2:

You can rename the single-digit question columns so that they have a leading zero; alphabetical order then matches numeric order:

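cn <- names(mayData)
# Question columns with a single digit, e.g. q1, q6, q9
q_digit <- cn[grep("^q[0-9]$", cn)]
# Insert a leading zero: q1 becomes q01
names(mayData)[names(mayData) %in% q_digit] <- sub("q", "q0", q_digit)
# Sort the columns alphabetically and store the result
mayData <- mayData[, order(colnames(mayData))]

mayData
  age   Bank   Country q01 q06 q09 q10 q11 year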
1  10 bank 1 Country 1   1   1   1   1   1 1950
2  12 bank 2 Country 2   1   1   1   1   1 1960
3  13 bank 3 Country 3   1   1   1   1   1 1970
4  10 bank 1 Country 1   2   2   2   2   2 1980
5  11 bank 2 Country 2   2   2   2   2   2 1990
6  15 bank 3 Country 3   2   2   2   2   2 2000

This assumes you have fewer than 100 questions. If you have more, adapt it to add further zeros so that all question column names have the same number of digits.
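One way to handle any number of questions is to pad every question number to the width of the largest one. The following is a sketch under the assumption that question columns are named q followed by digits; pad_q_names is a hypothetical helper, not part of the original answer.

```r
# Hypothetical helper: zero-pad the numeric part of every q<number> name
# so that all of them have the same number of digits.
pad_q_names <- function(nms) {
  is_q <- grepl("^q[0-9]+$", nms)
  if (!any(is_q)) {
    return(nms)  # nothing to pad
  }
  nums <- as.integer(sub("^q", "", nms[is_q]))
  # Width of the largest question number, e.g. 3 for q100
  width <- max(nchar(as.character(nums)))
  # formatC() with flag = "0" pads with leading zeros up to 'width'
  nms[is_q] <- paste0("q", formatC(nums, width = width, flag = "0"))
  nms
}

# Usage: pad the names, then sort them alphabetically.
padded <- pad_q_names(c("age", "q10", "q100", "q1"))
sort(padded)
```

On a data frame, the same helper could be applied as names(mayData) <- pad_q_names(names(mayData)) before ordering the columns by name.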

Answer 3:

Using dplyr's select and num_range together with tidyr::extract_numeric (which has since been deprecated in favour of readr::parse_number):

library(dplyr)
library(tidyr)

mayData %>% select(Country, Bank, year, age, 
                   num_range('q', sort(extract_numeric(names(mayData)))))
#     Country   Bank year age q1 q6 q9 q10 q11
# 1 Country 1 bank 1 1950  10  1  1  1   1   1
# 2 Country 2 bank 2 1960  12  1  1  1   1   1
# 3 Country 3 bank 3 1970  13  1  1  1   1   1
# 4 Country 1 bank 1 1980  10  2  2  2   2   2
# 5 Country 2 bank 2 1990  11  2  2  2   2   2
# 6 Country 3 bank 3 2000  15  2  2  2   2   2

Source: https://stackoverflow.com/questions/37296162/reordering-columns-in-data-frame-once-again

Tags

data-manipulation

data-cleaning