R: mixedsort on multiple vectors (columns)

Question

This is a follow-up to an earlier question of mine, which was marked as a duplicate of another question, but the solution suggested there does not work.

I have the following data.frame:

set.seed(1)
mydf <- data.frame(A=paste(sample(LETTERS, 4), sample(1:20, 20), sep=""),
        B=paste(sample(1:20, 20), sample(LETTERS, 4), sep=""),
        C=sample(LETTERS, 20), D=sample(1:100, 20), value=rnorm(20))

> mydf
     A   B C  D       value
1   G5  6N T  9 -0.68875569
2  J18  8T R 87 -0.70749516
3  N19  1A L 34  0.36458196
4  U12  7K Z 82  0.76853292
5  G11 14N J 98 -0.11234621
6   J1 20T F 32  0.88110773
7   N3 17A B 45  0.39810588
8  U14 19K W 83 -0.61202639
9   G9 15N U 80  0.34111969
10 J20  3T I 36 -1.12936310
11  N8  9A K 70  1.43302370
12 U16 16K G 86  1.98039990
13  G6 10N M 39 -0.36722148
14  J7 18T D 62 -1.04413463
15 N13  5A Y 35  0.56971963
16  U4 11K N 28 -0.13505460
17 G17  4N O 64  2.40161776
18 J15  2T C 17 -0.03924000
19  N2 12A P 59  0.68973936
20 U10 13K X 10  0.02800216

I want to order it by columns A to D, but A and B mix letters and numbers, so natural ordering is required.

I know I can apply regular ordering, like:

mydf2 <- mydf[do.call(order, c(mydf[1:4], list(decreasing = FALSE))),]

> mydf2
     A   B C  D       value
5  G11 14N J 98 -0.11234621
17 G17  4N O 64  2.40161776
1   G5  6N T  9 -0.68875569
13  G6 10N M 39 -0.36722148
9   G9 15N U 80  0.34111969
6   J1 20T F 32  0.88110773
18 J15  2T C 17 -0.03924000
2  J18  8T R 87 -0.70749516
10 J20  3T I 36 -1.12936310
14  J7 18T D 62 -1.04413463
15 N13  5A Y 35  0.56971963
3  N19  1A L 34  0.36458196
19  N2 12A P 59  0.68973936
7   N3 17A B 45  0.39810588
11  N8  9A K 70  1.43302370
20 U10 13K X 10  0.02800216
4  U12  7K Z 82  0.76853292
8  U14 19K W 83 -0.61202639
12 U16 16K G 86  1.98039990
16  U4 11K N 28 -0.13505460

But this is not the result I need: I need 10 to come after 9, not after 1 (column A shows that the rows are not in the order I need).
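
For illustration, here is a minimal base-R sketch of the difference on a made-up vector. It assumes every value is a letter prefix followed by digits, as in column A; the names x, prefix and number are only for this example.

```r
# Made-up values shaped like column A
x <- c("G11", "G5", "G9", "G10")

# sort() compares character by character, so "G10" and "G11" precede "G5"
sort(x)

# Natural order: split each value into its letter prefix and its number,
# then order by the prefix first and by the numeric value second
prefix <- sub("[0-9]+$", "", x)
number <- as.integer(sub("^[A-Za-z]+", "", x))
x[order(prefix, number)]
```

This only works for the simple "letters then digits" shape; mixedsort from gtools handles more general mixed strings.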

In the comments of my original question, it was suggested to use the multi.mixedorder function.

However, as you can see below, the result is identical to the one using just order, which is still not what I want.

library(gtools)  # provides mixedsort()

multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
    do.call(order, c(
        lapply(list(...), function(l){
            if(is.character(l)){
                factor(l, levels=mixedsort(unique(l)))
            } else {
                l
            }
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}

mydf3 <- mydf[do.call(multi.mixedorder, c(mydf[1:4], list(decreasing = FALSE))),]

> mydf3
    A   B C  D       value
5  G11 14N J 98 -0.11234621
17 G17  4N O 64  2.40161776
1   G5  6N T  9 -0.68875569
13  G6 10N M 39 -0.36722148
9   G9 15N U 80  0.34111969
6   J1 20T F 32  0.88110773
18 J15  2T C 17 -0.03924000
2  J18  8T R 87 -0.70749516
10 J20  3T I 36 -1.12936310
14  J7 18T D 62 -1.04413463
15 N13  5A Y 35  0.56971963
3  N19  1A L 34  0.36458196
19  N2 12A P 59  0.68973936
7   N3 17A B 45  0.39810588
11  N8  9A K 70  1.43302370
20 U10 13K X 10  0.02800216
4  U12  7K Z 82  0.76853292
8  U14 19K W 83 -0.61202639
12 U16 16K G 86  1.98039990
16  U4 11K N 28 -0.13505460

Answer 1:

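OK, solved it. The multi.mixedorder function needs a fix to be able to deal with factors. In mydf the columns A, B and C are factors (before R 4.0.0, data.frame() converted strings to factors by default), so the is.character() branch was never reached and those columns were ordered by their alphabetically sorted levels: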

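# mixedsort() comes from the gtools package
multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
    do.call(order, c(
        lapply(list(...), function(l){
            if(is.character(l)){
                # character: build a factor whose levels are in natural order
                factor(l, levels=mixedsort(unique(l)))
            } else if(is.factor(l)){
                # factor: rebuild it with naturally sorted levels
                factor(as.character(l), levels=mixedsort(levels(l)))
            } else {
                # numeric and other types have no levels; pass them through
                l
            }
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}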
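
A minimal sketch of why the factor branch matters, on a made-up one-column data frame (toy and nat are hypothetical names for this example, and stringsAsFactors = TRUE is set explicitly to mimic the old default):

```r
library(gtools)  # provides mixedsort()

# Toy data; stringsAsFactors = TRUE mimics the pre-4.0 data.frame() default
toy <- data.frame(A = c("G11", "G5", "G9", "J1"),
                  stringsAsFactors = TRUE)

# Factor levels are sorted alphabetically, and order() follows the level order
levels(toy$A)
toy$A[order(toy$A)]

# Rebuilding the factor with naturally sorted levels changes the ordering
nat <- factor(as.character(toy$A), levels = mixedsort(levels(toy$A)))
toy$A[order(nat)]
```

The same re-levelling is what the fixed function applies to each factor column before handing everything to order().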

Otherwise, convert the factor columns in mydf to character with:

is_fac <- vapply(mydf, is.factor, logical(1))
mydf[is_fac] <- lapply(mydf[is_fac], as.character)

With the fix above, though, this conversion should not be needed.
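
To check whether this issue applies to a given data frame, it may help to look at the column classes first. A small sketch on made-up data (toy and toy_chr are hypothetical names):

```r
# stringsAsFactors = TRUE reproduces the default of R versions before 4.0.0
toy <- data.frame(A = c("G11", "G5"), D = c(98L, 9L),
                  stringsAsFactors = TRUE)
sapply(toy, class)      # A is a factor here

# With stringsAsFactors = FALSE the strings stay character
toy_chr <- data.frame(A = c("G11", "G5"), D = c(98L, 9L),
                      stringsAsFactors = FALSE)
sapply(toy_chr, class)  # A is character here
```

If the mixed columns are already character, the is.character() branch handles them and no conversion is required.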

Source: https://stackoverflow.com/questions/54089471/r-mixedsort-on-multiple-vectors-columns

Tags

sorting

dataframe

natural-sort