Is there an R function equivalent to Stata's 'order' command?

旧时难觅i 2021-01-13 11:09

'order' in R seems like 'sort' in Stata. Here's a dataset for example (only variable names listed):

v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v

6 answers
  • 2021-01-13 11:45

    This should give you the same file:

    #snip
    gtinfo <- rbind(tweetinfo, noretweetinfo)
    gtinfo$deleted <- ""
    retweetinfo <- transform(retweetinfo, reTweetId = "", reUserId = "")
    gtinfo <- rbind(gtinfo, retweetinfo)
    # Reorder columns by position: swap columns 17 and 18
    gtinfo <- gtinfo[, c(1:16, 18, 17)]
    #snip
    

    It is possible to implement a function like Stata's order command in R, but I don't think there is much demand for that.
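
    For a one-off reordering like this, the index vector can also be built from column names instead of hard-coded positions. A minimal sketch on a made-up data frame (toy and its columns a to d are assumptions for illustration, not the data from the question):

    ```r
    # Hypothetical toy data; the real gtinfo has 18 columns.
    toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

    # Swap the last two columns by position, as in gtinfo[, c(1:16, 18, 17)]
    by_position <- toy[, c(1:2, 4, 3)]

    # Same result by name: drop "d", then re-insert it just before "c"
    rest <- setdiff(names(toy), "d")
    by_name <- toy[, append(rest, "d", after = match("c", rest) - 1)]

    names(by_name)
    ```

    The name-based form keeps working if columns are later added in front, which the positional form does not.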

  • 2021-01-13 11:49

    The function dplyr::relocate, a verb introduced in dplyr 1.0.0, does exactly what you are looking for.

    library(dplyr)

    data %>% relocate(v17, v18, .before = v13)

    data %>% relocate(v6, v16, .after = last_col())

    data %>% relocate(age, .after = gender)
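
    To see what these calls do, here is a small sketch on a made-up data frame (toy is an assumption standing in for the question's dataset; it needs dplyr 1.0.0 or later):

    ```r
    library(dplyr)

    # Hypothetical toy data with four columns
    toy <- data.frame(v1 = 1, v2 = 2, v3 = 3, v4 = 4)

    # Move v4 in front of v2
    before_v2 <- names(relocate(toy, v4, .before = v2))

    # Move v1 to the end
    to_end <- names(relocate(toy, v1, .after = last_col()))

    # With neither .before nor .after, the columns go to the front
    to_front <- names(relocate(toy, v3))
    ```

    relocate returns a new data frame and leaves toy unchanged, so assign the result if you want to keep the new order.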

  • 2021-01-13 11:50

    Because I'm procrastinating and experimenting with different things, here's a function that I whipped up. Ultimately, it depends on append:

    moveme <- function(invec, movecommand) {
      # Split the command on ";" into single moves, then tokenize each move
      movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]], ",|\\s+"), 
                            function(x) x[x != ""])
      # For each move, separate the names to move from the position keyword
      movelist <- lapply(movecommand, function(x) {
        Where <- x[which(x %in% c("before", "after", "first", "last")):length(x)]
        ToMove <- setdiff(x, Where)
        list(ToMove, Where)
      })
      myVec <- invec
      # Apply the moves one at a time, in the order given
      for (i in seq_along(movelist)) {
        temp <- setdiff(myVec, movelist[[i]][[1]])
        A <- movelist[[i]][[2]][1]
        if (A %in% c("before", "after")) {
          ba <- movelist[[i]][[2]][2]
          if (A == "before") {
            after <- match(ba, temp)-1
          } else if (A == "after") {
            after <- match(ba, temp)
          }    
        } else if (A == "first") {
          after <- 0
        } else if (A == "last") {
          after <- length(myVec)
        }
        # Re-insert the moved names at the computed position
        myVec <- append(temp, values = movelist[[i]][[1]], after = after)
      }
      myVec
    }
    

    Here's some sample data representing the names of your dataset:

    x <- paste0("v", 1:18)
    

    Imagine now that we wanted "v17" and "v18" before "v3", "v6" and "v16" at the end, and "v5" at the beginning:

    moveme(x, "v17, v18 before v3; v6, v16 last; v5 first")
    #  [1] "v5"  "v1"  "v2"  "v17" "v18" "v3"  "v4"  "v7"  "v8"  "v9"  "v10" "v11" "v12"
    # [14] "v13" "v14" "v15" "v6"  "v16"
    

    So, the obvious usage would be, for a data.frame named "df":

    df[moveme(names(df), "how you want to move the columns")]
    

    And, for a data.table named "DT" (which, as @mnel points out, would be more memory efficient):

    setcolorder(DT, moveme(names(DT), "how you want to move the columns"))
    

    Note that compound moves are specified by semicolons.

    The recognized moves are:

    • before (move the specified columns to before another named column)
    • after (move the specified columns to after another named column)
    • first (move the specified columns to the first position)
    • last (move the specified columns to the last position)
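
    Since moveme ultimately relies on append, a single move can also be sketched directly in base R. The vector x below is made up for illustration:

    ```r
    x <- paste0("v", 1:6)

    # Move "v5" so that it sits just before "v2"
    rest <- setdiff(x, "v5")   # names that stay where they are
    moved_before <- append(rest, "v5", after = match("v2", rest) - 1)

    # Move "v1" to the end (append adds at the end by default)
    moved_last <- append(setdiff(x, "v1"), "v1")
    ```

    The position is looked up in rest, not in x, because removing the moved name can shift the target's index.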
  • 2021-01-13 11:53

    It is unclear what you would like to do, but your first sentence makes me assume you would like to sort the dataset.

    There is a built-in order function, which returns the indices that put a vector in sorted order. Is this what you are looking for?

    > x <- c(3,2,1)
    
    > order(x)
    [1] 3 2 1
    
    > x[order(x)]
    [1] 1 2 3
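
    If sorting rows is what you are after, the same idea extends to a data frame by using order() to index its rows. A sketch with made-up data (df and its columns are assumptions):

    ```r
    # Hypothetical data frame
    df <- data.frame(id = c("a", "b", "c"), x = c(3, 1, 2))

    # Sort rows by x, ascending (roughly what 'sort x' does in Stata)
    sorted <- df[order(df$x), ]

    # Sort rows by x, descending
    sorted_desc <- df[order(df$x, decreasing = TRUE), ]
    ```

    Note that this reorders rows, not columns, so it does not correspond to Stata's order command.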
    
  • 2021-01-13 11:56

    I get your problem. I now have code to offer:

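    move <- function(data,variable,before) {
      m <- data[variable]
      r <- data[names(data)!=variable]
      # Look up 'before' among the remaining columns; its index in the
      # original data is off by one when 'variable' comes earlier
      i <- match(before,names(r))
      # seq_len() avoids relying on the precedence of 1:i-1
      pre <- r[seq_len(i-1)]
      post <- r[i:length(names(r))]
      cbind(pre,m,post)
    }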
    
    # Example.
    library(MASS)
    data(painters)
    str(painters)
    
    # Move 'Expression' variable before 'Drawing' variable.
    new <- move(painters,"Expression","Drawing")
    View(new)
    
  • 2021-01-13 11:56

    You could write your own function that does this.

    The following gives you the new order for your column names, using syntax similar to Stata's:

    • where is a named list with 4 possibilities

      • list(last = T)
      • list(first = T)
      • list(before = x) where x is the variable name in question
      • list(after = x) where x is the variable name in question
    • sorted = T will sort var_list lexicographically (a combination of the alphabetic and sequential options of the Stata command)

    The function works on the names only (once you pass a data.frame object as data) and returns a reordered vector of names.

    For example:

    stata.order <- function(var_list, where, sorted = FALSE, data) {
        all_names <- names(data)
        # are all the variable names in the data?
        check <- var_list %in% all_names
        if (any(!check)) {
            stop("Not all variables in var_list exist within data")
        }
        # validate 'where' before it is used
        .nwhere <- names(where)
        if (length(.nwhere) != 1 || !(.nwhere %in% c("last", "first", "before", "after"))) {
            stop("where must be a list of a named element first, last, before or after")
        }
        if (.nwhere %in% c("before", "after") && !(where[[1]] %in% all_names)) {
            stop(.nwhere, " variable not in the data set")
        }
    
        if (sorted) {
            var_list <- sort(var_list)
        }
        # match() keeps the order of var_list, so sorted = TRUE takes effect
        where_in <- match(var_list, all_names)
        full_list <- seq_along(data)
        others <- full_list[-c(where_in)]
    
        # position in 'others' after which the moved columns are inserted
        do_what <- switch(.nwhere, last = length(others), first = 0, 
            before = which(all_names[others] == where[[1]]) - 1, 
            after = which(all_names[others] == where[[1]]))
    
        new_order <- append(others, where_in, do_what)
        return(all_names[new_order])
    }
    
    tmp <- as.data.frame(matrix(1:100, ncol = 10))
    
    stata.order(var_list = c("V2", "V5"), where = list(last = T), data = tmp)
    
    ##  [1] "V1"  "V3"  "V4"  "V6"  "V7"  "V8"  "V9"  "V10" "V2"  "V5" 
    
    stata.order(var_list = c("V2", "V5"), where = list(first = T), data = tmp)
    
    ##  [1] "V2"  "V5"  "V1"  "V3"  "V4"  "V6"  "V7"  "V8"  "V9"  "V10"
    
    stata.order(var_list = c("V2", "V5"), where = list(before = "V6"), data = tmp)
    
    ##  [1] "V1"  "V3"  "V4"  "V2"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10"
    
    stata.order(var_list = c("V2", "V5"), where = list(after = "V4"), data = tmp)
    
    ##  [1] "V1"  "V3"  "V4"  "V2"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10"
    
    # throws an error
    stata.order(var_list = c("V2", "V5"), where = list(before = "v11"), data = tmp)
    
    ## Error: before variable not in the data set
    

    If you want to do the reordering memory-efficiently (by reference, without copying), use data.table:

    library(data.table)
    DT <- data.table(tmp)
    # sets by reference, no copying
    setcolorder(DT, stata.order(var_list = c("V2", "V5"), where = list(after = "V4"), 
        data = DT))
    
    DT
    
    ##     V1 V3 V4 V2 V5 V6 V7 V8 V9 V10
    ##  1:  1 21 31 11 41 51 61 71 81  91
    ##  2:  2 22 32 12 42 52 62 72 82  92
    ##  3:  3 23 33 13 43 53 63 73 83  93
    ##  4:  4 24 34 14 44 54 64 74 84  94
    ##  5:  5 25 35 15 45 55 65 75 85  95
    ##  6:  6 26 36 16 46 56 66 76 86  96
    ##  7:  7 27 37 17 47 57 67 77 87  97
    ##  8:  8 28 38 18 48 58 68 78 88  98
    ##  9:  9 29 39 19 49 59 69 79 89  99
    ## 10: 10 30 40 20 50 60 70 80 90 100
    