Select nth element in data frame by factor

迷失自我 2021-01-21 22:29

I've got a data frame with a text column name and a factor city. It is ordered alphabetically, first by city and then by name. No

  • 2021-01-21 22:55

    A data.table solution

    library(data.table)
    DT <- data.table(test)
    # return all columns from the subset data.table
    n <- 4
    DT[, .SD[n, ], by = city]
    ##      city name
    ## 1: Atlanta   NA
    ## 2:  Boston Matt
    ## 3: Seattle   NA
    # if you just want the nth element of `name` 
    # (excluding other columns that might be there)
    # any of the following would work
    DT[,.SD[n,] ,by = city, .SDcols = 'name']
    DT[, .SD[n, list(name)], by = city]
    DT[, list(name = name[n]), by = city]
  • 2021-01-21 22:56

    In base R using by:

    Set up some test data, including an additional out of range value:

    test <- read.table(text="name    city
    John    Atlanta
    Josh    Atlanta
    Matt    Atlanta
    Bob     Boston
    Kate    Boston
    Lily    Boston
    Matt    Boston
    Bob     Seattle
    Kate    Seattle",header=TRUE)

    Get the 3rd item in each city:

    do.call(rbind, by(test, test$city, function(x) x[3, ]))

            name    city
    Atlanta Matt Atlanta
    Boston  Lily  Boston
    Seattle <NA>    <NA>
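
The <NA> row for Seattle appears because indexing a data frame past its last row returns a row of NA values instead of raising an error. A minimal sketch of that behaviour, using a small hypothetical data frame called seattle (an illustration, not part of the original answer):

```r
# Hypothetical two-row group, mirroring the Seattle rows in the test data
seattle <- data.frame(name = c("Bob", "Kate"),
                      city = c("Seattle", "Seattle"),
                      stringsAsFactors = FALSE)

# Row 3 does not exist, so every column of the returned row is NA
third <- seattle[3, ]
print(third)
```

This is why the city value is lost as well as the name, and why the group label has to be restored afterwards if it is needed.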

    To get exactly what you want, here is a little function:

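    nthrow <- function(dset, splitvar, n) {
        # take the nth row of each group and bind the rows together
        result <- do.call(rbind, by(dset, dset[splitvar], function(x) x[n, ]))
        # groups with fewer than n rows come back as NA; restore the group label from the row names
        result[, splitvar][is.na(result[, splitvar])] <- row.names(result)[is.na(result[, splitvar])]
        row.names(result) <- NULL
        result
    }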

    Call it like:

    nthrow(test, "city", 3)

      name    city
    1 Matt Atlanta
    2 Lily  Boston
    3 <NA> Seattle
  • 2021-01-21 23:06

    You can use plyr for this:

    library(plyr)
    dat <- structure(list(name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt"),
                          city = c("Atlanta", "Atlanta", "Atlanta", "Boston", "Boston", "Boston", "Boston")),
                     .Names = c("name", "city"), class = "data.frame", row.names = c(NA, -7L))

    > ddply(dat, .(city), function(x, n) x[n,], n=3)
      name    city
    1 Matt Atlanta
    2 Lily  Boston
    > ddply(dat, .(city), function(x, n) x[n,], n=4)
      name   city
    1 <NA>   <NA>
    2 Matt Boston

    There are plenty of other options too using base R or data.table or sqldf...
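
As one example of such a base R option, here is a minimal sketch built on split() and vapply(). The helper name nth_by_group is hypothetical, and unlike the by() approach it assumes that groups with fewer than n rows should be dropped, not returned as NA rows.

```r
# Same shape as the test data used in the answers above
test <- data.frame(
  name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt", "Bob", "Kate"),
  city = c(rep("Atlanta", 3), rep("Boston", 4), rep("Seattle", 2)),
  stringsAsFactors = FALSE
)

# nth_by_group is a hypothetical helper name, not from the original answers
nth_by_group <- function(dset, splitvar, n) {
  # Split the row indices by group, then take the nth index within each group
  idx <- vapply(split(seq_len(nrow(dset)), dset[[splitvar]]),
                function(i) i[n], integer(1))
  # Groups with fewer than n rows yield NA; drop them (an assumption of this sketch)
  idx <- unname(idx[!is.na(idx)])
  out <- dset[idx, , drop = FALSE]
  row.names(out) <- NULL
  out
}

res <- nth_by_group(test, "city", 3)
print(res)
```

Because it only manipulates row indices, the original row order within each city is preserved.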
