Select nth element in data frame by factor

迷失自我 2021-01-21 22:29

I've got a data frame with a text column name and a factor city. It is ordered alphabetically, first by city and then by name. No

3 Answers
  • 2021-01-21 22:55

    A data.table solution

    library(data.table)
    # assumes `test` is the example data frame with columns `name` and `city`
    DT <- data.table(test)
    
    # return all columns from the subset data.table
    n <- 4
    DT[, .SD[n, ], by = city]
    ##      city name
    ## 1: Atlanta   NA
    ## 2:  Boston Matt
    ## 3: Seattle   NA
    
    # if you just want the nth element of `name` 
    # (excluding other columns that might be there)
    # any of the following would work
    
    DT[, .SD[n, ], by = city, .SDcols = 'name']
    
    DT[, .SD[n, list(name)], by = city]
    
    DT[, list(name = name[n]), by = city]
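
    If you would rather drop the cities that have fewer than n rows instead of getting an NA row back, one possible sketch is to check the group size with .N first. The data here is the same set of rows as the test data frame, rebuilt inline so the snippet runs on its own; the inline copy and the name res are assumptions made for illustration.

    ```r
    library(data.table)

    # Same rows as the `test` example data, rebuilt inline (illustrative copy)
    DT <- data.table(
      name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt", "Bob", "Kate"),
      city = c(rep("Atlanta", 3), rep("Boston", 4), rep("Seattle", 2))
    )

    n <- 4
    # .N is the number of rows in the current group; when the `if` is FALSE the
    # expression evaluates to NULL and that group contributes no rows
    res <- DT[, if (.N >= n) .SD[n], by = city]
    res
    ```

    With n <- 4 only Boston has a 4th row, so only that city should come back.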
    
  • 2021-01-21 22:56

    In base R using by:

    Set up some test data, including a city (Seattle) with fewer than three rows, so that the requested row is out of range:

    test <- read.table(text="name    city
    John    Atlanta
    Josh    Atlanta
    Matt    Atlanta
    Bob     Boston
    Kate    Boston
    Lily    Boston
    Matt    Boston
    Bob     Seattle
    Kate    Seattle",header=TRUE)
    

    Get the 3rd item in each city:

    do.call(rbind, by(test, test$city, function(x) x[3, ]))
    

    Result:

            name    city
    Atlanta Matt Atlanta
    Boston  Lily  Boston
    Seattle <NA>    <NA>
    

    To get exactly what you want, with the city filled in even when the nth row is missing, here is a little function:

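    nthrow <- function(dset, splitvar, n) {
        # take row n of each group; groups with fewer than n rows give an all-NA row
        result <- do.call(rbind, by(dset, dset[splitvar], function(x) x[n, ]))
        # the rows are named after their group, so use that to restore the lost group value
        na_rows <- is.na(result[, splitvar])
        result[, splitvar][na_rows] <- row.names(result)[na_rows]
        row.names(result) <- NULL
        return(result)
    }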
    

    Call it like:

    nthrow(test,"city",3)
    

    Result:

      name    city
    1 Matt Atlanta
    2 Lily  Boston
    3 <NA> Seattle
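
    If only the name column matters, a lighter base R sketch is tapply, which also keeps an NA for cities that run out of rows. The names res and out are made up for illustration, and test is rebuilt inline (as character columns) so the snippet runs on its own.

    ```r
    # Same rows as the `test` data above, rebuilt inline so this runs standalone
    test <- data.frame(
      name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt", "Bob", "Kate"),
      city = c(rep("Atlanta", 3), rep("Boston", 4), rep("Seattle", 2)),
      stringsAsFactors = FALSE
    )

    n <- 3
    # tapply applies the function to `name` within each city;
    # indexing past the end of a vector yields NA
    res <- tapply(test$name, test$city, function(x) x[n])

    # turn the named result back into a data frame
    out <- data.frame(name = as.vector(res), city = names(res),
                      stringsAsFactors = FALSE)
    out
    ```

    This trades the generality of nthrow (all columns) for a one-liner on a single column.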
    
  • 2021-01-21 23:06

    You can use plyr for this:

    dat <- structure(list(name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt"),
                          city = c("Atlanta", "Atlanta", "Atlanta", "Boston", "Boston", "Boston", "Boston")),
                     .Names = c("name", "city"), class = "data.frame", row.names = c(NA, -7L))

    library(plyr)
    
    > ddply(dat, .(city), function(x, n) x[n,], n=3)
      name    city
    1 Matt Atlanta
    2 Lily  Boston
    > ddply(dat, .(city), function(x, n) x[n,], n=4)
      name   city
    1 <NA>   <NA>
    2 Matt Boston
    

    There are plenty of other options too, using base R, data.table, or sqldf.
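
    As one example of a base R option, here is a rough sketch using ave to number the rows within each city and then filter on that position. Unlike the ddply call, this drops cities that have fewer than n rows rather than returning an NA row. The names pos and res are illustrative, and dat is rebuilt inline so the snippet runs on its own.

    ```r
    # Same rows as `dat` above, rebuilt inline so this runs standalone
    dat <- data.frame(
      name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt"),
      city = c(rep("Atlanta", 3), rep("Boston", 4)),
      stringsAsFactors = FALSE
    )

    n <- 3
    # ave() with FUN = seq_along numbers the rows 1, 2, 3, ... within each city
    pos <- ave(seq_len(nrow(dat)), dat$city, FUN = seq_along)

    # keep rows whose within-city position is n; shorter cities simply drop out
    res <- dat[pos == n, ]
    res
    ```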
